Ensemble

class sklearn_ensemble_cv.ensemble.Ensemble(**kwargs)

The Ensemble class is built on top of sklearn.ensemble.BaggingRegressor. It provides additional methods for computing ECV, GCV, and corrected GCV (CGCV) risk estimates.

Attributes:
estimators_samples_

The subset of drawn samples for each base estimator.
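Because Ensemble builds on sklearn.ensemble.BaggingRegressor, the attribute can be illustrated with a plain BaggingRegressor. The following is a minimal sketch, assuming synthetic data and a DecisionTreeRegressor base estimator (both chosen only for illustration); it derives the out-of-bag (OOB) indices of each base estimator from estimators_samples_.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X[:, 0] + 0.1 * rng.normal(size=50)

# Each base estimator is fitted on 30 of the 50 rows, drawn without replacement.
regr = BaggingRegressor(
    DecisionTreeRegressor(max_depth=3),
    n_estimators=5,
    max_samples=30,
    bootstrap=False,
    random_state=0,
).fit(X, y)

# One index array per base estimator: the rows it was trained on.
in_bag = regr.estimators_samples_

# The rows an estimator did not see are its out-of-bag rows.
all_rows = np.arange(len(X))
oob = [np.setdiff1d(all_rows, idx) for idx in in_bag]
```

The OOB rows act as held-out data for the corresponding base estimator, which is why the drawn subsets matter for out-of-bag risk estimation.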

Methods

compute_cgcv_estimate(X_train, Y_train[, M, ...])

Computes the corrected GCV estimate for the given input data using the provided BaggingRegressor model.

compute_ecv_estimate(X_train, Y_train[, ...])

Computes the ECV estimate for the given input data using the provided BaggingRegressor model.

compute_gcv_estimate(X_train, Y_train[, M, ...])

Computes the naive GCV estimate for the given input data using the provided BaggingRegressor model.

fit(X, y, *[, sample_weight])

Build a Bagging ensemble of estimators from the training set (X, y).

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

predict(X)

Predict regression target for X.

predict_individual(X[, M, n_jobs, verbose])

Predicts the target values for the given input data using the provided BaggingRegressor model.

score(X, y[, sample_weight])

Return the coefficient of determination of the prediction.

set_fit_request(*[, sample_weight])

Request metadata passed to the fit method.

set_params(**params)

Set the parameters of this estimator.

set_score_request(*[, sample_weight])

Request metadata passed to the score method.

compute_risk

extrapolate

compute_cgcv_estimate(X_train, Y_train, M=None, type='full', return_df=False, n_jobs=-1, verbose=0, **kwargs_est)

Computes the corrected GCV estimate for the given input data using the provided BaggingRegressor model.

Parameters:
X_train : np.ndarray

[n, p] The input covariates.

Y_train : np.ndarray

[n, ] The target values of the input data.

type : str, optional

The type of CGCV estimate to compute. Can be either ‘full’ (using full observations) or ‘ovlp’ (using overlapping observations).

return_df : bool, optional

If True, returns the CGCV estimate as a pandas.DataFrame object.

n_jobs : int, optional

The number of jobs to run in parallel. If -1, all CPUs are used.

kwargs_est : dict

Additional keyword arguments for the risk estimate.

Returns:
risk_gcv : np.ndarray or pandas.DataFrame

[M_test, ] The CGCV estimate for each ensemble size in M_test.

compute_ecv_estimate(X_train, Y_train, M_test=None, M0=None, return_df=False, n_jobs=-1, verbose=0, **kwargs_est)

Computes the ECV estimate for the given input data using the provided BaggingRegressor model.

Parameters:
X_train : np.ndarray

[n, p] The input covariates.

Y_train : np.ndarray

[n, …] The target values of the input data.

M_test : int or np.ndarray

The maximum ensemble size of the ECV estimate.

M0 : int, optional

The number of estimators to use for the OOB estimate. If None, M0 is set to the number of estimators in the BaggingRegressor model.

return_df : bool, optional

If True, returns the ECV estimate as a pandas.DataFrame object.

n_jobs : int, optional

The number of jobs to run in parallel. If -1, all CPUs are used.

kwargs_est : dict

Additional keyword arguments for the risk estimate.

Returns:
risk_ecv : np.ndarray or pandas.DataFrame

[M_test, ] The ECV estimate for each ensemble size in M_test.
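The extrapolation idea behind an estimate "for each ensemble size" can be sketched in a few lines. Under squared-error loss, the risk of a bagged ensemble of size M can be written as R_M = R_inf + (R_1 - R_inf) / M, so risk estimates for sizes 1 and 2 determine the whole curve. The function name extrapolate_risk and the numeric inputs below are hypothetical; this is an illustration of the formula, not the library's implementation, and it does not show how M0 or the OOB observations are used internally.

```python
def extrapolate_risk(risk_1, risk_2, M):
    """Extrapolate squared-error risk to ensemble size M from sizes 1 and 2.

    Assumes R_M = R_inf + (R_1 - R_inf) / M, which gives R_inf = 2 * R_2 - R_1.
    """
    risk_inf = 2.0 * risk_2 - risk_1  # limiting risk as M grows without bound
    return risk_inf + (risk_1 - risk_inf) / M


# Hypothetical risk estimates for ensembles of size 1 and 2.
risk_1, risk_2 = 2.0, 1.5

# Extrapolated risk for several ensemble sizes.
curve = {M: extrapolate_risk(risk_1, risk_2, M) for M in (1, 2, 4, 10)}
```

The curve reproduces the two inputs at M = 1 and M = 2 and decreases toward 2 * risk_2 - risk_1 as M grows, which is what makes it possible to pick an ensemble size without fitting every size.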

compute_gcv_estimate(X_train, Y_train, M=None, type='full', return_df=False, n_jobs=-1, verbose=0, **kwargs_est)

Computes the naive GCV estimate for the given input data using the provided BaggingRegressor model.

Parameters:
X_train : np.ndarray

[n, p] The input covariates.

Y_train : np.ndarray

[n, ] The target values of the input data.

type : str, optional

The type of GCV estimate to compute. Can be either ‘full’ (the naive GCV using full observations) or ‘union’ (the naive GCV using training observations).

return_df : bool, optional

If True, returns the GCV estimate as a pandas.DataFrame object.

n_jobs : int, optional

The number of jobs to run in parallel. If -1, all CPUs are used.

kwargs_est : dict

Additional keyword arguments for the risk estimate.

Returns:
risk_gcv : np.ndarray or pandas.DataFrame

[M_test, ] The GCV estimate for each ensemble size in M_test.
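For background, the textbook GCV criterion for a single linear smoother with smoothing matrix S is the training mean squared error divided by (1 - tr(S)/n)^2. The sketch below computes it for ridge regression on synthetic data; the data, the penalty lam, and the choice of ridge are assumptions for illustration. It does not reproduce how this method handles ensembles or the ‘full’ and ‘union’ variants.

```python
import numpy as np

# Synthetic regression data, assumed purely for illustration.
rng = np.random.default_rng(0)
n, p = 40, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)

# Ridge regression is a linear smoother: y_hat = S @ y.
lam = 1.0
S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)

resid = y - S @ y
train_mse = np.mean(resid ** 2)

# Effective degrees of freedom of the smoother.
dof = np.trace(S)

# GCV inflates the training error to account for the degrees of freedom used.
gcv = train_mse / (1.0 - dof / n) ** 2
```

Since 0 < tr(S) < p < n here, the denominator is below one and the GCV value is always larger than the training error.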

predict_individual(X: ndarray, M: int = -1, n_jobs: int = -1, verbose: bool = 0) -> ndarray

Predicts the target values for the given input data using the provided BaggingRegressor model.

Parameters:
regr : BaggingRegressor

The BaggingRegressor model to use for prediction.

X : np.ndarray

[n, p] The input data to predict target values for.

Returns:
Y_hat : np.ndarray

[n, M] The predicted target values of all $M$ estimators for the input data.
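The [n, M] layout of per-estimator predictions can be reproduced with a plain BaggingRegressor by stacking the predictions of its fitted base estimators column by column. This is a minimal sketch on synthetic data with a DecisionTreeRegressor base estimator (both assumptions); it mirrors the shape of the returned array, not necessarily how predict_individual is implemented.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(0)
n, p, M = 60, 4, 8
X = rng.normal(size=(n, p))
y = X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=n)

regr = BaggingRegressor(
    DecisionTreeRegressor(max_depth=3), n_estimators=M, random_state=0
).fit(X, y)

# Column m holds the predictions of the m-th base estimator,
# evaluated on the feature subset that estimator was trained on.
Y_hat = np.column_stack([
    est.predict(X[:, feats])
    for est, feats in zip(regr.estimators_, regr.estimators_features_)
])

# Averaging across columns recovers the usual bagged prediction.
bagged = Y_hat.mean(axis=1)
```

Keeping the individual columns, rather than only their average, is what allows risk to be evaluated for sub-ensembles of different sizes.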

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> Ensemble

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns:
self : object

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> Ensemble

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
self : object

The updated object.