Ensemble

class sklearn_ensemble_cv.ensemble.Ensemble(**kwargs)

The Ensemble class is built on top of sklearn.ensemble.BaggingRegressor. It adds methods for computing extrapolated cross-validation (ECV) and generalized cross-validation (GCV) risk estimates.

Attributes:
estimators_samples_

The subset of drawn samples for each base estimator.
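Since this attribute is inherited from scikit-learn's bagging estimators, its shape can be illustrated with a plain BaggingRegressor. The following is a minimal sketch on synthetic data, not code from this package; the data, the tree depth, and the variable names `in_bag` and `out_of_bag` are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X[:, 0] + 0.1 * rng.normal(size=50)

# Plain scikit-learn bagging (assumption: Ensemble exposes the same attribute).
# The base estimator is passed positionally to stay version-agnostic.
regr = BaggingRegressor(
    DecisionTreeRegressor(max_depth=3),
    n_estimators=4,
    max_samples=0.5,   # each estimator sees half of the 50 rows
    bootstrap=False,   # subsample without replacement
    random_state=0,
).fit(X, y)

# One index array per base estimator: the rows it was trained on.
in_bag = regr.estimators_samples_

# Rows that were not drawn for an estimator are out-of-bag for it.
all_rows = np.arange(len(X))
out_of_bag = [np.setdiff1d(all_rows, idx) for idx in in_bag]
```

The out-of-bag rows are the ones an estimator has never seen, which is what makes them usable as held-out data for risk estimation.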

Methods

compute_cgcv_estimate(X_train, Y_train[, M, ...])

Computes the corrected GCV estimate for the given input data using the provided BaggingRegressor model.

compute_ecv_estimate(X_train, Y_train[, ...])

Computes the ECV estimate for the given input data using the provided BaggingRegressor model.

compute_gcv_estimate(X_train, Y_train[, M, ...])

Computes the naive GCV estimate for the given input data using the provided BaggingRegressor model.

compute_risk(X, Y[, M_test, return_df, avg, ...])

Computes the risk estimate for the given input data using the provided BaggingRegressor model.

extrapolate(risk[, M_test])

Extrapolates the risk estimate for the given ensemble size using the provided BaggingRegressor model.

fit(X, y[, sample_weight])

Build a Bagging ensemble of estimators from the training set (X, y).

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

predict(X, **params)

Predict regression target for X.

predict_individual(X[, M, n_jobs, verbose])

Predicts the target values for the given input data using the provided BaggingRegressor model.

score(X, y[, sample_weight])

Return coefficient of determination on test data.

set_fit_request(*[, sample_weight])

Configure whether metadata should be requested to be passed to the fit method.

set_params(**params)

Set the parameters of this estimator.

set_score_request(*[, sample_weight])

Configure whether metadata should be requested to be passed to the score method.

compute_cgcv_estimate(X_train, Y_train, M=None, type='full', return_df=False, n_jobs=-1, verbose=0, **kwargs_est)

Computes the corrected GCV (CGCV) estimate for the given input data using the provided BaggingRegressor model.

Parameters:
X_train : np.ndarray

[n, p] The input covariates.

Y_train : np.ndarray

[n, ] The target values of the input data.

type : str, optional

The type of CGCV estimate to compute. Can be either ‘full’ (using full observations) or ‘ovlp’ (using overlapping observations).

return_df : bool, optional

If True, returns the CGCV estimate as a pandas.DataFrame object.

n_jobs : int, optional

The number of jobs to run in parallel. If -1, all CPUs are used.

kwargs_est : dict

Additional keyword arguments for the risk estimate.

Returns:
risk_gcv : np.ndarray or pandas.DataFrame

[M_test, ] The CGCV estimate for each ensemble size in M_test.

compute_ecv_estimate(X_train, Y_train, M_test=None, M0=None, return_df=False, n_jobs=-1, verbose=0, **kwargs_est)

Computes the ECV estimate for the given input data using the provided BaggingRegressor model.

Parameters:
X_train : np.ndarray

[n, p] The input covariates.

Y_train : np.ndarray

[n, …] The target values of the input data.

M_test : int or np.ndarray

The maximum ensemble size of the ECV estimate.

M0 : int, optional

The number of estimators to use for the OOB estimate. If None, M0 is set to the number of estimators in the BaggingRegressor model.

return_df : bool, optional

If True, returns the ECV estimate as a pandas.DataFrame object.

n_jobs : int, optional

The number of jobs to run in parallel. If -1, all CPUs are used.

kwargs_est : dict

Additional keyword arguments for the risk estimate.

Returns:
risk_ecv : np.ndarray or pandas.DataFrame

[M_test, ] The ECV estimate for each ensemble size in M_test.
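The page does not spell out how the out-of-bag (OOB) estimate and the extrapolation fit together, so the following is only a rough sketch of one way the idea can work, written against a plain scikit-learn BaggingRegressor rather than this method. It assumes squared-error loss and the relation risk(M) = risk_inf + (risk_1 - risk_inf) / M; the synthetic data and all variable names are illustrative.

```python
import itertools
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = X[:, 0] + 0.5 * rng.normal(size=n)

M0 = 6  # number of fitted estimators used for the OOB estimates
regr = BaggingRegressor(
    DecisionTreeRegressor(max_depth=3),
    n_estimators=M0, max_samples=0.5, bootstrap=False, random_state=0,
).fit(X, y)

# oob[m, i] is True when row i was NOT used to train estimator m.
oob = np.ones((M0, n), dtype=bool)
for m, idx in enumerate(regr.estimators_samples_):
    oob[m, idx] = False

# Per-estimator predictions on all rows, shape [M0, n].
preds = np.stack([
    est.predict(X[:, feats])
    for est, feats in zip(regr.estimators_, regr.estimators_features_)
])

# Size-1 risk: each estimator scored on its own out-of-bag rows.
risk_1 = np.mean([np.mean((preds[m, oob[m]] - y[oob[m]]) ** 2) for m in range(M0)])

# Size-2 risk: each pair scored on rows that are out-of-bag for both members.
pair_risks = []
for a, b in itertools.combinations(range(M0), 2):
    both = oob[a] & oob[b]
    if both.any():
        avg = (preds[a, both] + preds[b, both]) / 2.0
        pair_risks.append(np.mean((avg - y[both]) ** 2))
risk_2 = np.mean(pair_risks)

# Extrapolate to larger ensembles (assumed squared-error relation).
risk_inf = 2.0 * risk_2 - risk_1
M_test = np.arange(1, 11)
risk_ecv = risk_inf + (risk_1 - risk_inf) / M_test
```

Only ensembles of size one and two are ever evaluated here; every larger size is read off the fitted curve, which is why no additional estimators have to be trained.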

compute_gcv_estimate(X_train, Y_train, M=None, type='full', return_df=False, n_jobs=-1, verbose=0, **kwargs_est)

Computes the naive GCV estimate for the given input data using the provided BaggingRegressor model.

Parameters:
X_train : np.ndarray

[n, p] The input covariates.

Y_train : np.ndarray

[n, ] The target values of the input data.

type : str, optional

The type of GCV estimate to compute. Can be either ‘full’ (the naive GCV using full observations) or ‘union’ (the naive GCV using training observations).

return_df : bool, optional

If True, returns the GCV estimate as a pandas.DataFrame object.

n_jobs : int, optional

The number of jobs to run in parallel. If -1, all CPUs are used.

kwargs_est : dict

Additional keyword arguments for the risk estimate.

Returns:
risk_gcv : np.ndarray or pandas.DataFrame

[M_test, ] The GCV estimate for each ensemble size in M_test.
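For readers unfamiliar with GCV, the classical formula for a linear smoother S (with fitted values S y) is the training error divided by (1 - tr(S)/n)^2. The sketch below applies it to a single ridge regression fitted on all observations. It is an illustration of the classical formula only, not this method's implementation; the helper names `ridge_smoother` and `gcv` and the penalty scaling are assumptions.

```python
import numpy as np

def ridge_smoother(X, lam):
    """Smoother matrix S of ridge regression, so that y_hat = S @ y."""
    n, p = X.shape
    return X @ np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T)

def gcv(S, y):
    """Classical GCV: training error inflated by the degrees-of-freedom factor."""
    n = len(y)
    resid = y - S @ y
    return np.mean(resid ** 2) / (1.0 - np.trace(S) / n) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.ones(5) + rng.normal(size=100)

S = ridge_smoother(X, lam=0.1)
train_mse = np.mean((y - S @ y) ** 2)
risk_gcv = gcv(S, y)
```

Because tr(S)/n lies strictly between 0 and 1 for ridge with a positive penalty, the GCV value is always larger than the training error it corrects.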

compute_risk(X, Y, M_test=None, return_df=False, avg=True, n_jobs=-1, verbose=0, **kwargs_est)

Computes the risk estimate for the given input data using the provided BaggingRegressor model.

Parameters:
X : np.ndarray

[n, p] The input covariates.

Y : np.ndarray

[n, …] The target values of the input data.

M_test : int, optional

The ensemble size of the risk estimate.

return_df : bool, optional

If True, returns the risk estimate as a pandas.DataFrame object.

Returns:
risk : np.ndarray or pandas.DataFrame

[M_test, ] The risk estimate for each ensemble size in M_test.
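To make the shape of the returned array concrete, here is a small NumPy sketch of a risk curve indexed by ensemble size. It assumes squared-error loss and that the ensemble of size M averages the first M estimators; the helper `risk_by_ensemble_size` and the synthetic predictions are hypothetical and not part of this package.

```python
import numpy as np

def risk_by_ensemble_size(Y_hat, y):
    """Squared-error risk of the average of the first M columns, for M = 1..M_max.

    Y_hat has shape [n, M_max]: one column of predictions per base estimator.
    """
    M_max = Y_hat.shape[1]
    # Running mean over columns gives the prediction of each ensemble size.
    running_mean = np.cumsum(Y_hat, axis=1) / np.arange(1, M_max + 1)
    return np.mean((running_mean - y[:, None]) ** 2, axis=0)

rng = np.random.default_rng(0)
y = rng.normal(size=100)
# Stand-in for per-estimator predictions: the truth plus independent noise.
Y_hat = y[:, None] + rng.normal(size=(100, 8))

risk = risk_by_ensemble_size(Y_hat, y)  # one entry per ensemble size
```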

extrapolate(risk, M_test=None)

Extrapolates the risk estimate for the given ensemble size using the provided BaggingRegressor model.

Parameters:
risk : np.ndarray

[M0, ] The risk estimate for the ensemble sizes in M0.

M_test : int or np.ndarray

The ensemble size to extrapolate the risk estimate to.

Returns:
risk_ecv : np.ndarray

[M_test, ] The extrapolated risk estimate for each ensemble size in M_test.
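The extrapolation formula is not stated on this page. A common form for squared-error risk of an averaged ensemble is risk(M) = risk_inf + (risk_1 - risk_inf) / M, where risk_inf = 2 * risk_2 - risk_1 follows from the risks at sizes one and two. The sketch below works through that arithmetic; the function `extrapolate_risk` and its signature are hypothetical and differ from the method documented above.

```python
import numpy as np

def extrapolate_risk(risk_1, risk_2, M_test):
    """Extrapolate squared-error risk to the ensemble sizes in M_test.

    Assumes risk(M) = risk_inf + (risk_1 - risk_inf) / M, which is an
    assumption of this sketch rather than something stated in the docs.
    """
    M = np.atleast_1d(np.asarray(M_test, dtype=float))
    risk_inf = 2.0 * risk_2 - risk_1  # limiting risk as M grows
    return risk_inf + (risk_1 - risk_inf) / M

# With risk_1 = 1.0 and risk_2 = 0.6, the limiting risk is 0.2.
curve = extrapolate_risk(1.0, 0.6, [1, 2, 4, 100])
```

By construction the curve reproduces the two input risks at M = 1 and M = 2 and decreases toward the limiting value as M increases.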

predict_individual(X: ndarray, M: int = -1, n_jobs: int = -1, verbose: bool = 0) → ndarray

Predicts the target values for the given input data with each base estimator of the fitted ensemble.

Parameters:
X : np.ndarray

[n, p] The input data to predict target values for.

Returns:
Y_hat : np.ndarray

[n, M] The predicted target values of all M estimators for the input data.
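The [n, M] layout can be reproduced with a plain scikit-learn BaggingRegressor by calling each fitted base estimator separately. This is a minimal sketch of that idea on synthetic data, not this method's implementation; the helper name `predict_each` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=40)

regr = BaggingRegressor(
    DecisionTreeRegressor(max_depth=3), n_estimators=5, random_state=0
).fit(X, y)

def predict_each(regr, X):
    """Stack the predictions of every base estimator into an [n, M] array."""
    cols = [
        est.predict(X[:, feats])  # each estimator sees only its own feature subset
        for est, feats in zip(regr.estimators_, regr.estimators_features_)
    ]
    return np.column_stack(cols)

Y_hat = predict_each(regr, X)

# Averaging the columns recovers the usual bagged prediction.
bagged = Y_hat.mean(axis=1)
```

Keeping the individual columns rather than only their average is what allows risks of smaller sub-ensembles to be computed after fitting.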

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → Ensemble

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:
sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

Returns:
self : object

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → Ensemble

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:
sample_weight : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
self : object

The updated object.