Ensemble

class sklearn_ensemble_cv.ensemble.Ensemble(**kwargs)

The Ensemble class is built on top of sklearn.ensemble.BaggingRegressor. It adds methods for computing extrapolated cross-validation (ECV) estimates, as well as GCV-based risk estimates.

Attributes:

estimators_samples_: The subset of drawn samples for each base estimator.
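Per-estimator sample subsets are typically used to decide which observations are out-of-bag (not drawn) for each base estimator. The sketch below illustrates this with index arrays shaped like `estimators_samples_`. The arrays are simulated with NumPy rather than taken from a fitted model, and the helper name `oob_mask` is hypothetical, not part of this library.

```python
import numpy as np

def oob_mask(samples, n):
    """Return a boolean [n, M] mask; entry (i, m) is True when
    observation i was NOT drawn for estimator m (out-of-bag)."""
    mask = np.ones((n, len(samples)), dtype=bool)
    for m, idx in enumerate(samples):
        mask[idx, m] = False
    return mask

rng = np.random.default_rng(0)
n, M, k = 10, 3, 6
# Simulated stand-in for `estimators_samples_`: one index array per estimator,
# each drawn without replacement (an assumption made for this illustration).
samples = [rng.choice(n, size=k, replace=False) for _ in range(M)]
mask = oob_mask(samples, n)
print(mask.sum(axis=0))  # number of out-of-bag observations per estimator
```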

Methods

`compute_cgcv_estimate`(X_train, Y_train[, M, ...])	Computes the corrected GCV estimate for the given input data using the provided BaggingRegressor model.
`compute_ecv_estimate`(X_train, Y_train[, ...])	Computes the ECV estimate for the given input data using the provided BaggingRegressor model.
`compute_gcv_estimate`(X_train, Y_train[, M, ...])	Computes the naive GCV estimate for the given input data using the provided BaggingRegressor model.
`compute_risk`(X, Y[, M_test, return_df, avg, ...])	Computes the risk estimate for the given input data using the provided BaggingRegressor model.
`extrapolate`(risk[, M_test])	Extrapolates the risk estimate for the given ensemble size using the provided BaggingRegressor model.
`fit`(X, y[, sample_weight])	Build a Bagging ensemble of estimators from the training set (X, y).
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`predict`(X, **params)	Predict regression target for X.
`predict_individual`(X[, M, n_jobs, verbose])	Predicts the target values for the given input data using the provided BaggingRegressor model.
`score`(X, y[, sample_weight])	Return coefficient of determination on test data.
`set_fit_request`(*[, sample_weight])	Configure whether metadata should be requested to be passed to the `fit` method.
`set_params`(**params)	Set the parameters of this estimator.
`set_score_request`(*[, sample_weight])	Configure whether metadata should be requested to be passed to the `score` method.

compute_cgcv_estimate(X_train, Y_train, M=None, type='full', return_df=False, n_jobs=-1, verbose=0, **kwargs_est)

Computes the corrected GCV estimate for the given input data using the provided BaggingRegressor model.

Parameters:

X_train (np.ndarray): [n, p] The input covariates.
Y_train (np.ndarray): [n, ] The target values of the input data.
type (str, optional): The type of CGCV estimate to compute. Can be either ‘full’ (using full observations) or ‘ovlp’ (using overlapping observations).
return_df (bool, optional): If True, returns the CGCV estimate as a pandas.DataFrame object.
n_jobs (int, optional): The number of jobs to run in parallel. If -1, all CPUs are used.
kwargs_est (dict): Additional keyword arguments for the risk estimate.

Returns:

risk_gcv (np.ndarray or pandas.DataFrame): [M_test, ] The CGCV estimate for each ensemble size in M_test.

compute_ecv_estimate(X_train, Y_train, M_test=None, M0=None, return_df=False, n_jobs=-1, verbose=0, **kwargs_est)

Computes the ECV estimate for the given input data using the provided BaggingRegressor model.

Parameters:

X_train (np.ndarray): [n, p] The input covariates.
Y_train (np.ndarray): [n, …] The target values of the input data.
M_test (int or np.ndarray): The maximum ensemble size of the ECV estimate.
M0 (int, optional): The number of estimators to use for the OOB estimate. If None, M0 is set to the number of estimators in the BaggingRegressor model.
return_df (bool, optional): If True, returns the ECV estimate as a pandas.DataFrame object.
n_jobs (int, optional): The number of jobs to run in parallel. If -1, all CPUs are used.
kwargs_est (dict): Additional keyword arguments for the risk estimate.

Returns:

risk_ecv (np.ndarray or pandas.DataFrame): [M_test, ] The ECV estimate for each ensemble size in M_test.

compute_gcv_estimate(X_train, Y_train, M=None, type='full', return_df=False, n_jobs=-1, verbose=0, **kwargs_est)

Computes the naive GCV estimate for the given input data using the provided BaggingRegressor model.

Parameters:

X_train (np.ndarray): [n, p] The input covariates.
Y_train (np.ndarray): [n, ] The target values of the input data.
type (str, optional): The type of GCV estimate to compute. Can be either ‘full’ (the naive GCV using full observations) or ‘union’ (the naive GCV using training observations).
return_df (bool, optional): If True, returns the GCV estimate as a pandas.DataFrame object.
n_jobs (int, optional): The number of jobs to run in parallel. If -1, all CPUs are used.
kwargs_est (dict): Additional keyword arguments for the risk estimate.

Returns:

risk_gcv (np.ndarray or pandas.DataFrame): [M_test, ] The GCV estimate for each ensemble size in M_test.
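For readers unfamiliar with GCV, the following is a minimal sketch of the classical estimate for a single linear smoother, using ridge regression as the example. It is not the ensemble estimator computed by this method. The function name `gcv_ridge`, the penalty `lam`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def gcv_ridge(X, y, lam):
    """Classical GCV for ridge regression:
    mean squared training residual / (1 - tr(S)/n)^2,
    where S is the smoother matrix with y_hat = S @ y."""
    n, p = X.shape
    S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - S @ y
    dof = np.trace(S)  # effective degrees of freedom
    return np.mean(resid ** 2) / (1.0 - dof / n) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.ones(5) + rng.normal(size=50)
gcv = gcv_ridge(X, y, lam=1.0)
# Training error of the same ridge fit, for comparison with the GCV value.
beta = np.linalg.solve(X.T @ X + 1.0 * np.eye(5), X.T @ y)
train_mse = np.mean((y - X @ beta) ** 2)
print(gcv, train_mse)
```

The GCV value is never smaller than the training error, because the denominator inflates the residuals to account for the degrees of freedom used by the fit.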

compute_risk(X, Y, M_test=None, return_df=False, avg=True, n_jobs=-1, verbose=0, **kwargs_est)

Computes the risk estimate for the given input data using the provided BaggingRegressor model.

Parameters:

X (np.ndarray): [n, p] The input covariates.
Y (np.ndarray): [n, …] The target values of the input data.
M_test (int, optional): The ensemble size of the risk estimate.
return_df (bool, optional): If True, returns the risk estimate as a pandas.DataFrame object.

Returns:

risk (np.ndarray or pandas.DataFrame): [M_test, ] The risk estimate for each ensemble size in M_test.
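One way to obtain a risk value for each ensemble size is sketched below. It assumes the risk is the mean squared error and that a size-m ensemble averages the first m base estimators; the library's own computation may differ. The function name `risk_by_ensemble_size` and the simulated predictions are hypothetical.

```python
import numpy as np

def risk_by_ensemble_size(Y_hat, y):
    """Y_hat: [n, M] per-estimator predictions; y: [n] targets.
    Returns an [M] array whose entry m-1 is the mean squared error
    of the average of the first m estimators."""
    M = Y_hat.shape[1]
    # Running mean over estimators: column m-1 is the size-m ensemble prediction.
    ens_pred = np.cumsum(Y_hat, axis=1) / np.arange(1, M + 1)
    return np.mean((ens_pred - y[:, None]) ** 2, axis=0)

rng = np.random.default_rng(0)
n, M = 200, 8
y = rng.normal(size=n)
# Simulated predictions: the target plus independent noise per estimator.
Y_hat = y[:, None] + rng.normal(size=(n, M))
risk = risk_by_ensemble_size(Y_hat, y)
print(risk)
```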

extrapolate(risk, M_test=None)

Extrapolates the risk estimate for the given ensemble size using the provided BaggingRegressor model.

Parameters:

risk (np.ndarray): [M0, ] The risk estimate for the ensemble sizes in M0.
M_test (int or np.ndarray): The ensemble size to extrapolate the risk estimate to.

Returns:

risk_ecv (np.ndarray): [M_test, ] The extrapolated risk estimate for each ensemble size in M_test.
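The idea behind extrapolation can be sketched as follows. Under squared-error loss, the risk of an average of M exchangeable predictors has the form R_M = R_inf + (R_1 - R_inf) / M, so the risks at sizes 1 and 2 determine the whole curve through R_inf = 2 R_2 - R_1. The function `extrapolate_risk` and its arguments are hypothetical and only illustrate this formula; they do not reproduce this method's signature or internals.

```python
def extrapolate_risk(risk_1, risk_2, M_test):
    """Extrapolate squared-error risk to every ensemble size in 1..M_test
    from the risks of ensembles of size 1 and size 2."""
    risk_inf = 2.0 * risk_2 - risk_1  # limiting risk as M grows without bound
    return [risk_inf + (risk_1 - risk_inf) / M for M in range(1, M_test + 1)]

# Illustrative inputs only, not results from any experiment.
curve = extrapolate_risk(risk_1=1.0, risk_2=0.7, M_test=5)
print(curve)
```

By construction, the curve reproduces the two input risks at M = 1 and M = 2 and decreases toward the limiting risk when `risk_1 > risk_2`.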

predict_individual(X: ndarray, M: int = -1, n_jobs: int = -1, verbose: bool = 0) → ndarray

Predicts the target values for the given input data using the provided BaggingRegressor model.

Parameters:

regr (BaggingRegressor): The BaggingRegressor model to use for prediction.
X (np.ndarray): [n, p] The input data to predict target values for.

Returns:

Y_hat (np.ndarray): [n, M] The predicted target values of all $M$ estimators for the input data.
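The [n, M] layout of the returned array can be illustrated with a small sketch that stacks one column of predictions per base estimator. The toy estimator `ShiftedMean`, the helper `stack_predictions`, and the reading of `M=-1` as "use all estimators" are assumptions made for this example, not documented behavior.

```python
import numpy as np

class ShiftedMean:
    """Toy base estimator (hypothetical): predicts the row mean plus a shift."""
    def __init__(self, shift):
        self.shift = shift

    def predict(self, X):
        return X.mean(axis=1) + self.shift

def stack_predictions(estimators, X, M=-1):
    """Return an [n, M] array with one column per base estimator.
    M=-1 is assumed here to mean "use all estimators"."""
    chosen = estimators if M == -1 else estimators[:M]
    return np.column_stack([est.predict(X) for est in chosen])

X = np.arange(12.0).reshape(4, 3)
estimators = [ShiftedMean(s) for s in (0.0, 1.0, 2.0)]
Y_hat = stack_predictions(estimators, X)
print(Y_hat.shape)
```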

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → Ensemble

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for sample_weight parameter in fit.

Returns:

self (object): The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → Ensemble

Configure whether metadata should be requested to be passed to the score method.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for sample_weight parameter in score.

Returns:

self (object): The updated object.
