Doubly-robust semiparametric inference

sklearn_ensemble_cv.cross_validation.ECV(X_train, Y_train, regr, grid_regr={}, grid_ensemble={}, kwargs_regr={}, kwargs_ensemble={}, M=20, M0=20, M_max=inf, delta=0.0, return_df=False, n_jobs=-1, X_test=None, Y_test=None, kwargs_est={}, **kwargs)

Cross-validation for ensemble models using the empirical ECV estimate.

Parameters:

X_train, Y_train (numpy.array): The training samples.
grid_regr, grid_ensemble (pandas.DataFrame): The grids of hyperparameters to search over, for the base estimator and for the ensemble model respectively.
regr (object): The base estimator to use for the ensemble model.
kwargs_regr (dict, optional): Additional keyword arguments for the base estimator.
kwargs_ensemble (dict, optional): Additional keyword arguments for the ensemble model.
M (int, optional): The ensemble size to build.
M0 (int, optional): The number of estimators to use for the ECV estimate.
M_max (int, optional): The maximum ensemble size to consider for the tuned ensemble.
delta (float, optional): The suboptimality parameter for the ensemble size tuning by ECV.
return_df (bool, optional): If True, returns the results as a pandas.DataFrame object.
n_jobs (int, optional): The number of jobs to run in parallel. If -1, all CPUs are used.
X_test, Y_test (numpy.array, optional): The validation samples. Useful for comparing the performance of ECV with other cross-validation methods that require sample splitting.
kwargs_est (dict, optional): Additional keyword arguments for the risk estimate.
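
The roles of M0, M_max, and delta can be illustrated with a minimal sketch. It assumes squared-error risk and the usual extrapolation model for averaged ensembles, in which the risk at ensemble size M is R_M = R_inf + (R_1 - R_inf) / M, so that risk estimates at sizes 1 and 2 determine the whole curve. The function names and the numbers below are hypothetical and are not part of sklearn_ensemble_cv; the library's own estimator may differ in detail.

```python
import math


def extrapolate_risk(risk_1, risk_2, M):
    """Extrapolate the risk to ensemble size M from risks at sizes 1 and 2.

    Assumes R_M = R_inf + (R_1 - R_inf) / M, which gives R_inf = 2 * R_2 - R_1.
    """
    risk_inf = 2.0 * risk_2 - risk_1
    if math.isinf(M):
        return risk_inf
    return risk_inf + (risk_1 - risk_inf) / M


def tune_ensemble_size(risk_1, risk_2, delta, M_max=math.inf):
    """Smallest M whose extrapolated risk is within delta of R_inf, capped at M_max."""
    gap = 2.0 * (risk_1 - risk_2)  # equals R_1 - R_inf under the model above
    if gap <= delta:
        return 1
    if delta <= 0:
        return M_max  # no finite size reaches zero suboptimality
    return min(math.ceil(gap / delta), M_max)


# Hypothetical risk estimates for ensembles of size 1 and 2.
risk_1, risk_2 = 1.0, 0.75
curve = [extrapolate_risk(risk_1, risk_2, M) for M in (1, 2, 4, 20)]
M_hat = tune_ensemble_size(risk_1, risk_2, delta=0.125, M_max=50)
```

Under these assumptions, a larger delta yields a smaller tuned ensemble, and delta=0.0 falls back to the cap M_max.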

sklearn_ensemble_cv.cross_validation.GCV(X_train, Y_train, regr, grid_regr={}, grid_ensemble={}, kwargs_regr={}, kwargs_ensemble={}, M=20, M0=20, M_max=inf, corrected=True, type='full', return_df=False, n_jobs=-1, X_test=None, Y_test=None, kwargs_est={}, **kwargs)

Cross-validation for ensemble models using the empirical GCV estimate. Currently, only the GCV estimates for Ridge, Lasso, and ElasticNet are implemented.

Parameters:

X_train, Y_train (numpy.array): The training samples.
grid_regr, grid_ensemble (pandas.DataFrame): The grids of hyperparameters to search over, for the base estimator and for the ensemble model respectively.
regr (object): The base estimator to use for the ensemble model.
kwargs_regr (dict, optional): Additional keyword arguments for the base estimator.
kwargs_ensemble (dict, optional): Additional keyword arguments for the ensemble model.
M (int, optional): The ensemble size to build.
corrected (bool, optional): If True, compute the corrected GCV (CGCV) estimate.
type (str, optional): The type of GCV or CGCV estimate to compute. It can be either ‘full’ or ‘union’ for naive GCV, and ‘full’ or ‘ovlp’ for CGCV.
return_df (bool, optional): If True, returns the results as a pandas.DataFrame object.
n_jobs (int, optional): The number of jobs to run in parallel. If -1, all CPUs are used.
X_test, Y_test (numpy.array, optional): The validation samples. Useful for comparing the performance of GCV with other cross-validation methods that require sample splitting.
kwargs_est (dict, optional): Additional keyword arguments for the risk estimate.
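
For readers unfamiliar with GCV, the following minimal sketch shows the classical (naive, single-model) GCV score for ridge regression: the training mean squared error divided by (1 - df / n)^2, where df is the trace of the smoothing matrix. To stay dependency-free it assumes a single feature and no intercept. The function name and data are hypothetical, and this is not the ensemble GCV or CGCV estimator implemented by the library.

```python
def ridge_gcv_1d(x, y, alpha):
    """Classical GCV score for one-feature ridge regression without intercept."""
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    beta = sxy / (sxx + alpha)   # ridge coefficient
    df = sxx / (sxx + alpha)     # trace of the smoothing matrix
    train_mse = sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y)) / n
    return train_mse / (1.0 - df / n) ** 2


# Hypothetical toy data and penalty grid.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.8]
scores = {alpha: ridge_gcv_1d(x, y, alpha) for alpha in (0.1, 1.0, 10.0)}
best_alpha = min(scores, key=scores.get)
```

The score needs no held-out data, which is why GCV-type estimates avoid sample splitting.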

sklearn_ensemble_cv.cross_validation.KFoldCV(X_train, Y_train, regr, grid_regr={}, grid_ensemble={}, kwargs_regr={}, kwargs_ensemble={}, M=20, return_df=False, n_jobs=-1, X_test=None, Y_test=None, kwargs_est={}, **kwargs)

K-fold cross-validation for ensemble models.

Parameters:

X_train, Y_train (numpy.array): The training samples.
regr (object): The base estimator to use for the ensemble model.
grid_regr (pandas.DataFrame): The grid of hyperparameters for the base estimator.
grid_ensemble (pandas.DataFrame): The grid of hyperparameters for the ensemble model.
kwargs_regr (dict, optional): Additional keyword arguments for the base estimator.
kwargs_ensemble (dict, optional): Additional keyword arguments for the ensemble model.
M (int, optional): The ensemble size to build.
return_df (bool, optional): If True, returns the results as a pandas.DataFrame object.
n_jobs (int, optional): The number of jobs to run in parallel. If -1, all CPUs are used.
X_test, Y_test (numpy.array, optional): The test samples. Useful for comparing the performance of different cross-validation methods.
kwargs_est (dict, optional): Additional keyword arguments for the risk estimate.
kwargs (dict, optional): Additional keyword arguments for KFold; see https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html for more details.
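
As a reminder of what the forwarded KFold arguments control, here is a minimal pure-Python sketch of unshuffled K-fold index splitting. It mirrors the documented behavior of scikit-learn's KFold without shuffling (contiguous folds, with the first n_samples % n_splits folds one sample larger). The helper kfold_indices is hypothetical and is not part of either library.

```python
def kfold_indices(n_samples, n_splits=5):
    """Yield (train_idx, val_idx) pairs for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for k in range(n_splits):
        size = base + (1 if k < extra else 0)  # early folds absorb the remainder
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


# Ten samples in three folds: validation folds of sizes 4, 3, and 3.
folds = list(kfold_indices(10, n_splits=3))
fold_sizes = [len(val_idx) for _, val_idx in folds]
```

Each sample is used for validation exactly once, and every model is fitted on the remaining folds.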

sklearn_ensemble_cv.cross_validation.comp_empirical_ecv(X_train, Y_train, regr, kwargs_regr={}, kwargs_ensemble={}, M=20, M0=20, M_max=inf, n_jobs=-1, X_test=None, Y_test=None, _check_input=True, **kwargs_est)

Compute the empirical ECV estimate for a given ensemble model.

Parameters:

X_train, Y_train (numpy.array): The training samples.
regr (object): The base estimator to use for the ensemble model.
kwargs_regr (dict, optional): Additional keyword arguments for the base estimator.
kwargs_ensemble (dict, optional): Additional keyword arguments for the ensemble model.
M (int, optional): The maximum ensemble size to consider.
M0 (int, optional): The number of estimators to use for the ECV estimate.
M_max (int, optional): The maximum ensemble size to consider for the tuned ensemble.
n_jobs (int, optional): The number of jobs to run in parallel. If -1, all CPUs are used.
X_test, Y_test (numpy.array, optional): The test samples.
_check_input (bool, optional): If True, check the input arguments.
kwargs_est (dict, optional): Additional keyword arguments for the risk estimate.

Returns:

risk_ecv (numpy.array): The empirical ECV estimate.

sklearn_ensemble_cv.cross_validation.comp_empirical_gcv(X_train, Y_train, regr, kwargs_regr={}, kwargs_ensemble={}, M=20, M0=20, M_max=inf, corrected=True, type='full', n_jobs=-1, X_test=None, Y_test=None, _check_input=True, **kwargs_est)

Compute the empirical GCV or CGCV estimate for a given ensemble model.

Parameters:

X_train, Y_train (numpy.array): The training samples.
regr (object): The base estimator to use for the ensemble model.
kwargs_regr (dict, optional): Additional keyword arguments for the base estimator.
kwargs_ensemble (dict, optional): Additional keyword arguments for the ensemble model.
M (int, optional): The maximum ensemble size to consider.
corrected (bool, optional): If True, compute the corrected GCV (CGCV) estimate.
type (str, optional): The type of GCV or CGCV estimate to compute. It can be either ‘full’ or ‘union’ for naive GCV, and ‘full’ or ‘ovlp’ for CGCV.
n_jobs (int, optional): The number of jobs to run in parallel. If -1, all CPUs are used.
X_test, Y_test (numpy.array, optional): The test samples.
_check_input (bool, optional): If True, check the input arguments.
kwargs_est (dict, optional): Additional keyword arguments for the risk estimate.

Returns:

risk_ecv (numpy.array): The empirical GCV or CGCV estimate.

sklearn_ensemble_cv.cross_validation.comp_empirical_val(X_train, Y_train, X_val, Y_val, regr, kwargs_regr={}, kwargs_ensemble={}, M=20, n_jobs=-1, X_test=None, Y_test=None, _check_input=True, **kwargs_est)

Compute the empirical validation estimate for a given ensemble model, using the held-out samples X_val and Y_val.

Parameters:

X_train, Y_train (numpy.array): The training samples.
X_val, Y_val (numpy.array): The validation samples.
regr (object): The base estimator to use for the ensemble model.
kwargs_regr (dict, optional): Additional keyword arguments for the base estimator.
kwargs_ensemble (dict, optional): Additional keyword arguments for the ensemble model.
M (int, optional): The maximum ensemble size to consider.
n_jobs (int, optional): The number of jobs to run in parallel. If -1, all CPUs are used.
X_test, Y_test (numpy.array, optional): The test samples.
_check_input (bool, optional): If True, check the input arguments.
kwargs_est (dict, optional): Additional keyword arguments for the risk estimate.

Returns:

risk_ecv (numpy.array): The empirical validation estimate.
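
A validation-based risk curve over ensemble sizes 1 to M can be sketched as follows. The sketch assumes an averaging ensemble and squared-error loss: the prediction of the size-m ensemble is the mean of the first m base predictions, and its risk is the mean squared error on the validation samples. The helper ensemble_val_risks and the toy numbers are hypothetical; the library's risk estimate, configured through kwargs_est, may differ.

```python
def ensemble_val_risks(preds, y_val):
    """Validation MSE of the averaged ensemble for sizes 1..len(preds).

    preds[m][i] is the prediction of base estimator m for validation sample i.
    """
    n = len(y_val)
    running_sum = [0.0] * n
    risks = []
    for m, pred in enumerate(preds, start=1):
        running_sum = [s + p for s, p in zip(running_sum, pred)]
        # Ensemble prediction at size m is the running mean of base predictions.
        mse = sum((s / m - y) ** 2 for s, y in zip(running_sum, y_val)) / n
        risks.append(mse)
    return risks


# Two hypothetical base estimators whose errors cancel when averaged.
preds = [[1.0, 3.0], [3.0, 1.0]]
y_val = [2.0, 2.0]
risks = ensemble_val_risks(preds, y_val)
```

In this toy case the risk drops from 1.0 at size 1 to 0.0 at size 2, because the two estimators err in opposite directions.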

sklearn_ensemble_cv.cross_validation.splitCV(X_train, Y_train, regr, grid_regr={}, grid_ensemble={}, kwargs_regr={}, kwargs_ensemble={}, M=20, return_df=False, n_jobs=-1, X_test=None, Y_test=None, kwargs_est={}, **kwargs)

Sample-split cross-validation for ensemble models.

Parameters:

X_train, Y_train (numpy.array): The training samples.
regr (object): The base estimator to use for the ensemble model.
grid_regr (pandas.DataFrame): The grid of hyperparameters for the base estimator.
grid_ensemble (pandas.DataFrame): The grid of hyperparameters for the ensemble model.
kwargs_regr (dict, optional): Additional keyword arguments for the base estimator.
kwargs_ensemble (dict, optional): Additional keyword arguments for the ensemble model.
M (int, optional): The ensemble size to build.
return_df (bool, optional): If True, returns the results as a pandas.DataFrame object.
n_jobs (int, optional): The number of jobs to run in parallel. If -1, all CPUs are used.
X_test, Y_test (numpy.array, optional): The test samples. Useful for comparing the performance of different cross-validation methods.
kwargs_est (dict, optional): Additional keyword arguments for the risk estimate.
kwargs (dict, optional): Additional keyword arguments for ShuffleSplit; see https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.ShuffleSplit.html for more details.
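
The idea behind a single shuffle split can be sketched in pure Python: permute the sample indices with a fixed seed and hold out a fraction for validation. The helper shuffle_split and its parameters val_fraction and seed are hypothetical; scikit-learn's ShuffleSplit has its own parameters and defaults and can generate several such splits.

```python
import math
import random


def shuffle_split(n_samples, val_fraction=0.25, seed=0):
    """Return (train_idx, val_idx) for one random train/validation split."""
    n_val = math.ceil(val_fraction * n_samples)
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # seeded, so the split is reproducible
    return sorted(idx[n_val:]), sorted(idx[:n_val])


# Hold out a quarter of eight samples for validation.
train_idx, val_idx = shuffle_split(8, val_fraction=0.25, seed=0)
```

Unlike K-fold splitting, repeated shuffle splits may reuse the same sample for validation more than once.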
