Doubly-robust semiparametric inference

sklearn_ensemble_cv.cross_validation.ECV(X_train, Y_train, regr, grid_regr={}, grid_ensemble={}, kwargs_regr={}, kwargs_ensemble={}, M=20, M0=20, M_max=inf, delta=0.0, return_df=False, n_jobs=-1, X_test=None, Y_test=None, kwargs_est={}, **kwargs)

Cross-validation for ensemble models using the empirical ECV estimate.

Parameters:
X_train, Y_train : numpy.array

The training samples.

regr : object

The base estimator to use for the ensemble model.

grid_regr : pandas.DataFrame

The grid of hyperparameters for the base estimator.

grid_ensemble : pandas.DataFrame

The grid of hyperparameters for the ensemble model.

kwargs_regr : dict, optional

Additional keyword arguments for the base estimator.

kwargs_ensemble : dict, optional

Additional keyword arguments for the ensemble model.

M : int, optional

The ensemble size to build.

M0 : int, optional

The number of estimators to use for the ECV estimate.

M_max : int, optional

The maximum ensemble size to consider for the tuned ensemble.

delta : float, optional

The suboptimality parameter for the ensemble size tuning by ECV.

return_df : bool, optional

If True, returns the results as a pandas.DataFrame object.

n_jobs : int, optional

The number of jobs to run in parallel. If -1, all CPUs are used.

X_test, Y_test : numpy.array, optional

The validation samples. Useful for comparing the performance of ECV with other cross-validation methods that require sample splitting.

kwargs_est : dict, optional

Additional keyword arguments for the risk estimate.
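The roles of `delta` and `M_max` can be illustrated with a minimal sketch. It assumes that the squared-error risk of an ensemble of size M has the form a + b/M, so that risk estimates at M = 1 and M = 2 determine the whole curve. The helpers `extrapolate_risk` and `tune_ensemble_size` are hypothetical names and are not part of `sklearn_ensemble_cv`. Returning the cap when `delta` is zero is this sketch's choice, not documented library behavior.

```python
import math


def extrapolate_risk(r1, r2, m):
    """Risk of a size-m ensemble under the assumed form R(m) = a + b / m."""
    b = 2.0 * (r1 - r2)  # follows from R(1) - R(2) = b / 2
    a = r1 - b           # limiting risk as m grows; equals 2 * r2 - r1
    return a + b / m


def tune_ensemble_size(r1, r2, delta, m_max=math.inf):
    """Smallest m whose extrapolated risk is within delta of the limiting risk.

    The result is capped at m_max.
    """
    b = 2.0 * (r1 - r2)
    if b <= 0:
        # Averaging does not reduce the estimated risk: one estimator suffices.
        return 1
    if delta <= 0:
        # No slack allowed: only the cap bounds the ensemble size.
        return m_max
    # R(m) - R(inf) = b / m <= delta  is equivalent to  m >= b / delta.
    return min(max(math.ceil(b / delta), 1), m_max)


# Toy risk estimates (made-up numbers, not library output).
m_hat = tune_ensemble_size(r1=1.0, r2=0.75, delta=0.125, m_max=50)
risk_at_m_hat = extrapolate_risk(1.0, 0.75, m_hat)
```

A larger `delta` accepts a smaller ensemble at the cost of a risk that sits further from the limiting value.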

sklearn_ensemble_cv.cross_validation.GCV(X_train, Y_train, regr, grid_regr={}, grid_ensemble={}, kwargs_regr={}, kwargs_ensemble={}, M=20, M0=20, M_max=inf, corrected=True, type='full', return_df=False, n_jobs=-1, X_test=None, Y_test=None, kwargs_est={}, **kwargs)

Cross-validation for ensemble models using the empirical GCV estimate. Currently, GCV estimates are implemented only for Ridge, Lasso, and ElasticNet.

Parameters:
X_train, Y_train : numpy.array

The training samples.

regr : object

The base estimator to use for the ensemble model.

grid_regr : pandas.DataFrame

The grid of hyperparameters for the base estimator.

grid_ensemble : pandas.DataFrame

The grid of hyperparameters for the ensemble model.

kwargs_regr : dict, optional

Additional keyword arguments for the base estimator.

kwargs_ensemble : dict, optional

Additional keyword arguments for the ensemble model.

M : int, optional

The ensemble size to build.

corrected : bool, optional

If True, compute the corrected GCV (CGCV) estimate.

type : str, optional

The type of GCV or CGCV estimate to compute. It can be either 'full' or 'union' for naive GCV, and either 'full' or 'ovlp' for CGCV.

return_df : bool, optional

If True, returns the results as a pandas.DataFrame object.

n_jobs : int, optional

The number of jobs to run in parallel. If -1, all CPUs are used.

X_test, Y_test : numpy.array, optional

The validation samples. Useful for comparing the performance of GCV with other cross-validation methods that require sample splitting.

kwargs_est : dict, optional

Additional keyword arguments for the risk estimate.
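A hyperparameter grid can be thought of as a table with one row per candidate configuration. The sketch below builds such a table as a list of dicts from lists of candidate values. `expand_grid` is a hypothetical helper, not a library function, and the parameter names `alpha`, `max_samples`, and `bootstrap` are illustrative assumptions. This reference does not state which input formats the grid arguments accept beyond `pandas.DataFrame`.

```python
from itertools import product


def expand_grid(param_lists):
    """Return one dict per combination of the candidate values."""
    names = list(param_lists)
    value_lists = [param_lists[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Candidate values for the base estimator (illustrative).
grid_regr_rows = expand_grid({"alpha": [0.1, 1.0, 10.0]})

# Candidate values for the ensemble (illustrative).
grid_ensemble_rows = expand_grid(
    {"max_samples": [0.5, 0.8], "bootstrap": [False]}
)
```

If pandas is available, passing such a list of dicts to `pandas.DataFrame` yields one column per hyperparameter and one row per configuration.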

sklearn_ensemble_cv.cross_validation.KFoldCV(X_train, Y_train, regr, grid_regr={}, grid_ensemble={}, kwargs_regr={}, kwargs_ensemble={}, M=20, return_df=False, n_jobs=-1, X_test=None, Y_test=None, kwargs_est={}, **kwargs)

K-fold cross-validation for ensemble models.

Parameters:
X_train, Y_train : numpy.array

The training samples.

regr : object

The base estimator to use for the ensemble model.

grid_regr : pandas.DataFrame

The grid of hyperparameters for the base estimator.

grid_ensemble : pandas.DataFrame

The grid of hyperparameters for the ensemble model.

kwargs_regr : dict, optional

Additional keyword arguments for the base estimator.

kwargs_ensemble : dict, optional

Additional keyword arguments for the ensemble model.

M : int, optional

The ensemble size to build.

return_df : bool, optional

If True, returns the results as a pandas.DataFrame object.

n_jobs : int, optional

The number of jobs to run in parallel. If -1, all CPUs are used.

X_test, Y_test : numpy.array, optional

The test samples. Useful for comparing the performance of different cross-validation methods.

kwargs_est : dict, optional

Additional keyword arguments for the risk estimate.

kwargs : dict, optional

Additional keyword arguments for KFold; see https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html for more details.
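For readers unfamiliar with the splitting scheme behind K-fold cross-validation, here is a minimal standard-library sketch. It partitions the sample indices into contiguous folds without shuffling and averages the validation mean squared error over folds. The helpers `kfold_indices`, `kfold_mse`, `fit_mean`, and `predict_mean` are hypothetical and only illustrate the idea; they are not the library's implementation.

```python
def kfold_indices(n_samples, n_splits=5):
    """Yield (train_idx, val_idx) for contiguous, unshuffled folds."""
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1  # the first folds absorb the remainder
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, val_idx
        start = stop


def kfold_mse(X, Y, fit, predict, n_splits=5):
    """Average validation mean squared error over the folds."""
    fold_errors = []
    for train_idx, val_idx in kfold_indices(len(Y), n_splits):
        model = fit([X[i] for i in train_idx], [Y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in val_idx])
        sq_errors = [(Y[i] - p) ** 2 for i, p in zip(val_idx, preds)]
        fold_errors.append(sum(sq_errors) / len(sq_errors))
    return sum(fold_errors) / len(fold_errors)


# Toy model for illustration: always predict the training mean.
def fit_mean(X, Y):
    return sum(Y) / len(Y)


def predict_mean(model, X):
    return [model] * len(X)


risk = kfold_mse(list(range(6)), [2.0] * 6, fit_mean, predict_mean, n_splits=3)
```

Each model in this scheme is fitted on only part of the training data, which is the sample-splitting cost that the ECV and GCV estimates above avoid.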

sklearn_ensemble_cv.cross_validation.comp_empirical_ecv(X_train, Y_train, regr, kwargs_regr={}, kwargs_ensemble={}, M=20, M0=20, M_max=inf, n_jobs=-1, X_test=None, Y_test=None, _check_input=True, **kwargs_est)

Compute the empirical ECV estimate for a given ensemble model.

Parameters:
X_train, Y_train : numpy.array

The training samples.

regr : object

The base estimator to use for the ensemble model.

kwargs_regr : dict, optional

Additional keyword arguments for the base estimator.

kwargs_ensemble : dict, optional

Additional keyword arguments for the ensemble model.

M : int, optional

The maximum ensemble size to consider.

M0 : int, optional

The number of estimators to use for the ECV estimate.

M_max : int, optional

The maximum ensemble size to consider for the tuned ensemble.

n_jobs : int, optional

The number of jobs to run in parallel. If -1, all CPUs are used.

X_test, Y_test : numpy.array, optional

The test samples.

_check_input : bool, optional

If True, check the input arguments.

kwargs_est : dict, optional

Additional keyword arguments for the risk estimate.

Returns:
risk_ecv : numpy.array

The empirical ECV estimate.
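One way to obtain the risk estimates for ensemble sizes 1 and 2 without a separate validation set is to use out-of-bag samples from a small number of fitted estimators. The sketch below is an illustration under that assumption, not the library's code. `oob_risks` is a hypothetical helper, `preds[m][i]` stands for estimator m's prediction on sample i, and `in_bag[m]` is the set of indices used to fit estimator m.

```python
from itertools import combinations


def oob_risks(Y, preds, in_bag):
    """Estimate the size-1 and size-2 ensemble risks from out-of-bag errors."""
    n = len(Y)
    n_estimators = len(preds)

    # Size 1: each estimator is scored on the samples it never saw.
    r1_terms = []
    for m in range(n_estimators):
        oob = [i for i in range(n) if i not in in_bag[m]]
        if oob:
            errors = [(Y[i] - preds[m][i]) ** 2 for i in oob]
            r1_terms.append(sum(errors) / len(oob))

    # Size 2: each pair is scored on the samples that neither member saw.
    r2_terms = []
    for m, l in combinations(range(n_estimators), 2):
        oob = [i for i in range(n) if i not in in_bag[m] and i not in in_bag[l]]
        if oob:
            errors = [(Y[i] - 0.5 * (preds[m][i] + preds[l][i])) ** 2 for i in oob]
            r2_terms.append(sum(errors) / len(oob))

    # Raises ZeroDivisionError if no estimator or pair has out-of-bag samples.
    return sum(r1_terms) / len(r1_terms), sum(r2_terms) / len(r2_terms)


# Toy data: two estimators with opposite errors that cancel when averaged.
Y = [0.0, 0.0, 0.0, 0.0]
preds = [[1.0] * 4, [-1.0] * 4]
in_bag = [{0, 1}, {1, 2}]
r1, r2 = oob_risks(Y, preds, in_bag)
```

In the toy data, each estimator alone has squared error 1 on its out-of-bag samples, while the averaged pair has error 0 on the one sample that neither estimator was fitted on.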

sklearn_ensemble_cv.cross_validation.comp_empirical_gcv(X_train, Y_train, regr, kwargs_regr={}, kwargs_ensemble={}, M=20, M0=20, M_max=inf, corrected=True, type='full', n_jobs=-1, X_test=None, Y_test=None, _check_input=True, **kwargs_est)

Compute the empirical GCV or CGCV estimate for a given ensemble model.

Parameters:
X_train, Y_train : numpy.array

The training samples.

regr : object

The base estimator to use for the ensemble model.

kwargs_regr : dict, optional

Additional keyword arguments for the base estimator.

kwargs_ensemble : dict, optional

Additional keyword arguments for the ensemble model.

M : int, optional

The maximum ensemble size to consider.

corrected : bool, optional

If True, compute the corrected GCV (CGCV) estimate.

type : str, optional

The type of GCV or CGCV estimate to compute. It can be either 'full' or 'union' for naive GCV, and either 'full' or 'ovlp' for CGCV.

n_jobs : int, optional

The number of jobs to run in parallel. If -1, all CPUs are used.

X_test, Y_test : numpy.array, optional

The test samples.

_check_input : bool, optional

If True, check the input arguments.

kwargs_est : dict, optional

Additional keyword arguments for the risk estimate.

Returns:
risk_ecv : numpy.array

The empirical GCV or CGCV estimate.
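As background for the GCV estimate, the classical single-model GCV formula divides the training mean squared error by (1 - df/n)^2, where df is the trace of the smoother matrix. The sketch below applies it to ridge regression with one feature and no intercept, where everything is available in closed form. `ridge_gcv_1d` is a hypothetical helper written for this illustration. The ensemble GCV and CGCV estimates computed by the library are more involved, and this sketch does not reproduce them.

```python
def ridge_gcv_1d(x, y, lam):
    """Classical GCV risk estimate for one-feature ridge without intercept."""
    n = len(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    beta = sxy / (sxx + lam)  # ridge coefficient
    train_mse = sum((b - beta * a) ** 2 for a, b in zip(x, y)) / n
    df = sxx / (sxx + lam)    # trace of the smoother x x^T / (sxx + lam)
    return train_mse / (1.0 - df / n) ** 2


# With lam = 0 this is ordinary least squares through the origin.
gcv_ols = ridge_gcv_1d([1.0, 1.0], [2.0, 0.0], lam=0.0)
gcv_ridge = ridge_gcv_1d([1.0, 1.0], [2.0, 0.0], lam=2.0)
```

In the `lam=0.0` example both samples have leverage 1/2, so the GCV value coincides with the leave-one-out mean squared error.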

sklearn_ensemble_cv.cross_validation.comp_empirical_val(X_train, Y_train, X_val, Y_val, regr, kwargs_regr={}, kwargs_ensemble={}, M=20, n_jobs=-1, X_test=None, Y_test=None, _check_input=True, **kwargs_est)

Compute the empirical risk estimate on the validation samples for a given ensemble model.

Parameters:
X_train, Y_train : numpy.array

The training samples.

X_val, Y_val : numpy.array

The validation samples.

regr : object

The base estimator to use for the ensemble model.

kwargs_regr : dict, optional

Additional keyword arguments for the base estimator.

kwargs_ensemble : dict, optional

Additional keyword arguments for the ensemble model.

M : int, optional

The maximum ensemble size to consider.

n_jobs : int, optional

The number of jobs to run in parallel. If -1, all CPUs are used.

X_test, Y_test : numpy.array, optional

The test samples.

_check_input : bool, optional

If True, check the input arguments.

kwargs_est : dict, optional

Additional keyword arguments for the risk estimate.

Returns:
risk_ecv : numpy.array

The empirical risk estimate on the validation samples.
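A validation-based risk estimate for every ensemble size up to M can be computed from the base estimators' predictions on the validation samples. The sketch below assumes squared-error loss and an ensemble that averages its members. `validation_risk_path` is a hypothetical helper, and whether the library returns one value per ensemble size in this way is an assumption.

```python
def validation_risk_path(Y_val, preds):
    """Validation MSE of the average of the first k estimators, k = 1..M.

    preds[m][i] is estimator m's prediction for validation sample i.
    """
    n = len(Y_val)
    running_sum = [0.0] * n
    risks = []
    for k, member_preds in enumerate(preds, start=1):
        # Accumulate predictions so each ensemble size reuses earlier work.
        running_sum = [s + p for s, p in zip(running_sum, member_preds)]
        sq_errors = [(y - s / k) ** 2 for y, s in zip(Y_val, running_sum)]
        risks.append(sum(sq_errors) / n)
    return risks


# Toy predictions: the two members' errors cancel once they are averaged.
path = validation_risk_path([0.0, 0.0], [[1.0, 1.0], [-1.0, -1.0]])
```

The running sum avoids recomputing the ensemble average from scratch for each size.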

sklearn_ensemble_cv.cross_validation.splitCV(X_train, Y_train, regr, grid_regr={}, grid_ensemble={}, kwargs_regr={}, kwargs_ensemble={}, M=20, return_df=False, n_jobs=-1, X_test=None, Y_test=None, kwargs_est={}, **kwargs)

Sample-split cross-validation for ensemble models.

Parameters:
X_train, Y_train : numpy.array

The training samples.

regr : object

The base estimator to use for the ensemble model.

grid_regr : pandas.DataFrame

The grid of hyperparameters for the base estimator.

grid_ensemble : pandas.DataFrame

The grid of hyperparameters for the ensemble model.

kwargs_regr : dict, optional

Additional keyword arguments for the base estimator.

kwargs_ensemble : dict, optional

Additional keyword arguments for the ensemble model.

M : int, optional

The ensemble size to build.

return_df : bool, optional

If True, returns the results as a pandas.DataFrame object.

n_jobs : int, optional

The number of jobs to run in parallel. If -1, all CPUs are used.

X_test, Y_test : numpy.array, optional

The test samples. Useful for comparing the performance of different cross-validation methods.

kwargs_est : dict, optional

Additional keyword arguments for the risk estimate.

kwargs : dict, optional

Additional keyword arguments for ShuffleSplit; see https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.ShuffleSplit.html for more details.
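The random train/validation splitting that underlies sample-split cross-validation can be sketched with the standard library alone. Each split draws a fresh permutation of the indices and holds out a fixed fraction for validation, so validation sets from different splits may overlap. `shuffle_split_indices` is a hypothetical helper. Its argument names mirror common usage, and rounding the validation size up is this sketch's choice.

```python
import math
import random


def shuffle_split_indices(n_samples, n_splits=3, test_size=0.25, seed=0):
    """Yield (train_idx, val_idx) from independent random permutations."""
    rng = random.Random(seed)  # local generator keeps the splits reproducible
    n_val = max(1, math.ceil(test_size * n_samples))
    for _ in range(n_splits):
        perm = list(range(n_samples))
        rng.shuffle(perm)
        # The first n_val shuffled indices are held out; the rest are used to fit.
        yield sorted(perm[n_val:]), sorted(perm[:n_val])


splits = list(shuffle_split_indices(8, n_splits=3, test_size=0.25, seed=0))
```

Unlike K-fold splitting, the number of splits and the validation fraction are chosen independently of each other.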