Doubly-robust semiparametric inference
- sklearn_ensemble_cv.cross_validation.ECV(X_train, Y_train, regr, grid_regr={}, grid_ensemble={}, kwargs_regr={}, kwargs_ensemble={}, M=20, M0=20, M_max=inf, delta=0.0, return_df=False, n_jobs=-1, X_test=None, Y_test=None, kwargs_est={}, **kwargs)
Cross-validation for ensemble models using the empirical ECV estimate.
- Parameters:
- X_train, Y_train : numpy.array
The training samples.
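- grid_regr : pandas.DataFrame
The grid of hyperparameters for the base estimator.
- grid_ensemble : pandas.DataFrame
The grid of hyperparameters for the ensemble model.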
- regr : object
The base estimator to use for the ensemble model.
- kwargs_regr : dict, optional
Additional keyword arguments for the base estimator.
- kwargs_ensemble : dict, optional
Additional keyword arguments for the ensemble model.
- M : int, optional
The ensemble size to build.
- M0 : int, optional
The number of estimators to use for the ECV estimate.
- M_max : int, optional
The maximum ensemble size to consider for the tuned ensemble.
- delta : float, optional
The suboptimality tolerance used when tuning the ensemble size by ECV.
- return_df : bool, optional
If True, returns the results as a pandas.DataFrame object.
- n_jobs : int, optional
The number of jobs to run in parallel. If -1, all CPUs are used.
- X_test, Y_test : numpy.array, optional
The validation samples. Useful for comparing the performance of ECV with other cross-validation methods that require sample splitting.
- kwargs_est : dict, optional
Additional keyword arguments for the risk estimate.
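The interplay of `M_max` and `delta` can be illustrated with a small sketch. It assumes squared-error loss, under which the risk of an ensemble of M exchangeable base estimators takes the form R_M = a + b/M, so that estimates of R_1 and R_2 determine the whole risk curve. The function names below are hypothetical and are not part of `sklearn_ensemble_cv`; the library's own implementation may differ in its details.

```python
import math

def extrapolate_risk(risk_1, risk_2, M):
    """Risk of an M-ensemble, assuming R_M = a + b / M (squared-error loss)."""
    b = 2.0 * (risk_1 - risk_2)   # follows from R_1 - R_2 = b / 2
    a = risk_1 - b                # limiting risk as M grows without bound
    return a + b / M

def smallest_delta_optimal_size(risk_1, risk_2, delta, M_max=math.inf):
    """Smallest M with R_M - R_inf <= delta, capped at M_max (sketch only)."""
    b = 2.0 * (risk_1 - risk_2)
    if b <= 0:
        return 1                  # ensembling does not reduce the estimated risk
    if delta <= 0:
        return M_max              # no tolerance: use the largest allowed size
    return min(math.ceil(b / delta), M_max)

# Hypothetical risk estimates for ensembles of size 1 and 2.
risk_at_10 = extrapolate_risk(1.0, 0.75, 10)
size = smallest_delta_optimal_size(1.0, 0.75, delta=0.0625)
```

In this sketch a larger `delta` trades a small amount of risk for a smaller ensemble, while `M_max` bounds the size that can be selected.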
- sklearn_ensemble_cv.cross_validation.GCV(X_train, Y_train, regr, grid_regr={}, grid_ensemble={}, kwargs_regr={}, kwargs_ensemble={}, M=20, M0=20, M_max=inf, corrected=True, type='full', return_df=False, n_jobs=-1, X_test=None, Y_test=None, kwargs_est={}, **kwargs)
Cross-validation for ensemble models using the empirical GCV estimate. Currently, GCV estimates are implemented only for Ridge, Lasso, and ElasticNet base estimators.
- Parameters:
- X_train, Y_train : numpy.array
The training samples.
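- grid_regr : pandas.DataFrame
The grid of hyperparameters for the base estimator.
- grid_ensemble : pandas.DataFrame
The grid of hyperparameters for the ensemble model.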
- regr : object
The base estimator to use for the ensemble model.
- kwargs_regr : dict, optional
Additional keyword arguments for the base estimator.
- kwargs_ensemble : dict, optional
Additional keyword arguments for the ensemble model.
- M : int, optional
The ensemble size to build.
- corrected : bool, optional
If True, compute the corrected GCV (CGCV) estimate.
- type : str, optional
The type of GCV or CGCV estimate to compute. It can be either ‘full’ or ‘union’ for naive GCV, and ‘full’ or ‘ovlp’ for CGCV.
- return_df : bool, optional
If True, returns the results as a pandas.DataFrame object.
- n_jobs : int, optional
The number of jobs to run in parallel. If -1, all CPUs are used.
- X_test, Y_test : numpy.array, optional
The validation samples. Useful for comparing this estimate with cross-validation methods that require sample splitting.
- kwargs_est : dict, optional
Additional keyword arguments for the risk estimate.
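For orientation, the classical (naive) GCV estimate for a single ridge regressor divides the training mean squared error by (1 - tr(S)/n)^2, where S is the smoother matrix mapping the responses to the fitted values. The sketch below computes it with NumPy for ridge without an intercept. It is not the ensemble or corrected estimate computed by this function, and `ridge_gcv` is a hypothetical helper, not a library function.

```python
import numpy as np

def ridge_gcv(X, y, lam):
    """Naive GCV for ridge without intercept: MSE / (1 - tr(S)/n)^2."""
    n, p = X.shape
    # Smoother matrix S = X (X'X + lam I)^{-1} X', so that y_hat = S y.
    S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - S @ y
    dof = np.trace(S)  # effective degrees of freedom
    return np.mean(resid ** 2) / (1.0 - dof / n) ** 2

# Synthetic data, used only to exercise the helper.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(size=200)

# Evaluate the estimate on a small grid of penalties and pick the minimizer.
gcv_values = {lam: ridge_gcv(X, y, lam) for lam in (0.1, 1.0, 10.0, 100.0)}
best_lam = min(gcv_values, key=gcv_values.get)
```

No held-out data is needed here, which is the practical appeal of GCV-type estimates over sample-splitting methods.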
- sklearn_ensemble_cv.cross_validation.KFoldCV(X_train, Y_train, regr, grid_regr={}, grid_ensemble={}, kwargs_regr={}, kwargs_ensemble={}, M=20, return_df=False, n_jobs=-1, X_test=None, Y_test=None, kwargs_est={}, **kwargs)
K-fold cross-validation for ensemble models.
- Parameters:
- X_train, Y_train : numpy.array
The training samples.
- regr : object
The base estimator to use for the ensemble model.
- grid_regr : pandas.DataFrame
The grid of hyperparameters for the base estimator.
- grid_ensemble : pandas.DataFrame
The grid of hyperparameters for the ensemble model.
- kwargs_regr : dict, optional
Additional keyword arguments for the base estimator.
- kwargs_ensemble : dict, optional
Additional keyword arguments for the ensemble model.
- M : int, optional
The ensemble size to build.
- return_df : bool, optional
If True, returns the results as a pandas.DataFrame object.
- n_jobs : int, optional
The number of jobs to run in parallel. If -1, all CPUs are used.
- X_test, Y_test : numpy.array, optional
The test samples. Useful for comparing the performance of different cross-validation methods.
- kwargs_est : dict, optional
Additional keyword arguments for the risk estimate.
- kwargs : dict, optional
Additional keyword arguments for KFold; see https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html for more details.
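As a rough illustration of what a K-fold risk estimate for an ensemble involves, the sketch below forwards keyword arguments to scikit-learn's `KFold` and averages the held-out squared error of a bagged tree ensemble over the folds. The helper `kfold_risk` and the synthetic data are hypothetical; this uses scikit-learn directly and is not the library's implementation.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

def kfold_risk(X, y, make_model, **kfold_kwargs):
    """Average held-out squared error over the folds of sklearn's KFold."""
    fold_errors = []
    for train_idx, val_idx in KFold(**kfold_kwargs).split(X):
        model = make_model().fit(X[train_idx], y[train_idx])
        pred = model.predict(X[val_idx])
        fold_errors.append(np.mean((y[val_idx] - pred) ** 2))
    return float(np.mean(fold_errors))

def make_bagged_trees():
    # The first positional argument is the base estimator.
    return BaggingRegressor(DecisionTreeRegressor(max_depth=3),
                            n_estimators=20, max_samples=0.7, random_state=0)

# Synthetic data, used only to exercise the helper.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = X[:, 0] + 0.1 * rng.normal(size=120)

# n_splits, shuffle and random_state are standard KFold arguments.
risk = kfold_risk(X, y, make_bagged_trees, n_splits=5, shuffle=True, random_state=0)
```

Each fold requires refitting the whole ensemble, so the cost grows with both the number of folds and the size of the hyperparameter grid.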
- sklearn_ensemble_cv.cross_validation.comp_empirical_ecv(X_train, Y_train, regr, kwargs_regr={}, kwargs_ensemble={}, M=20, M0=20, M_max=inf, n_jobs=-1, X_test=None, Y_test=None, _check_input=True, **kwargs_est)
Compute the empirical ECV estimate for a given ensemble model.
- Parameters:
- X_train, Y_train : numpy.array
The training samples.
- regr : object
The base estimator to use for the ensemble model.
- kwargs_regr : dict, optional
Additional keyword arguments for the base estimator.
- kwargs_ensemble : dict, optional
Additional keyword arguments for the ensemble model.
- M : int, optional
The maximum ensemble size to consider.
- M0 : int, optional
The number of estimators to use for the ECV estimate.
- M_max : int, optional
The maximum ensemble size to consider for the tuned ensemble.
- n_jobs : int, optional
The number of jobs to run in parallel. If -1, all CPUs are used.
- X_test, Y_test : numpy.array, optional
The test samples.
- _check_input : bool, optional
If True, check the input arguments.
- kwargs_est : dict, optional
Additional keyword arguments for the risk estimate.
- Returns:
- risk_ecv : numpy.array
The empirical ECV estimate.
- sklearn_ensemble_cv.cross_validation.comp_empirical_gcv(X_train, Y_train, regr, kwargs_regr={}, kwargs_ensemble={}, M=20, M0=20, M_max=inf, corrected=True, type='full', n_jobs=-1, X_test=None, Y_test=None, _check_input=True, **kwargs_est)
Compute the empirical GCV or CGCV estimate for a given ensemble model.
- Parameters:
- X_train, Y_train : numpy.array
The training samples.
- regr : object
The base estimator to use for the ensemble model.
- kwargs_regr : dict, optional
Additional keyword arguments for the base estimator.
- kwargs_ensemble : dict, optional
Additional keyword arguments for the ensemble model.
- M : int, optional
The maximum ensemble size to consider.
- corrected : bool, optional
If True, compute the corrected GCV (CGCV) estimate.
- type : str, optional
The type of GCV or CGCV estimate to compute. It can be either ‘full’ or ‘union’ for naive GCV, and ‘full’ or ‘ovlp’ for CGCV.
- n_jobs : int, optional
The number of jobs to run in parallel. If -1, all CPUs are used.
- X_test, Y_test : numpy.array, optional
The test samples.
- _check_input : bool, optional
If True, check the input arguments.
- kwargs_est : dict, optional
Additional keyword arguments for the risk estimate.
- Returns:
- risk_ecv : numpy.array
The empirical GCV or CGCV estimate.
- sklearn_ensemble_cv.cross_validation.comp_empirical_val(X_train, Y_train, X_val, Y_val, regr, kwargs_regr={}, kwargs_ensemble={}, M=20, n_jobs=-1, X_test=None, Y_test=None, _check_input=True, **kwargs_est)
Compute the empirical validation estimate for a given ensemble model, using the held-out samples X_val and Y_val.
- Parameters:
- X_train, Y_train : numpy.array
The training samples.
- X_val, Y_val : numpy.array
The validation samples.
- regr : object
The base estimator to use for the ensemble model.
- kwargs_regr : dict, optional
Additional keyword arguments for the base estimator.
- kwargs_ensemble : dict, optional
Additional keyword arguments for the ensemble model.
- M : int, optional
The maximum ensemble size to consider.
- n_jobs : int, optional
The number of jobs to run in parallel. If -1, all CPUs are used.
- X_test, Y_test : numpy.array, optional
The test samples.
- _check_input : bool, optional
If True, check the input arguments.
- kwargs_est : dict, optional
Additional keyword arguments for the risk estimate.
- Returns:
- risk_ecv : numpy.array
The empirical validation estimate.
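One way to picture a validation estimate over ensemble sizes up to `M` is as the held-out error of the running average of the base estimators' predictions. The sketch below assumes squared-error loss and a precomputed prediction matrix; `validation_risk_by_size` is a hypothetical helper, and the shape of the array actually returned by the library is not documented here.

```python
import numpy as np

def validation_risk_by_size(preds, y_val):
    """Validation MSE of the ensemble for each size m = 1, ..., M.

    preds has shape (M, n_val): row k holds base estimator k's predictions.
    Entry m - 1 of the result is the MSE of the average of the first m rows.
    """
    M = preds.shape[0]
    running_mean = np.cumsum(preds, axis=0) / np.arange(1, M + 1)[:, None]
    return np.mean((running_mean - y_val[None, :]) ** 2, axis=1)

rng = np.random.default_rng(0)
y_val = rng.normal(size=50)
# Stand-in predictions: the truth plus independent noise per base estimator.
preds = y_val[None, :] + rng.normal(size=(20, 50))
risk = validation_risk_by_size(preds, y_val)
```

Computing the whole curve from a single set of fitted base estimators avoids refitting the ensemble for every candidate size.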
- sklearn_ensemble_cv.cross_validation.splitCV(X_train, Y_train, regr, grid_regr={}, grid_ensemble={}, kwargs_regr={}, kwargs_ensemble={}, M=20, return_df=False, n_jobs=-1, X_test=None, Y_test=None, kwargs_est={}, **kwargs)
Sample-split cross-validation for ensemble models.
- Parameters:
- X_train, Y_train : numpy.array
The training samples.
- regr : object
The base estimator to use for the ensemble model.
- grid_regr : pandas.DataFrame
The grid of hyperparameters for the base estimator.
- grid_ensemble : pandas.DataFrame
The grid of hyperparameters for the ensemble model.
- kwargs_regr : dict, optional
Additional keyword arguments for the base estimator.
- kwargs_ensemble : dict, optional
Additional keyword arguments for the ensemble model.
- M : int, optional
The ensemble size to build.
- return_df : bool, optional
If True, returns the results as a pandas.DataFrame object.
- n_jobs : int, optional
The number of jobs to run in parallel. If -1, all CPUs are used.
- X_test, Y_test : numpy.array, optional
The test samples. Useful for comparing the performance of different cross-validation methods.
- kwargs_est : dict, optional
Additional keyword arguments for the risk estimate.
- kwargs : dict, optional
Additional keyword arguments for ShuffleSplit; see https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.ShuffleSplit.html for more details.
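The kind of keyword arguments that `ShuffleSplit` accepts can be seen in the sketch below, which repeats a random train/validation split three times and averages the held-out squared error of a ridge model. It calls scikit-learn directly on synthetic data; how `splitCV` uses the splitter internally is not shown in this reference, so treat this only as an illustration of the splitting scheme.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import ShuffleSplit

# Synthetic data, used only for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=100)

# Standard ShuffleSplit arguments: number of repetitions, held-out fraction, seed.
split_kwargs = dict(n_splits=3, test_size=0.25, random_state=0)

split_errors = []
for train_idx, val_idx in ShuffleSplit(**split_kwargs).split(X):
    model = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx])
    split_errors.append(np.mean((y[val_idx] - model.predict(X[val_idx])) ** 2))

risk = float(np.mean(split_errors))
```

Unlike K-fold splitting, the held-out sets of different repetitions are drawn independently and may overlap.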