pyESD Documentation
PyESD: More information about the package that would be part of the metadata
Submodules
pyESD.StationOperator module
Created on Sun Nov 21 00:55:37 2021
@author: dboateng
- class pyESD.StationOperator.StationOperator(data, name, lat, lon, elevation)[source]
Bases:
object
- climate_score(variable, fit_period, score_period, predictor_dataset, **predictor_kwargs)[source]
Calculate the climate score of a fitted model for the given variable.
- Parameters:
variable (string) – Variable name. “Temperature” or “Precipitation”
fit_period (pd.DatetimeIndex) – Range of data that will be used for creating the reference prediction.
score_period (pd.DatetimeIndex) – Range of data for which the prediction score is evaluated.
predictor_dataset (stat_downscaling_tools.Dataset) – The dataset that should be used to calculate the predictors
predictor_kwargs (keyword arguments) – These arguments are passed to the predictor’s get function
- Returns:
cscore – Climate score (similar to rho squared). 1 for perfect fit, 0 for no skill, negative for even worse skill than mean prediction.
- Return type:
double
- cross_validate_and_predict(variable, daterange, predictor_dataset, fit_predictand=True, return_cv_scores=False, **predictor_kwargs)[source]
- fit(variable, daterange, predictor_dataset, fit_predictors=True, predictor_selector=True, selector_method='Recursive', selector_regressor='Ridge', num_predictors=None, selector_direction=None, cal_relative_importance=False, fit_predictand=True, impute=False, impute_method=None, impute_order=None, **predictor_kwargs)[source]
- get_explained_variance(variable)[source]
If the model is fitted and has the attribute explained_variance, returns it; otherwise returns an array of zeros.
- predict(variable, daterange, predictor_dataset, fit_predictand=True, fit_predictors=True, **predictor_kwargs)[source]
- predictor_correlation(variable, daterange, predictor_dataset, fit_predictors=True, fit_predictand=True, method='pearson', use_scipy=False, **predictor_kwargs)[source]
- save(directory=None, fname=None)[source]
Saves the weatherstation object to a file (pickle).
- Parameters:
directory (str, optional (default : None)) – Directory name where the pickle-file should be stored. Defaults to the current directory.
fname (str, optional (default: None)) – Filename of the file where the station should be stored. Defaults to self.name.replace(' ', '_') + '.pickle'.
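The default naming rule described above can be sketched as follows. This is only an illustration: default_pickle_name and save_object are hypothetical helpers, not part of pyESD, and a plain dict stands in for the station object.

```python
import os
import pickle
import tempfile

def default_pickle_name(name):
    # Documented default: spaces in the station name become underscores.
    return name.replace(" ", "_") + ".pickle"

def save_object(obj, name, directory=None, fname=None):
    """Pickle obj to directory/fname and return the full path (hypothetical helper)."""
    directory = directory if directory is not None else "."
    fname = fname if fname is not None else default_pickle_name(name)
    path = os.path.join(directory, fname)
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return path

# Usage: store and reload a plain dict standing in for a station object.
with tempfile.TemporaryDirectory() as tmp:
    path = save_object({"station": "example"}, "Bad Kreuznach", directory=tmp)
    with open(path, "rb") as f:
        restored = pickle.load(f)
```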
- set_model(variable, method, ensemble_learning=False, estimators=None, cv=10, final_estimator_name=None, daterange=None, predictor_dataset=None, fit_predictors=True, scoring=['r2', 'neg_root_mean_squared_error'], **predictor_kwargs)[source]
- set_predictors(variable, predictors, cachedir, radius=250, detrending=False, scaling=False, standardizer=None)[source]
pyESD.ESD_utils module
Created on Fri Nov 12 14:02:28 2021
@author: dboateng
This routine contains all the utility classes and functions required for ESD functions.
- pyESD.ESD_utils.ComputeStat(i, sx, y, sy, test, return_score=True)[source]
This is part of StatTest, but for parmap.map to work, it has to be an independent function.
- class pyESD.ESD_utils.MidpointNormalize(vmin=None, vmax=None, midpoint=None, clip=False)[source]
Bases:
Normalize
At the moment it is a bug to use a diverging colormap and set the midpoint of the colorbar range to zero if vmax and vmin have different magnitudes. This might become possible in future matplotlib development through colors.offsetNorm(). This class was originally developed by Joe Kington and modified by Daniel Boateng. It maps the diverging colorbar to a 0-1 scale, with the midpoint placed at 0.5. Use this class at your own risk, since it is non-standard practice for quantitative data.
- Parameters:
vmin (float or None) – If vmin and/or vmax is not given, they are initialized from the minimum and maximum value, respectively, of the first input processed; i.e., __call__(A) calls autoscale_None(A).
vmax (float or None) – If vmin and/or vmax is not given, they are initialized from the minimum and maximum value, respectively, of the first input processed; i.e., __call__(A) calls autoscale_None(A).
clip (bool, default: False) – If True, values falling outside the range [vmin, vmax] are mapped to 0 or 1, whichever is closer, and masked values are set to 1. If False, masked values remain masked. Clipping silently defeats the purpose of setting the over, under, and masked colors in a colormap, so it is likely to lead to surprises; therefore the default is clip=False.
Notes
Returns 0 if vmin == vmax.
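The midpoint scaling can be illustrated with a piecewise-linear interpolation. This is a minimal sketch under the assumption that vmin maps to 0, the midpoint to 0.5 and vmax to 1; midpoint_scale is a hypothetical stand-alone function, not the class itself, and it does not handle masked values or the vmin == vmax case.

```python
import numpy as np

def midpoint_scale(values, vmin, vmax, midpoint):
    """Map vmin -> 0, midpoint -> 0.5, vmax -> 1 by piecewise-linear interpolation."""
    # np.interp clamps inputs outside [vmin, vmax] to 0 or 1.
    return np.interp(values, [vmin, midpoint, vmax], [0.0, 0.5, 1.0])

# Asymmetric range around zero: zero still lands in the middle of the color scale.
scaled = midpoint_scale(np.array([-1.0, 0.0, 2.0, 4.0]), vmin=-1.0, vmax=4.0, midpoint=0.0)
```

With an asymmetric range such as [-1, 4], the negative and positive halves are stretched by different factors, which is why the class description warns about quantitative use.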
- pyESD.ESD_utils.StackArray(x, dim)[source]
Return a stacked array with only one dimension left.
- INPUTS:
x: xarray.DataArray or Dataset to be stacked
dim: sole dimension to remain after stacking
- OUTPUTS:
stacked: same as x, but stacked
- pyESD.ESD_utils.StatTest(x, y, test, dim=None, parallel=False)[source]
Compute a statistical test for significance between two xr.DataArrays. Testing will be done along the dimension named dim, and the output p-value will have all dimensions except dim.
- INPUTS:
x: xr.DataArray for testing.
y: xr.DataArray or scalar for testing against, or None for a single-ensemble sign test.
dim: dimension name along which to perform the test.
test: which test to use:
‘KS’ -> Kolmogorov-Smirnov
‘MW’ -> Mann-Whitney
‘WC’ -> Wilcoxon
‘T’ -> T-test, 1 sample with y=mean
‘sign’ -> test against sign only.
parallel: Run the test in parallel? Requires the parmap package.
- OUTPUTS:
pvalx: xr.DataArray containing the p-values. Same dimensions as x, y except dim.
- pyESD.ESD_utils._get_month(npdatetime64)[source]
Returns the month for a given npdatetime64 object, 1 for January, 2 for February, …
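One common way to obtain the calendar month from a numpy datetime64 value is sketched below. Whether pyESD uses exactly this expression is an assumption; get_month is a hypothetical stand-in for the private helper.

```python
import numpy as np

def get_month(npdatetime64):
    """Return the calendar month (1..12) of a numpy datetime64 value."""
    # Truncate to month resolution, count months since 1970-01, fold into 1..12.
    months_since_epoch = npdatetime64.astype("datetime64[M]").astype(int)
    return int(months_since_epoch % 12 + 1)
```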
- pyESD.ESD_utils.load_all_stations(varname, path, stationnames)[source]
This assumes that the stored quantity is a dictionary
Returns a dictionary
- pyESD.ESD_utils.plot_background(p, domain=None, ax=None, left_labels=True, bottom_labels=True, plot_coastlines=True, plot_borders=False)[source]
This function defines the plotting domain and also specifies the background. It requires the plot handle from xarray.plot.imshow and other optional arguments.
- Parameters:
p (plot handle) – The plot handle after plotting with xarray.plot.imshow
domain (str) – “South America”, “Alaska”, “Tibet Plateau” or “Himalaya”, “Eurosia”, “New Zealand”, default: global
- pyESD.ESD_utils.plot_ks_stats(data, cmap, ax=None, vmax=None, vmin=None, levels=None, domain=None, center=True, output_name=None, output_format=None, level_ticks=None, title=None, path_to_store=None, left_labels=True, bottom_labels=True, add_colorbar=True, hatches=None, fig=None, cbar_pos=None, use_colorbar_default=False, orientation='horizontal', plot_projection=None, plot_stats=True, stats_results=None)[source]
- Return type:
None.
pyESD.Predictor_Base module
Created on Fri Nov 12 14:02:45 2021
@author: dboateng
pyESD.Predictor_Generator module
Created on Fri Nov 12 14:03:09 2021
@author: dboateng
pyESD.Weatherstation module
Created on Fri Nov 12 14:01:43 2021
This routine handles the preprocessing of data downloaded directly from DWD. The default time series is monthly; other frequencies must be passed to the function.
1. Extracting only stations with the required number of years
2. Writing additional information into files (e.g. station name, lat, lon and elevation), since it is downloaded into a separate file using station codes
3. All utility functions to read stations into the pyESD StationOperator class
Note: This routine is specifically designed for data downloaded from DWD (otherwise please contact daniel.boateng@uni-tuebingen.de for assistance on other datasets)
@author: dboateng
- pyESD.Weatherstation.read_station_csv(filename, varname, return_all=False)[source]
- Parameters:
filename (str) – Name of the station in path
varname (str) – The name of the variable to downscale (e.g. Precipitation, Temperature)
- Raises:
ValueError – DESCRIPTION.
- Returns:
ws – DESCRIPTION.
- Return type:
TYPE
- pyESD.Weatherstation.read_weatherstationnames(path_to_data)[source]
This function reads all the station names in the data directory
- Parameters:
path_to_data (str) – The directory path where all the station data are stored
- Returns:
namedict – DESCRIPTION.
- Return type:
dict
- pyESD.Weatherstation.read_weatherstations(path_to_data)[source]
Read all the station data in a directory.
- Parameters:
path_to_data (str) – Relative or absolute path to the station folder
- Returns:
stations – Dictionary containing all the datasets
- Return type:
dict
pyESD.dense_models module
Created on Wed Mar 16 12:26:01 2022
@author: dboateng
This module requires further development to add deep learning models!
pyESD.ensemble_models module
Created on Mon Mar 14 11:02:35 2022
@author: dboateng
pyESD.feature_selection module
Created on Mon Jan 3 17:18:14 2022
@author: dboateng
- class pyESD.feature_selection.RecursiveFeatureElimination(regressor_name='ARD')[source]
Bases:
object
- class pyESD.feature_selection.SequentialFeatureSelection(regressor_name='Ridge', n_features=10, direction='forward')[source]
Bases:
object
pyESD.metrics module
Created on Wed Mar 16 11:34:25 2022
@author: dboateng
pyESD.models module
Created on Thu Jan 25 16:00:11 2022
@author: dboateng
- class pyESD.models.HyperparameterOptimize(method, param_grid, regressor, scoring='r2', cv=10)[source]
Bases:
MetaAttributes
pyESD.plot module
Created on Wed Mar 16 11:34:25 2022
@author: dboateng
- pyESD.plot.barplot(methods, stationnames, path_to_data, ax=None, xlabel=None, ylabel=None, varname='test_r2', varname_std='test_r2_std', filename='validation_score_', legend=True, fig_path=None, fig_name=None, show_error=False, width=0.5, rot=0, use_id=True)[source]
- pyESD.plot.boxplot(regressors, stationnames, path_to_data, ax=None, xlabel=None, ylabel=None, varname='test_r2', filename='validation_score_', fig_path=None, fig_name=None, colors=None, patch_artist=False, rot=45)[source]
- pyESD.plot.correlation_heatmap(data, cmap, ax=None, vmax=None, vmin=None, center=0, cbar_ax=None, add_cbar=True, title=None, label='Correlation Coefficinet', fig_path=None, fig_name=None, xlabel=None, ylabel=None, fig=None)[source]
- pyESD.plot.heatmaps(data, cmap, label=None, title=None, vmax=None, vmin=None, center=None, ax=None, cbar=True, cbar_ax=None, xlabel=None)[source]
- pyESD.plot.lineplot(station_num, stationnames, path_to_data, filename, ax=None, fig=None, obs_train_name='obs 1958-2010', obs_test_name='obs 2011-2020', val_predict_name='ERA5 1958-2010', test_predict_name='ERA5 2011-2020', obs_full_name='obs anomalies', method='Stacking', ylabel=None, xlabel=None, fig_path=None, fig_name=None)[source]
- pyESD.plot.plot_monthly_mean(means, stds, color, ylabel=None, ax=None, fig_path=None, fig_name=None, lolims=False)[source]
- pyESD.plot.plot_projection_comparison(stationnames, path_to_data, filename, id_name, method, stationloc_dir, daterange, datasets, variable, dataset_varname, ax=None, xlabel=None, ylabel=None, legend=True, figpath=None, figname=None, width=0.5, title=None, vmax=None, vmin=None, use_id=True)[source]
- pyESD.plot.plot_time_series(stationnames, path_to_data, filename, id_name, daterange, color, label, ymax=None, ymin=None, ax=None, ylabel=None, xlabel=None, fig_path=None, fig_name=None, method='Stacking', window=12)[source]
- pyESD.plot.scatterplot(station_num, stationnames, path_to_data, filename, ax=None, obs_train_name='obs 1958-2010', obs_test_name='obs 2011-2020', val_predict_name='ERA5 1958-2010', test_predict_name='ERA5 2011-2020', obs_full_name='obs anomalies', method='Stacking', ylabel=None, xlabel=None, fig_path=None, fig_name=None, train_marker='*', test_marker='o', train_color='black', test_color='blue')[source]
pyESD.plot_utils module
Created on Mon Apr 11 09:03:49 2022
@author: dboateng
- pyESD.plot_utils.apply_style(fontsize=20, style=None, linewidth=2, usetex=True)[source]
- Parameters:
fontsize (TYPE, optional) – DESCRIPTION. The default is 20.
style (TYPE, optional) – DESCRIPTION. The default is “bmh”. [“seaborn”, “fivethirtyeight”,]
- Return type:
None.
- pyESD.plot_utils.barplot_data(methods, stationnames, path_to_data, varname='test_r2', varname_std='test_r2_std', filename='validation_score_', use_id=False)[source]
- pyESD.plot_utils.boxplot_data(regressors, stationnames, path_to_data, filename='validation_score_', varname='test_r2')[source]
- pyESD.plot_utils.correlation_data(stationnames, path_to_data, filename, predictors, use_id=False, use_scipy=False)[source]
- pyESD.plot_utils.count_predictors(methods, stationnames, path_to_data, filename, predictors)[source]
- pyESD.plot_utils.extract_comparison_data_means(stationnames, path_to_data, filename, id_name, method, stationloc_dir, daterange, datasets, variable, dataset_varname, use_id=True)[source]
- pyESD.plot_utils.extract_time_series(stationnames, path_to_data, filename, id_name, method, daterange)[source]
- pyESD.plot_utils.monthly_mean(stationnames, path_to_data, filename, daterange, id_name, method, use_id=False)[source]
- pyESD.plot_utils.prediction_example_data(station_num, stationnames, path_to_data, filename, obs_train_name='obs 1958-2010', obs_test_name='obs 2011-2020', val_predict_name='ERA5 1958-2010', test_predict_name='ERA5 2011-2020', method='Stacking', use_cv_all=False, obs_full_name='obs anomalies')[source]
pyESD.predictand module
Created on Sun Nov 21 00:55:22 2021
@author: dboateng
- class pyESD.predictand.PredictandTimeseries(data, transform=None, standardizer=None)[source]
Bases:
object
- climate_score(fit_period, score_period, predictor_dataset, **predictor_kwargs)[source]
How much better the prediction for the given period is than the annual mean.
- Parameters:
fit_period (pd.DatetimeIndex) – Range of data that will be used for creating the reference prediction.
score_period (pd.DatetimeIndex) – Range of data for which the prediction score is evaluated.
predictor_dataset (stat_downscaling_tools.Dataset) – The dataset that should be used to calculate the predictors
predictor_kwargs (keyword arguments) – These arguments are passed to the predictor’s get function
- Returns:
cscore – Climate score (similar to rho squared). 1 for perfect fit, 0 for no skill, negative for even worse skill than mean prediction.
- Return type:
double
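A score with the stated properties (1 for a perfect fit, 0 for no skill, negative when worse than the mean prediction) can be sketched as a mean-squared-error skill score. The exact formula used by pyESD is not given here, so the function below is an assumption for illustration only; y_reference stands for the reference (mean) prediction derived from the fit period.

```python
import numpy as np

def climate_score(y_true, y_pred, y_reference):
    """Skill of y_pred relative to a reference prediction (assumed MSE-based form)."""
    mse_model = np.mean((y_true - y_pred) ** 2)
    mse_reference = np.mean((y_true - y_reference) ** 2)
    # 1 when the model error vanishes, 0 when it equals the reference error.
    return 1.0 - mse_model / mse_reference

# Usage: the reference here is simply the mean of the observations.
obs = np.array([1.0, 2.0, 3.0, 4.0])
reference = np.full(obs.shape, obs.mean())
perfect = climate_score(obs, obs, reference)
no_skill = climate_score(obs, reference, reference)
```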
- cross_validate_and_predict(daterange, predictor_dataset, fit_predictand=True, return_cv_scores=False, **predictor_kwargs)[source]
- fit(daterange, predictor_dataset, fit_predictors=True, predictor_selector=True, selector_method='Recursive', selector_regressor='Ridge', num_predictors=None, selector_direction=None, cal_relative_importance=False, fit_predictand=True, impute=False, impute_method=None, impute_order=None, **predictor_kwargs)[source]
- predict(daterange, predictor_dataset, fit_predictors=True, fit_predictand=True, **predictor_kwargs)[source]
- predictor_correlation(daterange, predictor_dataset, fit_predictors=True, fit_predictand=True, method='pearson', use_scipy=False, **predictor_kwargs)[source]
- set_model(method, ensemble_learning=False, estimators=None, cv=10, final_estimator_name=None, daterange=None, predictor_dataset=None, fit_predictors=True, scoring=['r2', 'neg_root_mean_squared_error'], MLR_learning=False, **predictor_kwargs)[source]
pyESD.splitter module
Created on Tue Jan 25 16:52:13 2022
@author: dboateng
- class pyESD.splitter.MonthlyBooststrapper(n_splits=500, test_size=0.1, block_size=12)[source]
Bases:
object
- class pyESD.splitter.YearlyBootstrapper(n_splits=500, test_size=0.3333333333333333, min_month_per_year=9)[source]
Bases:
object
Splits data into training and test sets by picking complete years. You can use it like this:
    X = ...
    y = ...
    yb = YearlyBootstrapper(10)
    for i, (train, test) in enumerate(yb.split(X, y)):
        X_train, y_train = X.iloc[train], y.iloc[train]
        X_test, y_test = X.iloc[test], y.iloc[test]
        ...
- Parameters:
n_splits (int (optional, default: 500)) – number of splits
test_size (float (optional, default: 1/3)) – Ratio of test years.
min_month_per_year (int (optional, default: 9)) – minimum number of months that must be available in a year to use this year in the test set.
- split(X, y, groups=None)[source]
Returns n_splits pairs of indices to training and test set.
- Parameters:
X (pd.DataFrame) –
y (pd.Series) –
groups (dummy) –
X and y should both have the same DatetimeIndex as index.
- Returns:
train (array of ints) – Array of indices of training data
test (array of ints) – Array of indices of test data
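The idea of splitting by complete years can be sketched with plain numpy. This is not the package's implementation: yearly_splits is a hypothetical function, it takes an array of years (one entry per sample) instead of a DatetimeIndex, and drawing test years without replacement is an assumption.

```python
import numpy as np

def yearly_splits(years, n_splits=3, test_size=1 / 3, min_month_per_year=9, seed=0):
    """Yield (train, test) index arrays where each test set consists of whole years."""
    years = np.asarray(years)
    rng = np.random.default_rng(seed)
    unique, counts = np.unique(years, return_counts=True)
    # Only years with enough available months may be drawn into the test set.
    eligible = unique[counts >= min_month_per_year]
    n_test = max(1, int(round(test_size * len(eligible))))
    for _ in range(n_splits):
        test_years = rng.choice(eligible, size=n_test, replace=False)
        is_test = np.isin(years, test_years)
        yield np.flatnonzero(~is_test), np.flatnonzero(is_test)

# Usage: six years of monthly samples, four random splits.
years = np.repeat(np.arange(2000, 2006), 12)
splits = list(yearly_splits(years, n_splits=4))
```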
pyESD.standardizer module
Created on Sun Nov 21 00:55:02 2021
@author: dboateng
- class pyESD.standardizer.MonthlyStandardizer(detrending=False, scaling=False)[source]
Bases:
BaseEstimator, TransformerMixin
Standardizes monthly data that has a seasonal cycle and possibly a linear trend.
Since the seasonal cycle might affect the trend estimation, the seasonal cycle is removed first (by subtracting the mean annual cycle) and the trend is estimated by linear regression. Afterwards the data is scaled to variance 1.
- Parameters:
detrending (bool, optional (default: False)) – Whether to remove a linear trend
- fit(X, y=None)[source]
Fits the standardizer to the provided data, i.e. calculates annual mean cycle and trends.
- Parameters:
X (pd.DataFrame or pd.Series) – DataFrame or Series which holds the data
y (dummy (optional, default: None)) – Not used
- Return type:
self
- inverse_transform(X, y=None)[source]
De-standardizes the values based on the previously calculated parameters.
- Parameters:
X (pd.DataFrame or pd.Series) – DataFrame or Series which holds the standardized data
y (dummy (optional, default: None)) – Not used
- Returns:
X_unstandardized – Unstandardized data
- Return type:
pd.DataFrame or pd.Series
- transform(X, y=None)[source]
Standardizes the values based on the previously calculated parameters.
- Parameters:
X (pd.DataFrame or pd.Series) – DataFrame or Series which holds the data
y (dummy (optional, default: None)) – Not used
- Returns:
X_transformed – Transformed data
- Return type:
pd.DataFrame or pd.Series
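The order of operations described for MonthlyStandardizer (remove the mean annual cycle, then estimate a linear trend, then scale to variance 1) can be sketched for a single series as follows. The function standardize_monthly is hypothetical and works on numpy arrays rather than pandas objects; it assumes every calendar month occurs at least once.

```python
import numpy as np

def standardize_monthly(months, values, detrending=False, scaling=False):
    """Return (anomalies, mean_cycle) for one series; months holds 1..12 per sample."""
    months = np.asarray(months)
    values = np.asarray(values, dtype=float)
    # 1) Remove the mean annual cycle first so it cannot bias the trend estimate.
    cycle = np.array([values[months == m].mean() for m in range(1, 13)])
    anomalies = values - cycle[months - 1]
    # 2) Optionally remove a linear trend fitted to the anomalies.
    if detrending:
        t = np.arange(len(values))
        slope, intercept = np.polyfit(t, anomalies, 1)
        anomalies = anomalies - (slope * t + intercept)
    # 3) Optionally scale to unit variance.
    if scaling:
        anomalies = anomalies / anomalies.std()
    return anomalies, cycle
```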
- class pyESD.standardizer.NoStandardizer[source]
Bases:
BaseEstimator, TransformerMixin
This is just a dummy standardizer that does nothing.
- class pyESD.standardizer.PCAScaling(n_components=None, kernel='linear', method=None)[source]
Bases:
TransformerMixin, BaseEstimator
- fit_transform(X)[source]
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
**fit_params (dict) – Additional fit parameters.
- Returns:
X_new – Transformed array.
- Return type:
ndarray array of shape (n_samples, n_features_new)
- class pyESD.standardizer.StandardScaling(method=None, with_std=True, with_mean=True, unit_variance=False, norm='l2')[source]
Bases:
BaseEstimator, TransformerMixin
- pyESD.standardizer.add_seasonal_cycle(t, anomalies, mean)[source]
Adds a seasonal cycle such that
X = anomalies + mean_seasonal_cycle
- Parameters:
t (numpy array of ints) – time in number of months
anomalies (numpy array) – Array of standardized values
mean (array of shape 12 x #columns(anomalies)) – Mean values for each month and each column in anomalies
- Returns:
X – unstandardized values
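The relation X = anomalies + mean_seasonal_cycle can be sketched as a row lookup into the 12-row mean array. The helper add_cycle is hypothetical, and the convention that t % 12 == 0 selects the first row of mean is an assumption.

```python
import numpy as np

def add_cycle(t, anomalies, mean):
    """Add the monthly mean (12 x n_columns) back onto anomalies (n_samples x n_columns)."""
    # Assumed convention: t counts months and t % 12 indexes the rows of mean.
    return anomalies + mean[np.asarray(t) % 12]
```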
pyESD.teleconnections module
Created on Mon Mar 14 16:58:59 2022
@author: dboateng
- pyESD.teleconnections._get_month(npdatetime64)[source]
Returns the month for a given npdatetime64 object, 1 for January, 2 for February, …
pyESD.data_preprocessing_utils module
pyESD.MLR_model module
Created on Mon Nov 7 17:28:48 2022
@author: dboateng
This module contains the regression routines. There are three layers for bootstrapped forward selection regression:
The BootstrappedRegression class is the outer layer. This implements the bootstrapping loop. This class has a “regressor” member that implements the single regression step (i.e. a fit and a predict method). This can be a ForwardSelection object, but can also be Lasso from sklearn or similar routines.
The ForwardSelection class is the next layer. This class implements a forward selection loop. This again has a regressor object that has to implement get_coefs, set_coefs, and average_coefs. Additionally the regressor object has to implement fit_active, fit, and predict. An example of such a regressor object is MultipleLSRegression.
- class pyESD.MLR_model.BootstrappedForwardSelection(regressor, min_explained_variance=0.02, cv=None)[source]
Bases:
BootstrappedRegression
This is an easy to use interface for BootstrappedRegression with ForwardSelection.
- Parameters:
regressor (regression object) – This should be an object similar to sklearn-like regressors that provides the methods fit and predict. Furthermore, it must also provide the methods get_coefs, set_coefs, average_coefs, and fit_active. An example of this is MultipleLSRegression below.
min_explained_variance (float, optional (default: 0.02)) – If inclusion of the staged predictor doesn’t improve the explained variance on the test set by at least this amount, stop the selection process.
cv (integer or cross-validation generator (optional, default: None)) – This determines how the data are split:
If cv=None, 3-fold cross-validation will be used.
If cv=n where n is an integer, n-fold cross-validation will be used.
If cv=some_object, where some_object implements a some_object.split(X, y) method that returns indices for training and test set, this will be used. It is recommended to use YearlyBootstrapper() from stat_downscaling_tools.bootstrap.
- class pyESD.MLR_model.BootstrappedRegression(regressor, cv=None)[source]
Bases:
MetaEstimator
Performs a regression in a bootstrapping loop.
This splits the data multiple times into training and test data and performs a regression for each split. In each loop the calculated parameters are stored. The final model uses the average of all predictors. If the model is a LinearModel from sklearn (i.e. it has the attributes coef_ and intercept_), the averaging routine does not have to be implemented. However, it can be implemented if something other than an arithmetic mean should be used (e.g. if only the average of robust predictors should be taken and everything else should be set to zero).
Since this inherits from sklearn modules, it can to some extent be used interchangeably with other sklearn regressors.
- Parameters:
regressor (regression object) – This should be an object similar to sklearn-like regressors that provides the methods fit(self, X_train, y_train, X_test, y_test) and predict(self, X). This must also provide the methods get_coefs(self), set_coefs(self, coefs), and average_coefs(self, list_of_coefs). An example of this is ForwardSelection below. The regressor can also have a member variable additional_results, which should be a dictionary of parameters that are calculated during fitting but not needed for predicting, for example metrics like the explained variance of predictors. In this case the regressor also needs the methods average_additional_results(self, list_of_dicts) and set_additional_results(self, mean_additional_results).
cv (integer or cross-validation generator (optional, default: None)) – This determines how the data are split:
If cv=None, 3-fold cross-validation will be used.
If cv=n where n is an integer, n-fold cross-validation will be used.
If cv=some_object, where some_object implements a some_object.split(X, y) method that returns indices for training and test set, this will be used. It is recommended to use YearlyBootstrapper() from stat_downscaling_tools.bootstrap.
- Variables:
mean_coefs (type and shape depends on regressor, (only after fitting)) – Fitted coefficients (mean of all models where the coefficients were nonzero).
cv_error (float (only after fitting)) – Mean of errors on test sets during bootstrapping loop.
If the regressor object has the attributes intercept_ and coef_, these will also be set here.
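The bootstrapping loop can be sketched with an ordinary least-squares fit per split. This is a simplified illustration, not the class itself: bootstrapped_linear_fit is a hypothetical function, it averages all coefficients with a plain arithmetic mean (the class documents a mean over models with nonzero coefficients), and it uses the root-mean-squared error on each test set as the error measure, which is an assumption.

```python
import numpy as np

def bootstrapped_linear_fit(X, y, splits):
    """Fit least squares on every (train, test) split; return mean coefficients and mean test error."""
    coefs, errors = [], []
    for train, test in splits:
        # Design matrix with an intercept column in front of the predictors.
        A_train = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        A_test = np.column_stack([np.ones(len(test)), X[test]])
        coefs.append(beta)
        errors.append(np.sqrt(np.mean((y[test] - A_test @ beta) ** 2)))
    # Final model: arithmetic mean of the per-split coefficients.
    return np.mean(coefs, axis=0), float(np.mean(errors))
```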
- class pyESD.MLR_model.ForwardSelection(regressor, min_explained_variance=0.02)[source]
Bases:
MetaEstimator
Performs a forward selection regression.
This stepwise selects the next most promising candidate predictor and adds it to the model if it is good enough. The method is outlined in “Statistical Analysis in Climate Research” (von Storch, 1999).
Since this object is intended to be used in the BootstrappedRegression class, it implements all necessary methods.
- Parameters:
regressor (regression object) – This should be an object similar to sklearn-like regressors that provides the methods fit and predict. Furthermore, it must also provide the methods get_coefs, set_coefs, average_coefs, and fit_active. An example of this is MultipleLSRegression below.
min_explained_variance (float, optional (default: 0.02)) – If inclusion of the staged predictor doesn’t improve the explained variance on the test set by at least this amount, stop the selection process.
- Variables:
explaned_variances (numpy array) –
- fit(X_train, y_train, X_test, y_test)[source]
Cross-validated forward selection. This fits a regression model according to the following algorithm:
Start with yhat = mean(y), res = y - yhat, active = []
- for each predictor in inactive set:
add to active set
perform regression
get error and uncertainty of error (standard deviation)
remove from active set
add predictor with lowest error on test set to active set
if improvement was not good enough, abort and use previous model.
- Parameters:
X_train (numpy array of shape #samples x #predictors) – Array that holds the values of the predictors (columns) at different times (rows) for the training dataset.
y_train (numpy array of length #samples) – Training predictand data
X_test (numpy array of shape #samples x #predictors) – Test predictor data
y_test (numpy array of length #samples) – Test predictand data
- Returns:
exp_var – explained variance of each predictor
- Return type:
numpy array of length #predictors
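The selection loop listed above can be sketched with ordinary least squares as the inner regression. This is a simplified illustration under stated assumptions: forward_selection and _test_error are hypothetical names, the error is the mean squared error on the test set, the uncertainty of the error is not computed, and the gain of each accepted predictor is reported as a fraction of the error of the mean prediction.

```python
import numpy as np

def _test_error(X_train, y_train, X_test, y_test, cols):
    """Mean squared test error of a least-squares fit using only the columns in cols."""
    A_train = np.column_stack([np.ones(len(X_train))] + [X_train[:, c] for c in cols])
    A_test = np.column_stack([np.ones(len(X_test))] + [X_test[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return np.mean((y_test - A_test @ beta) ** 2)

def forward_selection(X_train, y_train, X_test, y_test, min_explained_variance=0.02):
    n_predictors = X_train.shape[1]
    active, inactive = [], list(range(n_predictors))
    # Start with yhat = mean(y): its test error is the baseline.
    baseline = np.mean((y_test - y_train.mean()) ** 2)
    best_error = baseline
    exp_var = np.zeros(n_predictors)
    while inactive:
        # Try each inactive predictor together with the current active set.
        errors = {c: _test_error(X_train, y_train, X_test, y_test, active + [c])
                  for c in inactive}
        candidate = min(errors, key=errors.get)
        gain = (best_error - errors[candidate]) / baseline
        if gain < min_explained_variance:
            break  # improvement not good enough: keep the previous model
        active.append(candidate)
        inactive.remove(candidate)
        exp_var[candidate] = gain
        best_error = errors[candidate]
    return active, exp_var
```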