pyESD Documentation

PyESD: More information about the package that would be part of the metadata


Submodules

pyESD.StationOperator module

Created on Sun Nov 21 00:55:37 2021

@author: dboateng

class pyESD.StationOperator.StationOperator(data, name, lat, lon, elevation)[source]

Bases: object

climate_score(variable, fit_period, score_period, predictor_dataset, **predictor_kwargs)[source]

Calculate the climate score of a fitted model for the given variable.

Parameters:
  • variable (string) – Variable name. “Temperature” or “Precipitation”

  • fit_period (pd.DatetimeIndex) – Range of data that will be used for creating the reference prediction.

  • score_period (pd.DatetimeIndex) – Range of data for which the prediction score is evaluated.

  • predictor_dataset (stat_downscaling_tools.Dataset) – The dataset that should be used to calculate the predictors

  • predictor_kwargs (keyword arguments) – These arguments are passed to the predictor’s get function

Returns:

cscore – Climate score (similar to rho squared). 1 for perfect fit, 0 for no skill, negative for even worse skill than mean prediction.

Return type:

double
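A minimal usage sketch of climate_score. The station object so and the predictor dataset era5 are assumed to have been created beforehand; their names, the variable, and the date ranges are placeholders, not documented values:

import pandas as pd

# Periods used for the reference prediction and for scoring (placeholder ranges)
fit_period = pd.date_range("1958-01-01", "2000-12-31", freq="MS")
score_period = pd.date_range("2001-01-01", "2010-12-31", freq="MS")

# Assuming `so` is a fitted StationOperator and `era5` a pyESD.ESD_utils.Dataset:
# cscore = so.climate_score("Precipitation", fit_period, score_period, era5)
# cscore: 1 = perfect fit, 0 = no skill, < 0 = worse than the mean prediction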

cross_validate_and_predict(variable, daterange, predictor_dataset, fit_predictand=True, return_cv_scores=False, **predictor_kwargs)[source]
ensemble_transform(variable, daterange, predictor_dataset, **predictor_kwargs)[source]
evaluate(variable, daterange, predictor_dataset, fit_predictand=True, **predictor_kwargs)[source]
fit(variable, daterange, predictor_dataset, fit_predictors=True, predictor_selector=True, selector_method='Recursive', selector_regressor='Ridge', num_predictors=None, selector_direction=None, cal_relative_importance=False, fit_predictand=True, impute=False, impute_method=None, impute_order=None, **predictor_kwargs)[source]
fit_predictor(variable, name, daterange, predictor_dataset)[source]
get_explained_variance(variable)[source]

If the model is fitted and has the attribute explained_variance, returns it, otherwise returns an array of zeros.

get_var(variable, daterange, anomalies=True)[source]
predict(variable, daterange, predictor_dataset, fit_predictand=True, fit_predictors=True, **predictor_kwargs)[source]
predictor_correlation(variable, daterange, predictor_dataset, fit_predictors=True, fit_predictand=True, method='pearson', use_scipy=False, **predictor_kwargs)[source]
relative_predictor_importance(variable)[source]
save(directory=None, fname=None)[source]

Saves the weather station object to a pickle file.

Parameters:
  • directory (str, optional (default : None)) – Directory name where the pickle-file should be stored. Defaults to the current directory.

  • fname (str, optional (default: None)) – Filename of the file where the station should be stored. Defaults to self.name.replace(' ', '_') + '.pickle'.

selected_names(variable)[source]
set_model(variable, method, ensemble_learning=False, estimators=None, cv=10, final_estimator_name=None, daterange=None, predictor_dataset=None, fit_predictors=True, scoring=['r2', 'neg_root_mean_squared_error'], **predictor_kwargs)[source]
set_predictors(variable, predictors, cachedir, radius=250, detrending=False, scaling=False, standardizer=None)[source]
set_standardizer(variable, standardizer)[source]
set_transform(variable, transform)[source]
tree_based_feature_importance(variable, daterange, predictor_dataset, fit_predictand=True, plot=False, **predictor_kwargs)[source]
tree_based_feature_permutation_importance(variable, daterange, predictor_dataset, fit_predictand=True, plot=False, **predictor_kwargs)[source]
pyESD.StationOperator.load_station(fname)[source]

Loads a pickled station from the given file
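A hedged end-to-end sketch of the StationOperator workflow using the methods documented above. The predictor names, the "RandomForest" method string, the Dataset setup, and the station file are illustrative assumptions, and read_station_csv is assumed to return a station operator object (see the pyESD.Weatherstation module below):

import pandas as pd
from pyESD.ESD_utils import Dataset
from pyESD.Weatherstation import read_station_csv
from pyESD.StationOperator import load_station

# Assumed setup: the exact form of `variables` depends on your data layout
era5 = Dataset("ERA5", variables=["t2m", "msl", "z500"], domain_name="Europe")
so = read_station_csv("station_file.csv", varname="Precipitation")  # hypothetical file

daterange = pd.date_range("1958-01-01", "2010-12-31", freq="MS")
so.set_predictors("Precipitation", ["t2m", "msl", "z500"], cachedir=".predictors")
so.set_model("Precipitation", method="RandomForest")
so.fit("Precipitation", daterange, era5)
prediction = so.predict("Precipitation", daterange, era5)

# Persist the station and load it again later
so.save(directory=".")                            # stored as self.name.replace(' ', '_') + '.pickle'
# so_again = load_station("Station_Name.pickle")  # hypothetical filename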

pyESD.ESD_utils module

Created on Fri Nov 12 14:02:28 2021

@author: dboateng. This routine contains all the utility classes and functions required for the ESD functions.

pyESD.ESD_utils.ComputeStat(i, sx, y, sy, test, return_score=True)[source]

This is part of StatTest, but for parmap.map to work, it has to be an independent function.

class pyESD.ESD_utils.Dataset(name, variables, domain_name)[source]

Bases: object

get(varname, select_domain=True, is_Dataset=False)[source]
class pyESD.ESD_utils.MidpointNormalize(vmin=None, vmax=None, midpoint=None, clip=False)[source]

Bases: Normalize

At the moment it is a matplotlib limitation that a diverging colormap cannot have its colorbar midpoint set to zero when vmax and vmin have different magnitudes. This might become possible in future matplotlib development through colors.offsetNorm(). This class was originally developed by Joe Kington and modified by Daniel Boateng. It maps the diverging color range onto a 0-1 scale with the midpoint at 0.5. Use this class at your own risk, since it is non-standard practice for quantitative data.

Parameters:
  • vmin (float or None) – If vmin and/or vmax is not given, they are initialized from the minimum and maximum value, respectively, of the first input processed; i.e., __call__(A) calls autoscale_None(A).

  • vmax (float or None) – If vmin and/or vmax is not given, they are initialized from the minimum and maximum value, respectively, of the first input processed; i.e., __call__(A) calls autoscale_None(A).

  • clip (bool, default: False) –

    If True values falling outside the range [vmin, vmax], are mapped to 0 or 1, whichever is closer, and masked values are set to 1. If False masked values remain masked.

    Clipping silently defeats the purpose of setting the over, under, and masked colors in a colormap, so it is likely to lead to surprises; therefore the default is clip=False.

Notes

Returns 0 if vmin == vmax.
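A short sketch of MidpointNormalize with an asymmetric value range; the data and the colormap choice are arbitrary:

import numpy as np
import matplotlib.pyplot as plt
from pyESD.ESD_utils import MidpointNormalize

# Values spanning -1 to 3: the midpoint 0 should sit at the centre of the colormap
data = np.random.uniform(-1.0, 3.0, size=(20, 20))
norm = MidpointNormalize(vmin=-1.0, vmax=3.0, midpoint=0.0)

plt.imshow(data, cmap="RdBu_r", norm=norm)
plt.colorbar()
plt.show()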

pyESD.ESD_utils.StackArray(x, dim)[source]

Return a stacked array with only one dimension left.

Parameters:
  • x (xarray.DataArray or Dataset) – Array to be stacked.

  • dim (str) – Sole dimension to remain after stacking.

Returns:

stacked – Same as x, but stacked.

pyESD.ESD_utils.StatTest(x, y, test, dim=None, parallel=False)[source]

Compute a statistical test for significance between two xr.DataArrays. Testing will be done along the dimension with name dim and the output p-value will have all dimensions except dim.

Parameters:
  • x (xr.DataArray) – Array for testing.

  • y (xr.DataArray or scalar) – Array or scalar to test against, or None for a single-ensemble sign test.

  • dim (str) – Dimension name along which to perform the test.

  • test (str) – Which test to use: ‘KS’ -> Kolmogorov-Smirnov, ‘MW’ -> Mann-Whitney, ‘WC’ -> Wilcoxon, ‘T’ -> one-sample t-test with y as mean, ‘sign’ -> test against sign only.

  • parallel (bool) – Run the test in parallel? Requires the parmap package.

Returns:

pvalx – xr.DataArray containing the p-values. Same dimensions as x and y except dim.
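A minimal sketch of StatTest on two synthetic DataArrays; the dimension names and the choice of the Mann-Whitney test are illustrative:

import numpy as np
import xarray as xr
from pyESD.ESD_utils import StatTest

x = xr.DataArray(np.random.randn(120, 5), dims=["time", "station"])
y = xr.DataArray(np.random.randn(120, 5), dims=["time", "station"])

# p-values per station; the "time" dimension is removed by the test
pvals = StatTest(x, y, test="MW", dim="time")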

pyESD.ESD_utils._get_month(npdatetime64)[source]

Returns the month for a given npdatetime64 object, 1 for January, 2 for February, …

pyESD.ESD_utils.extract_indices_around(dataset, lat, lon, radius)[source]
pyESD.ESD_utils.extract_region(data, datarange, varname, minlat, maxlat, minlon, maxlon)[source]
pyESD.ESD_utils.haversine(lon1, lat1, lon2, lat2)[source]
pyESD.ESD_utils.levene_test()[source]
pyESD.ESD_utils.load_all_stations(varname, path, stationnames)[source]

This assumes that the stored quantity is a dictionary

Returns a dictionary

pyESD.ESD_utils.load_csv(stationname, varname, path)[source]
pyESD.ESD_utils.load_pickle(stationname, varname, path)[source]
pyESD.ESD_utils.map_to_xarray(X, datarray)[source]
pyESD.ESD_utils.plot_background(p, domain=None, ax=None, left_labels=True, bottom_labels=True, plot_coastlines=True, plot_borders=False)[source]

This function defines the plotting domain and specifies the background. It requires the plot handle from xarray.plot.imshow and other optional arguments.

Parameters:
  • p (plot handle) – The plot handle returned after plotting with xarray.plot.imshow.

  • domain (str, optional) – “South America”, “Alaska”, “Tibet Plateau”, “Himalaya”, “Eurosia” or “New Zealand”; default: global.

pyESD.ESD_utils.plot_ks_stats(data, cmap, ax=None, vmax=None, vmin=None, levels=None, domain=None, center=True, output_name=None, output_format=None, level_ticks=None, title=None, path_to_store=None, left_labels=True, bottom_labels=True, add_colorbar=True, hatches=None, fig=None, cbar_pos=None, use_colorbar_default=False, orientation='horizontal', plot_projection=None, plot_stats=True, stats_results=None)[source]
Return type:

None.

pyESD.ESD_utils.ranksums_test()[source]
pyESD.ESD_utils.store_csv(stationname, varname, var, cachedir)[source]
pyESD.ESD_utils.store_pickle(stationname, varname, var, cachedir)[source]

pyESD.Predictor_Base module

Created on Fri Nov 12 14:02:45 2021

@author: dboateng

class pyESD.Predictor_Base.Predictor(name, longname=None, cachedir=None)[source]

Bases: ABC

fit(daterange, dataset)[source]
get(daterange, dataset, fit, regenerate=False, patterns_from=None, params_from=None)[source]
load()[source]
plot(daterange, dataset, fit, regenerate=False, patterns_from=None, params_from=None, **plot_kwargs)[source]
save()[source]

pyESD.Predictor_Generator module

Created on Fri Nov 12 14:03:09 2021

@author: dboateng

class pyESD.Predictor_Generator.RegionalAverage(name, lat, lon, standardizer_constructor=None, radius=250, **kwargs)[source]

Bases: Predictor

pyESD.Weatherstation module

Created on Fri Nov 12 14:01:43 2021

This routine handles the preprocessing of data downloaded directly from DWD. The default time series is monthly; other frequencies must be passed to the function. It covers:

  1. Extracting only stations with the required number of years

  2. Writing additional information into the files (e.g. station name, lat, lon and elevation), since this is downloaded into a separate file using station codes

  3. All utility functions to read stations into the pyESD StationOperator class

Note: This routine is specifically designed for data downloaded from DWD (otherwise please contact daniel.boateng@uni-tuebingen.de for assistance with other datasets)

@author: dboateng

pyESD.Weatherstation.read_station_csv(filename, varname, return_all=False)[source]
Parameters:
  • filename (str) – Name of the station file in the data path.

  • varname (str) – The name of the variable to downscale (e.g. Precipitation, Temperature).

Raises:

ValueError

Returns:

ws – The weather station object read from the file.

pyESD.Weatherstation.read_weatherstationnames(path_to_data)[source]

This function reads all the station names in the data directory

Parameters:

path_to_data (str) – The directory path where all the station data are stored.

Returns:

namedict – Dictionary of the station names found in the directory.

Return type:

dict

pyESD.Weatherstation.read_weatherstations(path_to_data)[source]

Read all the station data in a directory.

Parameters:

path_to_data (str) – Relative or absolute path to the station folder.

Returns:

stations – Dictionary containing all the station datasets.

Return type:

dict
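A short sketch of the reading utilities; the data directory is a placeholder path and must point to preprocessed DWD station files:

from pyESD.Weatherstation import read_weatherstationnames, read_weatherstations

path_to_data = "/path/to/processed_dwd_stations"   # placeholder directory
namedict = read_weatherstationnames(path_to_data)  # station names found in the directory
stations = read_weatherstations(path_to_data)      # dictionary of station datasets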

pyESD.dense_models module

Created on Wed Mar 16 12:26:01 2022

@author: dboateng

This module requires further development to add deep learning models.

class pyESD.dense_models.DeepLearningRegressor(method=None, optimizer='adam', loss='mean_squared_error', metrics=['RootMeanSquaredError'])[source]

Bases: object

build_model()[source]
compile_model()[source]
convert_to_sklearn_regressor(epochs=1000, verbose=False)[source]
fit(X, y)[source]
plot_network()[source]
predict(X)[source]

pyESD.ensemble_models module

Created on Mon Mar 14 11:02:35 2022

@author: dboateng

class pyESD.ensemble_models.EnsembleRegressor(estimators, final_estimator_name=None, cv=10, n_jobs=-1, passthrough=False, method='Stacking', scoring=None)[source]

Bases: object

cross_val_predict(X, y)[source]
cross_val_score(X, y)[source]
cross_validate(X, y)[source]
fit(X, y)[source]
get_params(deep=True)[source]
predict(X)[source]
predict_average(X)[source]
score(X, y)[source]
transform(X)[source]

pyESD.feature_selection module

Created on Mon Jan 3 17:18:14 2022

@author: dboateng

class pyESD.feature_selection.RecursiveFeatureElimination(regressor_name='ARD')[source]

Bases: object

cv_test_score()[source]
fit(X, y)[source]
print_selected_features(X)[source]
score(X, y)[source]
transform(X)[source]
class pyESD.feature_selection.SequentialFeatureSelection(regressor_name='Ridge', n_features=10, direction='forward')[source]

Bases: object

fit(X, y)[source]
print_selected_features(X)[source]
score(X, y)[source]
transform(X)[source]
class pyESD.feature_selection.TreeBasedSelection(regressor_name='RandomForest')[source]

Bases: object

feature_importance(X, y, plot=False, fig_path=None, fig_name=None, save_fig=False, station_name=None)[source]
fit(X, y)[source]
permutation_importance_(X, y, plot=False, fig_path=None, fig_name=None, save_fig=False)[source]
print_selected_features(X)[source]
transform(X)[source]
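A sketch of RecursiveFeatureElimination on synthetic predictors; the data are random, only two columns carry signal, and the default 'ARD' regressor is used (the input format is an assumption):

import numpy as np
import pandas as pd
from pyESD.feature_selection import RecursiveFeatureElimination

rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(200, 6)), columns=[f"pred_{i}" for i in range(6)])
y = pd.Series(3.0 * X["pred_1"] - 2.0 * X["pred_4"] + rng.normal(scale=0.5, size=200))

selector = RecursiveFeatureElimination()   # regressor_name='ARD' by default
selector.fit(X, y)
selector.print_selected_features(X)
X_selected = selector.transform(X)         # keep only the selected predictors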

pyESD.metrics module

Created on Wed Mar 16 11:34:25 2022

@author: dboateng

class pyESD.metrics.Evaluate(y_true, y_pred)[source]

Bases: object

MAE()[source]
MSE()[source]
NSE()[source]
R2_score()[source]
RMSE()[source]
adjusted_r2()[source]
explained_variance()[source]
max_error()[source]
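A quick sketch of the Evaluate container; that it accepts array-like y_true/y_pred is an assumption:

import numpy as np
from pyESD.metrics import Evaluate

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8, 5.3])

ev = Evaluate(y_true, y_pred)
print(ev.RMSE(), ev.MAE(), ev.NSE(), ev.R2_score())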

pyESD.models module

Created on Thu Jan 25 16:00:11 2022

@author: dboateng

class pyESD.models.HyperparameterOptimize(method, param_grid, regressor, scoring='r2', cv=10)[source]

Bases: MetaAttributes

best_estimator()[source]
cross_val_predict(X, y)[source]
cross_val_score(X, y)[source]
cross_validate(X, y)[source]
fit(X, y)[source]
predict_log_proba(X)[source]
score(X, y)[source]
transform(X)[source]
class pyESD.models.MetaAttributes[source]

Bases: object

alpha()[source]
best_estimator()[source]
best_params()[source]
coef()[source]
get_params()[source]
intercept()[source]
set_params(**params)[source]
class pyESD.models.Regressors(method, cv=None, hyper_method=None, scoring=None)[source]

Bases: MetaAttributes

cross_val_predict(X, y)[source]
cross_val_score(X, y)[source]
cross_validate(X, y)[source]
fit(X, y)[source]
predict(X)[source]
score(X, y)[source]
set_model()[source]
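A sketch of the Regressors wrapper; the "Ridge" method string and the synthetic data are assumptions for illustration:

import numpy as np
import pandas as pd
from pyESD.models import Regressors

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(120, 4)), columns=["p0", "p1", "p2", "p3"])
y = pd.Series(2.0 * X["p0"] - X["p2"] + rng.normal(scale=0.1, size=120))

reg = Regressors(method="Ridge", cv=10)
reg.set_model()          # instantiate the underlying estimator
reg.fit(X, y)
print(reg.score(X, y))   # in-sample score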

pyESD.plot module

Created on Wed Mar 16 11:34:25 2022

@author: dboateng

pyESD.plot.barplot(methods, stationnames, path_to_data, ax=None, xlabel=None, ylabel=None, varname='test_r2', varname_std='test_r2_std', filename='validation_score_', legend=True, fig_path=None, fig_name=None, show_error=False, width=0.5, rot=0, use_id=True)[source]
pyESD.plot.boxplot(regressors, stationnames, path_to_data, ax=None, xlabel=None, ylabel=None, varname='test_r2', filename='validation_score_', fig_path=None, fig_name=None, colors=None, patch_artist=False, rot=45)[source]
pyESD.plot.correlation_heatmap(data, cmap, ax=None, vmax=None, vmin=None, center=0, cbar_ax=None, add_cbar=True, title=None, label='Correlation Coefficinet', fig_path=None, fig_name=None, xlabel=None, ylabel=None, fig=None)[source]
pyESD.plot.heatmaps(data, cmap, label=None, title=None, vmax=None, vmin=None, center=None, ax=None, cbar=True, cbar_ax=None, xlabel=None)[source]
pyESD.plot.lineplot(station_num, stationnames, path_to_data, filename, ax=None, fig=None, obs_train_name='obs 1958-2010', obs_test_name='obs 2011-2020', val_predict_name='ERA5 1958-2010', test_predict_name='ERA5 2011-2020', obs_full_name='obs anomalies', method='Stacking', ylabel=None, xlabel=None, fig_path=None, fig_name=None)[source]
pyESD.plot.plot_monthly_mean(means, stds, color, ylabel=None, ax=None, fig_path=None, fig_name=None, lolims=False)[source]
pyESD.plot.plot_projection_comparison(stationnames, path_to_data, filename, id_name, method, stationloc_dir, daterange, datasets, variable, dataset_varname, ax=None, xlabel=None, ylabel=None, legend=True, figpath=None, figname=None, width=0.5, title=None, vmax=None, vmin=None, use_id=True)[source]
pyESD.plot.plot_time_series(stationnames, path_to_data, filename, id_name, daterange, color, label, ymax=None, ymin=None, ax=None, ylabel=None, xlabel=None, fig_path=None, fig_name=None, method='Stacking', window=12)[source]
pyESD.plot.scatterplot(station_num, stationnames, path_to_data, filename, ax=None, obs_train_name='obs 1958-2010', obs_test_name='obs 2011-2020', val_predict_name='ERA5 1958-2010', test_predict_name='ERA5 2011-2020', obs_full_name='obs anomalies', method='Stacking', ylabel=None, xlabel=None, fig_path=None, fig_name=None, train_marker='*', test_marker='o', train_color='black', test_color='blue')[source]

pyESD.plot_utils module

Created on Mon Apr 11 09:03:49 2022

@author: dboateng

pyESD.plot_utils.apply_style(fontsize=20, style=None, linewidth=2, usetex=True)[source]
Parameters:
  • fontsize (int, optional) – The default is 20.

  • style (str, optional) – Matplotlib style name, e.g. “bmh”, “seaborn” or “fivethirtyeight”. The default is None.

Return type:

None.

pyESD.plot_utils.barplot_data(methods, stationnames, path_to_data, varname='test_r2', varname_std='test_r2_std', filename='validation_score_', use_id=False)[source]
pyESD.plot_utils.boxplot_data(regressors, stationnames, path_to_data, filename='validation_score_', varname='test_r2')[source]
pyESD.plot_utils.correlation_data(stationnames, path_to_data, filename, predictors, use_id=False, use_scipy=False)[source]
pyESD.plot_utils.count_predictors(methods, stationnames, path_to_data, filename, predictors)[source]
pyESD.plot_utils.extract_comparison_data_means(stationnames, path_to_data, filename, id_name, method, stationloc_dir, daterange, datasets, variable, dataset_varname, use_id=True)[source]
pyESD.plot_utils.extract_time_series(stationnames, path_to_data, filename, id_name, method, daterange)[source]
pyESD.plot_utils.monthly_mean(stationnames, path_to_data, filename, daterange, id_name, method, use_id=False)[source]
pyESD.plot_utils.prediction_example_data(station_num, stationnames, path_to_data, filename, obs_train_name='obs 1958-2010', obs_test_name='obs 2011-2020', val_predict_name='ERA5 1958-2010', test_predict_name='ERA5 2011-2020', method='Stacking', use_cv_all=False, obs_full_name='obs anomalies')[source]
pyESD.plot_utils.resample_monthly(data, daterange)[source]
pyESD.plot_utils.resample_seasonally(data, daterange)[source]
pyESD.plot_utils.seasonal_mean(stationnames, path_to_data, filename, daterange, id_name, method, use_id=False)[source]

pyESD.predictand module

Created on Sun Nov 21 00:55:22 2021

@author: dboateng

class pyESD.predictand.PredictandTimeseries(data, transform=None, standardizer=None)[source]

Bases: object

climate_score(fit_period, score_period, predictor_dataset, **predictor_kwargs)[source]

How much better the prediction for the given period is than the annual mean.

Parameters:
  • fit_period (pd.DatetimeIndex) – Range of data that will be used for creating the reference prediction.

  • score_period (pd.DatetimeIndex) – Range of data for which the prediction score is evaluated.

  • predictor_dataset (stat_downscaling_tools.Dataset) – The dataset that should be used to calculate the predictors

  • predictor_kwargs (keyword arguments) – These arguments are passed to the predictor’s get function

Returns:

cscore – Climate score (similar to rho squared). 1 for perfect fit, 0 for no skill, negative for even worse skill than mean prediction.

Return type:

double

cross_validate_and_predict(daterange, predictor_dataset, fit_predictand=True, return_cv_scores=False, **predictor_kwargs)[source]
ensemble_transform(daterange, predictor_dataset, **predictor_kwargs)[source]
evaluate(daterange, predictor_dataset, fit_predictand=True, **predictor_kwargs)[source]
fit(daterange, predictor_dataset, fit_predictors=True, predictor_selector=True, selector_method='Recursive', selector_regressor='Ridge', num_predictors=None, selector_direction=None, cal_relative_importance=False, fit_predictand=True, impute=False, impute_method=None, impute_order=None, **predictor_kwargs)[source]
fit_predictor(name, daterange, predictor_dataset)[source]
get(daterange=None, anomalies=True)[source]
predict(daterange, predictor_dataset, fit_predictors=True, fit_predictand=True, **predictor_kwargs)[source]
predictor_correlation(daterange, predictor_dataset, fit_predictors=True, fit_predictand=True, method='pearson', use_scipy=False, **predictor_kwargs)[source]
relative_predictor_importance()[source]
selected_names()[source]
set_model(method, ensemble_learning=False, estimators=None, cv=10, final_estimator_name=None, daterange=None, predictor_dataset=None, fit_predictors=True, scoring=['r2', 'neg_root_mean_squared_error'], MLR_learning=False, **predictor_kwargs)[source]
set_predictors(predictors)[source]
set_standardizer(standardizer)[source]
set_transform(transform)[source]
tree_based_feature_importance(daterange, predictor_dataset, fit_predictand=True, plot=False, **predictor_kwargs)[source]
tree_based_feature_permutation_importance(daterange, predictor_dataset, fit_predictand=True, plot=False, **predictor_kwargs)[source]

pyESD.splitter module

Created on Tue Jan 25 16:52:13 2022

@author: dboateng

class pyESD.splitter.MonthlyBooststrapper(n_splits=500, test_size=0.1, block_size=12)[source]

Bases: object

split(X, y, groups=None)[source]

The number of test blocks follows from num_blocks * block_size = test_size * num_samples, i.e. num_blocks = test_size / block_size * num_samples. For example, with 600 monthly samples, test_size=0.1 and block_size=12, each split holds out 5 blocks of 12 consecutive months.

Parameters:
  • X (array-like) – Data to split.

  • y (array-like) – Data to split (same length as X).

  • groups (optional) – Not used. The default is None.

Return type:

None.

class pyESD.splitter.Splitter(method, shuffle=False, n_splits=5)[source]

Bases: object

get_n_splits(X=None, y=None, groups=None)[source]
split(X, y=None, groups=None)[source]
class pyESD.splitter.YearlyBootstrapper(n_splits=500, test_size=0.3333333333333333, min_month_per_year=9)[source]

Bases: object

Splits data into training and test sets by picking complete years. You can use it like this:

X = ...
y = ...
yb = YearlyBootstrapper(10)

for i, (train, test) in enumerate(yb.split(X, y)):
    X_train, y_train = X.iloc[train], y.iloc[train]
    X_test, y_test = X.iloc[test], y.iloc[test]
    ...
Parameters:
  • n_splits (int (optional, default: 500)) – number of splits

  • test_size (float (optional, default: 1/3)) – Ratio of test years.

  • min_month_per_year (int (optional, default: 9)) – minimum number of months that must be available in a year to use this year in the test set.

split(X, y, groups=None)[source]

Returns n_splits pairs of indices to training and test set.

Parameters:
  • X (pd.DataFrame) –

  • y (pd.Series) –

  • groups (dummy) –

X and y should both have the same DatetimeIndex as index.

Returns:

  • train (array of ints) – Array of indices of training data

  • test (array of ints) – Array of indices of test data

pyESD.standardizer module

Created on Sun Nov 21 00:55:02 2021

@author: dboateng

class pyESD.standardizer.MonthlyStandardizer(detrending=False, scaling=False)[source]

Bases: BaseEstimator, TransformerMixin

Standardizes monthly data that has a seasonal cycle and possibly a linear trend.

Since the seasonal cycle might affect the trend estimation, the seasonal cycle is removed first (by subtracting the mean annual cycle) and the trend is estimated by linear regression. Afterwards the data is scaled to variance 1.

Parameters:

detrending (bool, optional (default: False)) – Whether to remove a linear trend

fit(X, y=None)[source]

Fits the standardizer to the provided data, i.e. calculates annual mean cycle and trends.

Parameters:
  • X (pd.DataFrame or pd.Series) – DataFrame or Series which holds the data

  • y (dummy (optional, default: None)) – Not used

Return type:

self

inverse_transform(X, y=None)[source]

De-standardizes the values based on the previously calculated parameters.

Parameters:
  • X (pd.DataFrame or pd.Series) – DataFrame or Series which holds the standardized data

  • y (dummy (optional, default: None)) – Not used

Returns:

X_unstandardized – Unstandardized data

Return type:

pd.DataFrame or pd.Series

transform(X, y=None)[source]

Standardizes the values based on the previously calculated parameters.

Parameters:
  • X (pd.DataFrame or pd.Series) – DataFrame or Series which holds the data

  • y (dummy (optional, default: None)) – Not used

Returns:

X_transformed – Transformed data

Return type:

pd.DataFrame or pd.Series
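A sketch of MonthlyStandardizer on a synthetic monthly series with a seasonal cycle and a weak linear trend; the pd.Series input format and the parameter choices are assumptions:

import numpy as np
import pandas as pd
from pyESD.standardizer import MonthlyStandardizer

t = pd.date_range("1980-01-01", periods=240, freq="MS")
seasonal = 10.0 * np.sin(2.0 * np.pi * t.month / 12.0)
trend = 0.01 * np.arange(240)
series = pd.Series(seasonal + trend + np.random.normal(scale=0.5, size=240), index=t)

std = MonthlyStandardizer(detrending=True, scaling=True)
std.fit(series)
anomalies = std.transform(series)            # seasonal cycle and trend removed
recovered = std.inverse_transform(anomalies) # back to the original scale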

class pyESD.standardizer.NoStandardizer[source]

Bases: BaseEstimator, TransformerMixin

This is just a dummy standardizer that does nothing.

fit(X, y=None)[source]
inverse_transform(X)[source]
transform(X, y=None)[source]
class pyESD.standardizer.PCAScaling(n_components=None, kernel='linear', method=None)[source]

Bases: TransformerMixin, BaseEstimator

fit(X)[source]
fit_transform(X)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).

  • **fit_params (dict) – Additional fit parameters.

Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)

inverse_transform(X)[source]
transform(X)[source]
class pyESD.standardizer.StandardScaling(method=None, with_std=True, with_mean=True, unit_variance=False, norm='l2')[source]

Bases: BaseEstimator, TransformerMixin

fit(X)[source]
inverse_transform(X)[source]
transform(X)[source]
pyESD.standardizer.add_seasonal_cycle(t, anomalies, mean)[source]

Adds a seasonal cycle such that

X = anomalies + mean_seasonal_cycle

Parameters:
  • t (numpy array of ints) – time in number of months

  • anomalies (numpy array) – Array of standardized values

  • mean (array of shape 12 x #columns(anomalies)) – Mean values for each month and each column in anomalies

Returns:

X – unstandardized values

Return type:

numpy array
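A plain-numpy illustration of the relation X = anomalies + mean_seasonal_cycle; the array shapes (12 months x 1 column) mirror the description above and are assumptions:

import numpy as np

t = np.arange(36)                            # time in number of months
mean = np.linspace(-5.0, 5.0, 12)[:, None]   # one mean value per calendar month, one column
anomalies = np.random.normal(size=(36, 1))   # standardized values

X = anomalies + mean[t % 12]                 # what add_seasonal_cycle reconstructs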

pyESD.standardizer.get_annual_mean_cycle(t, X)[source]
pyESD.standardizer.get_mean_prediction(t, mean)[source]
pyESD.standardizer.remove_seasonal_cycle(t, X, mean)[source]

Inverse operation to add_seasonal_cycle

pyESD.teleconnections module

Created on Mon Mar 14 16:58:59 2022

@author: dboateng

class pyESD.teleconnections.EA(**kwargs)[source]

Bases: Predictor

plot_cov_matrix()[source]
class pyESD.teleconnections.EAWR(**kwargs)[source]

Bases: Predictor

plot_cov_matrix()[source]
class pyESD.teleconnections.MEI(**kwargs)[source]

Bases: Predictor

class pyESD.teleconnections.NAO(**kwargs)[source]

Bases: Predictor

plot_cov_matrix()[source]
class pyESD.teleconnections.SCAN(**kwargs)[source]

Bases: Predictor

plot_cov_matrix()[source]
pyESD.teleconnections._get_month(npdatetime64)[source]

Returns the month for a given npdatetime64 object, 1 for January, 2 for February, …

pyESD.teleconnections.eof_analysis(data, neofs, method='eof_package', apply_equal_wtgs=True, pcscaling=1)[source]
pyESD.teleconnections.extract_region(data, datarange, varname, minlat, maxlat, minlon, maxlon)[source]

pyESD.data_preprocessing_utils module

pyESD.MLR module

Created on Mon Nov 7 17:28:48 2022

@author: dboateng

This module contains the regression routines. There are three layers for bootstrapped forward selection regression (a composition sketch is given after this list):

  • The BootstrappedRegression class is the outer layer. It implements the bootstrapping loop. This class has a “regressor” member that implements the single regression step (i.e. a fit and a predict method). This can be a ForwardSelection object, but can also be Lasso from sklearn or similar routines.

  • The ForwardSelection class is the next layer. It implements a forward-selection loop. This again has a regressor object that has to implement get_coefs, set_coefs, and average_coefs. Additionally the regressor object has to implement fit_active, fit, and predict. An example of such a regressor object is MultipleLSRegression, which forms the innermost (third) layer.
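A composition sketch of the three layers under stated assumptions: the import path pyESD.MLR_model and the YearlyBootstrapper from pyESD.splitter are taken from the entries in this document, while the data are placeholder pandas objects:

import numpy as np
import pandas as pd
from pyESD.MLR_model import BootstrappedForwardSelection, MultipleLSRegression
from pyESD.splitter import YearlyBootstrapper

# Placeholder monthly data: 20 years of 3 predictors and one predictand
idx = pd.date_range("1990-01-01", periods=240, freq="MS")
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(240, 3)), index=idx, columns=["p0", "p1", "p2"])
y = pd.Series(1.5 * X["p0"] + rng.normal(scale=0.2, size=240), index=idx)

model = BootstrappedForwardSelection(MultipleLSRegression(),
                                     min_explained_variance=0.02,
                                     cv=YearlyBootstrapper(n_splits=50))
model.fit(X, y)
y_hat = model.predict(X)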

class pyESD.MLR_model.BootstrappedForwardSelection(regressor, min_explained_variance=0.02, cv=None)[source]

Bases: BootstrappedRegression

This is an easy-to-use interface for BootstrappedRegression with ForwardSelection.

Parameters:
  • regressor (regression object) – This should be an object similar to sklearn-like regressors that provides the methods fit and predict. Furthermore, it must also provide the methods get_coefs, set_coefs, average_coefs, and fit_active. An example of this is MultipleLSRegression below.

  • min_explained_variance (float, optional (default: 0.02)) – If inclusion of the staged predictor doesn’t improve the explained variance on the test set by at least this amount, stop the selection process.

  • cv (integer or cross-validation generator (optional, default: None)) –

    This determines how the data are split:

    • If cv=None, 3-fold cross-validation will be used.

    • If cv=n where n is an integer, n-fold cross-validation will be used.

    • If cv=some_object, where some_object implements a some_object.split(X, y) method that returns indices for training and test set, this will be used. It is recommended to use YearlyBootstrapper() from stat_downscaling_tools.bootstrap.

class pyESD.MLR_model.BootstrappedRegression(regressor, cv=None)[source]

Bases: MetaEstimator

Performs a regression in a bootstrapping loop.

This splits the data multiple times into training and test data and performs a regression for each split. In each loop the calculated parameters are stored. The final model uses the average of the coefficients over all fits. If the model is a LinearModel from sklearn (i.e. it has the attributes coef_ and intercept_), the averaging routine does not have to be implemented. However, it can be implemented if something other than an arithmetic mean should be used (e.g. if only the average of robust predictors should be taken and everything else should be set to zero).

Since this inherits from sklearn modules, it can to some extent be used interchangeably with other sklearn regressors.

Parameters:
  • regressor (regression object) – This should be an object similar to sklearn-like regressors that provides the methods fit(self, X_train, y_train, X_test, y_test) and predict(self, X). This must also provide the methods get_coefs(self), set_coefs(self, coefs), and average_coefs(self, list_of_coefs). An example of this is ForwardSelection below. The regressor can also have a member variable additional_results, which should be a dictionary of parameters that are calculated during fitting but not needed for predicting, for example metrics like the explained variance of predictors. In this case the regressor also needs the method average_additional_results(self, list_of_dicts) and set_additional_results(self, mean_additional_results).

  • cv (integer or cross-validation generator (optional, default: None)) –

    This determines how the data are split:

    • If cv=None, 3-fold cross-validation will be used.

    • If cv=n where n is an integer, n-fold cross-validation will be used.

    • If cv=some_object, where some_object implements a some_object.split(X, y) method that returns indices for training and test set, this will be used. It is recommended to use YearlyBootstrapper() from stat_downscaling_tools.bootstrap.

Variables:
  • mean_coefs (type and shape depend on the regressor, (only after fitting)) – Fitted coefficients (mean of all models where the coefficients were nonzero).

  • cv_error (float (only after fitting)) – Mean of errors on test sets during the bootstrapping loop.

If the regressor object has the attributes intercept_ and coef_, these will also be set here.

fit(X, y)[source]

Fits a model in a bootstrapping loop.

Parameters:
  • X (pd.DataFrame) – DataFrame of predictors

  • y (pd.Series) – Series of predictand

fit_predict(X, y)[source]
predict(X)[source]

Predicts values from previously fitted coefficients.

If the input X is a pandas DataFrame or Series, a Series is returned, otherwise only a numpy array.

Parameters:

X (pd.DataFrame) –

Returns:

y

Return type:

pd.Series

class pyESD.MLR_model.ForwardSelection(regressor, min_explained_variance=0.02)[source]

Bases: MetaEstimator

Performs a forward selection regression.

This stepwise selects the next most promising candidate predictor and adds it to the model if it is good enough. The method is outlined in “Statistical Analysis in Climate Research” (von Storch, 1999).

Since this object is intended to be used in the BootstrappedRegression class, it implements all necessary methods.

Parameters:
  • regressor (regression object) – This should be an object similar to sklearn-like regressors that provides the methods fit and predict. Furthermore, it must also provide the methods get_coefs, set_coefs, average_coefs, and fit_active. An example of this is MultipleLSRegression below.

  • min_explained_variance (float, optional (default: 0.02)) – If inclusion of the staged predictor doesn’t improve the explained variance on the test set by at least this amount, stop the selection process.

Variables:

explaned_variances (numpy array) –

average_additional_results(list_of_params)[source]
average_coefs(list_of_coefs)[source]
fit(X_train, y_train, X_test, y_test)[source]

Cross-validated forward selection. This fits a regression model according to the following algorithm (a schematic sketch follows this entry):

  1. Start with yhat = mean(y), res = y - yhat, active = []

  2. for each predictor in inactive set:
    • add to active set

    • perform regression

    • get error and uncertainty of error (standard deviation)

    • remove from active set

  3. add predictor with lowest error on test set to active set

  4. if improvement was not good enough, abort and use previous model.

Parameters:
  • X_train (numpy array of shape #samples x #predictors) – Array that holds the values of the predictors (columns) at different times (rows) for the training dataset.

  • y_train (numpy array of length #samples) – Training predictand data

  • X_test (numpy array of shape #samples x #predictors) – Test predictor data

  • y_test (numpy array of length #samples) – Test predictand data

Returns:

exp_var – explained variance of each predictor

Return type:

numpy array of length #predictors
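A schematic, self-contained sketch of the cross-validated forward-selection loop described in steps 1-4 above, written against plain scikit-learn and numpy rather than the pyESD internals; the helper name and the stopping threshold are illustrative only:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import explained_variance_score

def forward_selection_sketch(X_train, y_train, X_test, y_test, min_improvement=0.02):
    """Greedy forward selection on numpy arrays (columns = predictors)."""
    n_predictors = X_train.shape[1]
    active, best_score = [], 0.0                # step 1: start from the mean model (score 0)
    while len(active) < n_predictors:
        scores = {}
        for j in range(n_predictors):           # step 2: stage each inactive predictor
            if j in active:
                continue
            cols = active + [j]
            model = LinearRegression().fit(X_train[:, cols], y_train)
            scores[j] = explained_variance_score(y_test, model.predict(X_test[:, cols]))
        best_j = max(scores, key=scores.get)    # step 3: best candidate on the test set
        if scores[best_j] - best_score < min_improvement:
            break                               # step 4: improvement too small, keep previous model
        active.append(best_j)
        best_score = scores[best_j]
    return active, best_score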

fit_active(X, y, active)[source]

Fits using only the columns of X whose index is in active.

predict(X)[source]
predict_active(X, active)[source]
set_additional_results(add_results)[source]
class pyESD.MLR_model.MultipleLSRegression[source]

Bases: LinearCoefsHandlerMixin

Implementation of multiple linear OLS regression to be used with ForwardSelection and BootstrappedRegression. The following methods are implemented:

  • fit

  • predict

  • get_coefs

  • set_coefs

  • average_coefs

  • fit_active

fit(X, y)[source]
predict(X)[source]
set_expand_coefs(active, n_predictors)[source]

This will be called after fit, since fit will often be called with only some of the predictors. It expands the current coefficients so that predict can be called with all predictors.

pyESD.MLR_model._get_active(X, active)[source]

Returns a new matrix X_active with only the columns of X whose index is in active.