45 add interpolation methods for input output #51
base: main
Conversation
that specifies the shift for each input individually
…olation-methods-for-input-output
…olation-methods-for-input-output
output wasn't considered, error occurred when using pinn
…olation-methods-for-input-output
…://github.com/RWTH-EBC/physXAI into 45-add-interpolation-methods-for-input-output
updated docstrings
shift = {
    'reaTZon_y': 'previous',  # for all lags of reaTZon_y, the shift will be set automatically
    'weaSta_reaWeaHDirNor_y': 'mean_over_interval',
    '_default': 0,
}
We should include this as part of Feature via an attribute sampling_method. Its default should be None, and there should be a class attribute default_sampling_method that is used whenever sampling_method is None (see the sketch below).
Shift should then be removed from preprocessing (not backwards compatible, but necessary to maintain usability).
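A minimal sketch of that idea, assuming a hypothetical Feature class; the names sampling_method and default_sampling_method follow the comment above, and the actual physXAI API may differ:

```python
# Hypothetical sketch only: a per-feature sampling_method with a class-wide,
# changeable default, as proposed above. Not the actual physXAI code.
from typing import Optional, Union


class Feature:
    # class attribute: fallback used whenever an instance leaves sampling_method as None;
    # changeable globally, e.g. Feature.default_sampling_method = 'mean_over_interval'
    default_sampling_method: Union[int, str] = 'previous'

    def __init__(self, name: str, sampling_method: Optional[Union[int, str]] = None):
        self.name = name
        self._sampling_method = sampling_method  # None -> fall back to the class default

    @property
    def sampling_method(self) -> Union[int, str]:
        return (self._sampling_method
                if self._sampling_method is not None
                else type(self).default_sampling_method)
```

With something like this, preprocessing would no longer need a separate shift argument.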
if (len(self.inputs) != len(self.shift.keys())) or not all(inp in self.shift.keys() for inp in self.inputs):
    self.shift = convert_shift_to_dict(self.shift, self.inputs, custom_default=self.shift_default)

assert len(self.inputs) == len(self.shift.keys()), (
    f"Something went wrong, number of inputs ({len(self.inputs)})"
    f" doesn't match number of inputs defined in shift ({len(self.shift.keys())})")
Is this not already done in the init function?
L291-292 are necessary in the case of recursive feature selection, since the preprocessing object is only created once but the inputs are changed.
FeatureConstruction.process(df)
# Only apply for those features that are not lags since lags must be constructed after sampling the data
# according to the given time step
FeatureConstruction.process(df, feature_names=inputs_without_lags + [out for out in self.output if out not in inputs_without_lags])
It might be easier to implement process_without_lags and process_only_lags functions, to avoid too much code in the high-level functions; see the sketch below.
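A rough sketch of that split, assuming FeatureConstruction keeps a registry of feature names and some way to tell lag features apart (both are assumptions for illustration, not the real API):

```python
# Sketch only: the registry `features`, the lag test, and process(feature_names=...)
# are assumptions about the FeatureConstruction API.
import pandas as pd


class FeatureConstruction:
    features: list[str] = []  # assumed registry of constructed feature names

    @classmethod
    def process(cls, df: pd.DataFrame, feature_names: list[str] | None = None) -> None:
        ...  # existing implementation

    @classmethod
    def _is_lag(cls, name: str) -> bool:
        # assumption: lag features are identifiable, e.g. by a naming convention
        return '_lag' in name

    @classmethod
    def process_without_lags(cls, df: pd.DataFrame) -> None:
        # construct only features that do not depend on lags (safe before resampling)
        cls.process(df, feature_names=[f for f in cls.features if not cls._is_lag(f)])

    @classmethod
    def process_only_lags(cls, df: pd.DataFrame) -> None:
        # construct lag-based features after the data has been sampled to the time step
        cls.process(df, feature_names=[f for f in cls.features if cls._is_lag(f)])
```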
| """ | ||
|
|
||
| def __init__(self, inputs: list[str], output: Union[str, list[str]], shift: int = 1, | ||
| def __init__(self, inputs: list[str], output: Union[str, list[str]], shift: Union[int, str, dict] = 'previous', |
This can probably be removed if sampling_method becomes part of Feature.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0'


def convert_shift_to_dict(s: Union[int, str, dict], inputs: list[str], custom_default: Union[int, str] = None) -> dict:
This function should (hopefully) get much simpler if shift is an attribute of Feature (for context, see the normalisation sketch below).
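A minimal sketch of what such a normalisation has to do, based only on the signature shown in the diff; the fallback default 'previous' and the fill-in behaviour are assumptions, not the function's actual semantics:

```python
from typing import Optional, Union


def convert_shift_to_dict(s: Union[int, str, dict], inputs: list[str],
                          custom_default: Optional[Union[int, str]] = None) -> dict:
    # Sketch only; assumes inputs missing from a dict fall back to custom_default or 'previous'.
    default = custom_default if custom_default is not None else 'previous'
    if isinstance(s, dict):
        # keep explicit entries, fill every remaining input with the default
        return {inp: s.get(inp, default) for inp in inputs}
    # a single int/str applies to every input
    return {inp: s for inp in inputs}
```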
if all('current' == self.shift[k] for k in inputs_without_lags):
    # filter / sample data
    X = self.filter_df_according_to_timestep(X)
    # nothing more to do here
elif all('previous' == self.shift[k] for k in inputs_without_lags):
    # filter / sample data
    X = self.filter_df_according_to_timestep(X)

    # shift data by 1 and shorten DataFrames accordingly
    X = X.shift(1)
    y = y.iloc[1:]
    X = X.iloc[1:]
elif all('mean_over_interval' == self.shift[k] for k in inputs_without_lags):
    X = get_mean_over_interval(y, X)
    # synchronize length between X and y
    y = y.iloc[1:]

else:  # different inputs have different shifts
Is this not a duplicate of the more general "else: # different inputs have different shifts" statement?
Yes, in fact, the else would do the same thing. The idea was that it might be faster in case all inputs have the same shift
I see. I think the preprocessing is not really a performance bottleneck, so I would sacrifice a little performance for a more general implementation, e.g. along the lines of the sketch below.
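Something like this could replace the all(...) special cases with a single per-input loop. Sketch only, not the PR's code; in particular the 'mean_over_interval' branch is simplified to a two-sample rolling mean, which is not what get_mean_over_interval does:

```python
import pandas as pd


def apply_shift_per_input(X: pd.DataFrame, y: pd.DataFrame,
                          shift: dict[str, str]) -> tuple[pd.DataFrame, pd.DataFrame]:
    # General per-column handling: each input column is shifted according to its own method.
    X = X.copy()
    for col, method in shift.items():
        if method == 'current':
            pass  # use the value at the current time step, nothing to do
        elif method == 'previous':
            X[col] = X[col].shift(1)  # use the value from the previous time step
        elif method == 'mean_over_interval':
            # simplified stand-in for the interval mean
            X[col] = X[col].rolling(2).mean()
        else:
            raise ValueError(f"Unknown shift method: {method!r}")
    # drop the first row, which is NaN for shifted / averaged columns
    return X.iloc[1:], y.iloc[1:]
```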
if isinstance(shift, dict) and '_default' in shift.keys():
    self.shift_default = shift['_default']
    shift.__delitem__('_default')
This should get easier if Feature has a default sampling_method (as a class attribute, so it is changeable).
df = df.loc[first_valid_index:last_valid_index]
if df.isnull().values.any():


def get_mean_over_interval(y: pd.DataFrame, x: pd.DataFrame):
Is this similar to how the sampling is done in agentlib/agentlib-mpc?
The get_mean_over_interval function is mostly copied from agentlib-mpc/utils/sampling.py. However, small adaptations were necessary due to differing data structures. A rough illustration of the interval mean follows below.
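For readers without agentlib-mpc at hand, this is an illustration of what an interval mean computes; it is not the function copied into this PR, which differs in data structures and edge-case handling:

```python
import pandas as pd


def mean_over_interval_sketch(y: pd.DataFrame, x: pd.DataFrame) -> pd.DataFrame:
    # Average x over each interval between consecutive timestamps of y.
    rows = []
    idx = y.index
    for start, end in zip(idx[:-1], idx[1:]):
        window = x.loc[(x.index >= start) & (x.index < end)]  # samples in [start, end)
        rows.append(window.mean())
    # one aggregated row per interval, aligned with y.iloc[1:]
    return pd.DataFrame(rows, index=idx[1:])
```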
y = y.iloc[:-self.shift]
X = X.iloc[:-self.shift]

# Applies feature constructions defined in `FeatureConstruction` to the lagged inputs
FeatureConstruction.process(res_df, feature_names=lagged_inputs)
How is it handled if there is a constructed feature that is based on a lagged input?
Resolved in commit 367624f
before: only str allowed
deleted deprecated code and test for shift conversion
sampling_method of constructed features determined based on corresponding base_feature(s)
resetting FeatureConstruction.features also affected p_hp_data
…olation-methods-for-input-output
…://github.com/RWTH-EBC/physXAI into 45-add-interpolation-methods-for-input-output
…olation-methods-for-input-output
Pull Request
Description
Restructure and individualise shift and add interpolation methods
Type of Change
Required Checklist
Testing
Examples
Compatibility
pyproject.toml (required in dependencies or optional in [project.optional-dependencies])
Documentation
Optional
GitHub Copilot Review