-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[APPEND][FSTORE-1424] Feature logging for spark #1367
[APPEND][FSTORE-1424] Feature logging for spark #1367
Conversation
@@ -460,4 +460,4 @@ def delete_feature_logs( | |||
path_params += [self._TRANSFORMED_lOG] | |||
else: | |||
path_params += [self._UNTRANSFORMED_LOG] | |||
_client._send_request("DELETE", path_params, {}) | |||
return _client._send_request("DELETE", path_params, {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this returning? Normally delete methods return void since the corresponding item gets deleted.
def update(self, others): | ||
self._transformed_features = others._transformed_features | ||
self._untransformed_features = others._untransformed_features | ||
return self | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this method instead of the setter methods for the two fields? @transformed_features.setter
and @untransformed_features.setter
?
def _get_logging_fg(self, fv, transformed): | ||
feature_logging = self.get_feature_logging(fv) | ||
return self._get_logging_fg_feature_logging(feature_logging, transformed) | ||
|
||
def _get_logging_fg_feature_logging(self, feature_logging, transformed): | ||
if transformed: | ||
return feature_logging.transformed_features | ||
else: | ||
return feature_logging.untransformed_features |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these list are not easy to read.
_get_logging_fg_feature_logging
is returning features
that can be transformed or untransformed, but the method name indicates feature_logging
.
_get_logging_fg
retrieves the feature_logging
for a fv
and then call the previous method to return the (transformed or untransformed) features.
I wouldn't guess their behavior based on the method names :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, based on their used in log_features
, it seems that untransformed_features
and transformed_features
are actually fg
objects
td_col_name=FeatureViewEngine._LOG_TD_VERSION, | ||
time_col_name=FeatureViewEngine._LOG_TIME, | ||
model_col_name=FeatureViewEngine._HSML_MODEL, | ||
prediction=prediction, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is prediction
one or multiple predictions? If many, can we use predictions
instead?
default_write_options = { | ||
"start_offline_materialization": False, | ||
} | ||
if write_options: | ||
default_write_options.update(write_options) | ||
fg = self._get_logging_fg(fv, transformed) | ||
fg = self._get_logging_fg_feature_logging(feature_logging, transformed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we better have a method called get_feature_group
in the feature_logging
class, that accepts a transformed
boolean parameter?
def get_feature_group(self, transformed):
return self._transformed_features if transformed else self._untransformed_features
Then:
- we don't need
_get_logging_fg_feature_logging
and_get_logging_fg
methods. We can just callget_feature_logging
from the engine and then the internal method if needed. - we don't need to pass
feature_logging
as a separate parameter. We can useget_feature_logging(fv)
.get_feature_group(transformed)
td_col_name, | ||
time_col_name, | ||
model_col_name, | ||
def get_feature_logging_df(features, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add types to all parameters?
def get_feature_logging_df(self, features, | ||
fg=None, | ||
fg_features: List[TrainingDatasetFeature]=None, | ||
td_predictions: List[TrainingDatasetFeature]=None, | ||
td_col_name=None, | ||
time_col_name=None, | ||
model_col_name=None, | ||
prediction=None, | ||
training_dataset_version=None, | ||
hsml_model=None, | ||
**kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add types to these parameters?
df = df.withColumn(td_col_name, | ||
lit(training_dataset_version).cast(LongType())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation seems off here.
df = df.withColumn(time_col_name, | ||
lit(now).cast(TimestampType())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation here as well
@@ -3640,7 +3643,11 @@ def delete_log(self, transformed: Optional[bool]=None): | |||
# Raises | |||
`hsfs.client.exceptions.RestAPIError` in case the backend fails to delete the log. | |||
""" | |||
return self._feature_view_engine.delete_feature_logs(self, transformed) | |||
if self._feature_logging is None: | |||
self._feature_logging = self._feature_view_engine.get_feature_logging(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if delete_log
is called in a new FV without feature_logging enabled? (no call to enable_feature_logging()
or log()
) ?
I guess this line will return None, and then fail in the delete_feature_logs()
?
@javierdlrm will close this PR and made a new one to the hopswork-api repo |
This PR adds/fixes/changes...
JIRA Issue: -
Priority for Review: -
Related PRs: -
How Has This Been Tested?
Checklist For The Assigned Reviewer: