Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[APPEND][FSTORE-1424] Feature logging for spark #1367

Closed

Conversation

kennethmhc
Copy link
Contributor

This PR adds/fixes/changes...

  • please summarize your changes to the code
  • and make sure to include all changes to user-facing APIs

JIRA Issue: -

Priority for Review: -

Related PRs: -

How Has This Been Tested?

  • Unit Tests
  • Integration Tests
  • Manual Tests on VM

Checklist For The Assigned Reviewer:

- [ ] Checked if merge conflicts with master exist
- [ ] Checked if stylechecks for Java and Python pass
- [ ] Checked if all docstrings were added and/or updated appropriately
- [ ] Ran spellcheck on docstring
- [ ] Checked if guides & concepts need to be updated
- [ ] Checked if naming conventions for parameters and variables were followed
- [ ] Checked if private methods are properly declared and used
- [ ] Checked if hard-to-understand areas of code are commented
- [ ] Checked if tests are effective
- [ ] Built and deployed changes on dev VM and tested manually
- [x] (Checked if all type annotations were added and/or updated appropriately)

@@ -460,4 +460,4 @@ def delete_feature_logs(
path_params += [self._TRANSFORMED_lOG]
else:
path_params += [self._UNTRANSFORMED_LOG]
_client._send_request("DELETE", path_params, {})
return _client._send_request("DELETE", path_params, {})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this returning? Normally delete methods return void since the corresponding item gets deleted.

Comment on lines +29 to +33
def update(self, others):
self._transformed_features = others._transformed_features
self._untransformed_features = others._untransformed_features
return self

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this method instead of the setter methods for the two fields? @transformed_features.setter and @untransformed_features.setter?

Comment on lines 949 to 957
def _get_logging_fg(self, fv, transformed):
feature_logging = self.get_feature_logging(fv)
return self._get_logging_fg_feature_logging(feature_logging, transformed)

def _get_logging_fg_feature_logging(self, feature_logging, transformed):
if transformed:
return feature_logging.transformed_features
else:
return feature_logging.untransformed_features
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these list are not easy to read.
_get_logging_fg_feature_logging is returning features that can be transformed or untransformed, but the method name indicates feature_logging.
_get_logging_fg retrieves the feature_logging for a fv and then call the previous method to return the (transformed or untransformed) features.

I wouldn't guess their behavior based on the method names :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, based on their used in log_features, it seems that untransformed_features and transformed_features are actually fg objects

td_col_name=FeatureViewEngine._LOG_TD_VERSION,
time_col_name=FeatureViewEngine._LOG_TIME,
model_col_name=FeatureViewEngine._HSML_MODEL,
prediction=prediction,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is prediction one or multiple predictions? If many, can we use predictions instead?

default_write_options = {
"start_offline_materialization": False,
}
if write_options:
default_write_options.update(write_options)
fg = self._get_logging_fg(fv, transformed)
fg = self._get_logging_fg_feature_logging(feature_logging, transformed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we better have a method called get_feature_group in the feature_logging class, that accepts a transformed boolean parameter?

def get_feature_group(self, transformed):
    return self._transformed_features if transformed else self._untransformed_features

Then:

  • we don't need _get_logging_fg_feature_logging and _get_logging_fg methods. We can just call get_feature_logging from the engine and then the internal method if needed.
  • we don't need to pass feature_logging as a separate parameter. We can use get_feature_logging(fv).get_feature_group(transformed)

td_col_name,
time_col_name,
model_col_name,
def get_feature_logging_df(features,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add types to all parameters?

Comment on lines +1414 to +1424
def get_feature_logging_df(self, features,
fg=None,
fg_features: List[TrainingDatasetFeature]=None,
td_predictions: List[TrainingDatasetFeature]=None,
td_col_name=None,
time_col_name=None,
model_col_name=None,
prediction=None,
training_dataset_version=None,
hsml_model=None,
**kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add types to these parameters?

Comment on lines +1437 to +1438
df = df.withColumn(td_col_name,
lit(training_dataset_version).cast(LongType()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation seems off here.

Comment on lines +1445 to +1446
df = df.withColumn(time_col_name,
lit(now).cast(TimestampType()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation here as well

@@ -3640,7 +3643,11 @@ def delete_log(self, transformed: Optional[bool]=None):
# Raises
`hsfs.client.exceptions.RestAPIError` in case the backend fails to delete the log.
"""
return self._feature_view_engine.delete_feature_logs(self, transformed)
if self._feature_logging is None:
self._feature_logging = self._feature_view_engine.get_feature_logging(self)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if delete_log is called in a new FV without feature_logging enabled? (no call to enable_feature_logging() or log()) ?
I guess this line will return None, and then fail in the delete_feature_logs() ?

@kennethmhc
Copy link
Contributor Author

@javierdlrm will close this PR and made a new one to the hopswork-api repo

@kennethmhc kennethmhc closed this Jul 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants