-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace mode pipeline #892
base: master
Are you sure you want to change the base?
Conversation
Replacement model
Building out pipeline infrastructure to run replace mode
Addressing e-mission/e-mission-docs#841 Start building out infrastructure to allow replace mode model to run in pipeline. Functions made to use gbdt model through trip_model interface, create storage methods for model. |
todo: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a high level, I also don't see this new algorithm called from anywhere in the pipeline.
def predict_gradient_boosted_decision_tree(trip, max_confidence=None, first_confidence=None, confidence_multiplier=None): | ||
# load application config | ||
model_type = eamtc.get_model_type() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like it is just a copy/paste of the previous predict_cluster_confidence_discounting
Why does this have to be in the labels
directory anyway?
labels
is for predicting labels based on other labels
replaced_mode
is for predicting the replaced mode based on other characteristics (e.g. demographics).
So while it is appropriate to have this be inspired by the label assist algorithm, it is its own algorithm/model, and for clarity, it should be in its own directory. Its scaffolding can be similar to the label assist, but it is not a label assist.
labels = copy.deepcopy(labels) | ||
for l in labels: l["p"] *= confidence_coeff | ||
return labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
concretely, this is also wrong because there will not be a label array or probabilities.
Note that this code as written does not work because confidence_coeff
is not defined.
|
||
# Does all the work necessary for a given user | ||
def infer_labels(user_id): | ||
time_query = epq.get_time_range_for_label_inference(user_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, this is not the time range to query for because that will return the time range for the label inference algorithm. You are your own algorithm and you need your own time range
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will break the pipeline unless changed.
|
||
# Code structure based on emission.analysis.classification.inference.mode.pipeline | ||
# and emission.analysis.classification.inference.mode.rule_engine | ||
class LabelInferencePipeline: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, this needs to change for clarity
cleaned_trip_dict = copy.copy(cleaned_trip)["data"] | ||
inferred_trip = ecwe.Entry.create_entry(user_id, "analysis/inferred_trip", cleaned_trip_dict) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you have basically copy-pasted the other pipeline.py, you need to understand how it works and adapt it to be a separate step.
@@ -118,6 +118,27 @@ def predict_labels_with_n( | |||
predictions, n = model.predict(trip) | |||
return predictions, n | |||
|
|||
def predict_labels_with_gbdt( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this called from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the list of algorithms (but pipeline_replace_mode.py
will do it later).
No description provided.