Replace mode pipeline #892

aGuttman · 2022-12-23T20:40:29Z

No description provided.

Replacement model

Building out pipeline infrastructure to run replace mode

aGuttman · 2022-12-23T20:42:32Z

Start building out the infrastructure that lets the replace mode model run in the pipeline. Adds functions that use the gbdt model through the trip_model interface, and creates storage methods for the model.

aGuttman · 2022-12-23T20:44:22Z

todo:
continue building/testing infrastructure components
write save/update functions for model
create unit tests

shankari

At a high level, I also don't see this new algorithm called from anywhere in the pipeline.

shankari · 2022-12-23T20:50:59Z

emission/analysis/classification/inference/labels/inferrers.py

+def predict_gradient_boosted_decision_tree(trip, max_confidence=None, first_confidence=None, confidence_multiplier=None):
+    # load application config 
+    model_type = eamtc.get_model_type()


this seems like it is just a copy/paste of the previous predict_cluster_confidence_discounting
Why does this have to be in the labels directory anyway?
labels is for predicting labels based on other labels
replaced_mode is for predicting the replaced mode based on other characteristics (e.g. demographics).

So while it is appropriate to have this be inspired by the label assist algorithm, it is its own algorithm/model, and for clarity, it should be in its own directory. Its scaffolding can be similar to the label assist, but it is not a label assist.

shankari · 2022-12-23T20:59:28Z

emission/analysis/classification/inference/labels/inferrers.py

+    labels = copy.deepcopy(labels)
+    for l in labels: l["p"] *= confidence_coeff
+    return labels


concretely, this is also wrong because there will not be a label array or probabilities.
Note that this code as written does not work because confidence_coeff is not defined.
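
For reference, a minimal sketch of what the quoted lines would need in order to run at all, assuming the coefficient is passed in explicitly. The function name `discount_label_probabilities` and the sample data are hypothetical and are not part of this PR. This only addresses the undefined name; it does not address the larger point that the replaced mode model will not produce a label array with probabilities.

```python
import copy


def discount_label_probabilities(labels, confidence_coeff):
    """Return a copy of `labels` with every "p" scaled by `confidence_coeff`.

    Hypothetical helper: `confidence_coeff` is an explicit argument here,
    which is what the quoted diff is missing.
    """
    if not 0.0 <= confidence_coeff <= 1.0:
        raise ValueError("confidence_coeff must be in [0, 1]")
    discounted = copy.deepcopy(labels)  # leave the caller's list untouched
    for label in discounted:
        label["p"] *= confidence_coeff
    return discounted


# Made-up sample input in the shape the quoted code assumes
original = [{"labels": {"replaced_mode": "drove_alone"}, "p": 0.8}]
scaled = discount_label_probabilities(original, 0.5)
```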

shankari · 2022-12-23T21:00:22Z

emission/analysis/classification/inference/labels/pipeline_replace_mode.py

+
+# Does all the work necessary for a given user
+def infer_labels(user_id):
+    time_query = epq.get_time_range_for_label_inference(user_id)


again, this is not the right time range to query, because it returns the time range for the label inference algorithm. You are your own algorithm and you need your own time range.

This will break the pipeline unless changed.
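
To illustrate the idea of each algorithm having its own time range, here is a self-contained toy model. It is not the e-mission pipeline API: `Stage`, `StageTracker`, `get_time_range` and `mark_done` are hypothetical names made up for this sketch. The point is only that progress is tracked per stage, so a new stage starts with no history even when label inference has already processed data.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Tuple


class Stage(Enum):
    # Hypothetical stage names, for illustration only
    LABEL_INFERENCE = "label_inference"
    REPLACED_MODE_INFERENCE = "replaced_mode_inference"


@dataclass
class StageTracker:
    """Remembers, per stage, the timestamp up to which data was processed."""
    last_processed_ts: Dict[Stage, float] = field(default_factory=dict)

    def get_time_range(self, stage: Stage, now: float) -> Tuple[Optional[float], float]:
        # A start of None means "this stage has not processed anything yet"
        return (self.last_processed_ts.get(stage), now)

    def mark_done(self, stage: Stage, end_ts: float) -> None:
        self.last_processed_ts[stage] = end_ts


tracker = StageTracker()
tracker.mark_done(Stage.LABEL_INFERENCE, 100.0)

label_range = tracker.get_time_range(Stage.LABEL_INFERENCE, now=200.0)
replaced_range = tracker.get_time_range(Stage.REPLACED_MODE_INFERENCE, now=200.0)
```

In this toy, if the replaced mode step asked for the label inference range, it would start at 100.0 and skip everything before that, even though it has never processed those trips itself.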

shankari · 2022-12-23T21:01:07Z

emission/analysis/classification/inference/labels/pipeline_replace_mode.py

+
+# Code structure based on emission.analysis.classification.inference.mode.pipeline
+# and emission.analysis.classification.inference.mode.rule_engine
+class LabelInferencePipeline:


again, this class name needs to change for clarity: this is not the label inference pipeline

shankari · 2022-12-23T21:01:54Z

emission/analysis/classification/inference/labels/pipeline_replace_mode.py

+            cleaned_trip_dict = copy.copy(cleaned_trip)["data"]
+            inferred_trip = ecwe.Entry.create_entry(user_id, "analysis/inferred_trip", cleaned_trip_dict)
+


you have basically copy-pasted the other pipeline.py; you need to understand how it works and adapt it to be a separate step.

shankari · 2022-12-23T21:02:42Z

emission/analysis/modelling/trip_model/run_model.py

@@ -118,6 +118,27 @@ def predict_labels_with_n(
        predictions, n = model.predict(trip)
        return predictions, n

+def predict_labels_with_gbdt(


where is this called from?

From the list of algorithms (but pipeline_replace_mode.py will do it later).
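
As a rough sketch of what "called from the list of algorithms" could look like, the snippet below keeps a registry that maps an algorithm name to a prediction function and dispatches through it. Everything here is an assumption made for illustration: `ALGORITHMS`, `run_algorithm`, the two placeholder predictors, the trip fields and the mode names are not taken from this PR or from the e-mission codebase.

```python
from typing import Any, Callable, Dict

Trip = Dict[str, Any]
Prediction = Dict[str, Any]


def predict_constant(trip: Trip) -> Prediction:
    # Placeholder predictor: always returns the same answer
    return {"replaced_mode": "unknown", "algorithm": "constant"}


def predict_by_distance(trip: Trip) -> Prediction:
    # Placeholder predictor: a made-up distance rule, not a real model
    mode = "walk" if trip["distance"] < 1000 else "drive"
    return {"replaced_mode": mode, "algorithm": "by_distance"}


# Hypothetical registry: the single place that lists which algorithms can run
ALGORITHMS: Dict[str, Callable[[Trip], Prediction]] = {
    "constant": predict_constant,
    "by_distance": predict_by_distance,
}


def run_algorithm(name: str, trip: Trip) -> Prediction:
    """Look up `name` in the registry and run it on `trip`."""
    try:
        predictor = ALGORITHMS[name]
    except KeyError:
        raise ValueError(f"unknown algorithm: {name!r}") from None
    return predictor(trip)


result = run_algorithm("by_distance", {"distance": 250})
```

With a registry like this, the answer to "where is this called from?" is visible in one place, and a function that is never registered is easy to spot.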

zackAemmer and others added 13 commits November 17, 2022 15:21

Blueprint for GBDT, work in progress

6dc4995

GBDT with all feature types

8a35ff0

Decent baseline GBDT

8fa921d

Add basic SVM

b99c499

Clean up some shared code

f3c7160

Make SVM incremental

a4cb4f3

Add gbdt to model types

10e5c74

Addressing comments, move classes to config

063d9ab

Incremental SVM, more testing, add classes to configs

2e18226

Demographic data format in replacement modeling

d194d2b

Partial switch to inferred labels for replacement mode, demographics

693b8e0

Merge pull request #1 from zackAemmer/replacement-model

1c06954

Replacement model

Integrating Replace Mode Model

9467356

Building out pipeline infrastructure to run replace mode

shankari requested changes Dec 23, 2022


shankari changed the base branch from random-forest-mode-detection to master September 23, 2023 04:14
