-
Notifications
You must be signed in to change notification settings - Fork 15
Rework outlier section of performance report to accommodate new spec and simplify #427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework outlier section of performance report to accommodate new spec and simplify #427
Conversation
…rt-to-simplify-and-accommodate-new-spec Merge in params change
|
|
||
| ```{r} | ||
| make_triad_map("South") | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be dynamically set such that the first triad is the triad in question for a given modeling year, but dynamically setting tabsets can be annoying, and I didn't see this as a super urgent task. What do you think of me putting in a TODO:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Praise] I think it's fine to hardcode it like you've done here! Yeah, we'll have to scroll down for other tris, but that doesn't seem like too big a deal.
| ) %>% | ||
| # This data isn't used in the training data and as such we remove it for | ||
| # further analysis | ||
| filter(!ind_pin_is_multicard) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From an outlier perspective, I think we should filter these out, since they aren't included in the training data for the model
|
Looking at the performance report that was generated off of this branch ( |
jeancochrane
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work!
|
|
||
| ```{r} | ||
| make_triad_map("South") | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Praise] I think it's fine to hardcode it like you've done here! Yeah, we'll have to scroll down for other tris, but that doesn't seem like too big a deal.
| stages: | ||
| ingest: | ||
| cmd: Rscript pipeline/00-ingest.R | ||
| deps: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Nitpick, required] There's a lot of concurrent action going on with this lockfile right now, and I don't think we actually want to persist all of these changes, so I would recommend switching this back to the state of the lockfile on the main branch and we'll update it in a dedicated PR if we need to.
|
Also, remember to take the |
jeancochrane
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost done, I'm just kinda curious what's going on with those regexes in the outlier reason parsing conditions? They make the conditions hard to read, so we should strip them out if possible.
In the meantime, I think this is basically ready to go, so I'm going to go ahead and merge my data architecture PR, and you should flip the default.vw_pin_sale reference back to the prod table.
reports/performance/_outliers.qmd
Outdated
| # Review Non arms length + algorithm price | ||
| str_detect( | ||
| sv_outlier_reason, | ||
| regex("Review:\\s*Non-Arms-Length", ignore_case = TRUE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Question, non-blocking] What's up with the whitespace regex here and in the rest of the rules below? IIRC Review: Non-Arms-Length should always be one string, no variable whitespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was in excel parsing mode. I'll switch to an exact match.
jeancochrane
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is basically ready, pending the switch back to the prod table reference! One question below about whether we really need the inner regex() calls in our str_detect() conditions, but it doesn't really matter either way.
Co-authored-by: Jean Cochrane <jeancochrane@users.noreply.github.com>
…rt-to-simplify-and-accommodate-new-spec
This PR reruns the ingest, pushes input data to DVC, and updates the lockfile to point to it so that we can run models using the new reviewed sales that we added in #427. I tested this by running a model in Batch.
This PR reworks the performance report to work with out new sales val data model additions and add two things to the report:
It also reorganizes the outlier reasons that are displayed