Conversation

Member

@wagnerlmichael wagnerlmichael commented Jan 28, 2026

This PR reworks the performance report to work with our new sales val data model additions and adds two things to the report:

  • outlier numbers (raw and proportion) per year
  • outlier proportion maps per nbhd incorporated from our geography group testing

It also reorganizes the outlier reasons that are displayed.
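
As a rough illustration of the per-year outlier numbers described above, the raw count and proportion could be computed along these lines. This is a minimal sketch, not the report's actual code: the `sales` data frame and its `year` and `sv_is_outlier` columns are hypothetical stand-ins for whatever the report reads from the sales val data model.

```r
library(dplyr)

# Hypothetical sales data; the column names are assumptions for illustration
sales <- data.frame(
  year = c(2023, 2023, 2023, 2024, 2024),
  sv_is_outlier = c(TRUE, FALSE, FALSE, TRUE, TRUE)
)

# Raw outlier count and outlier proportion per year
outliers_by_year <- sales %>%
  group_by(year) %>%
  summarise(
    n_sales = n(),
    n_outliers = sum(sv_is_outlier, na.rm = TRUE),
    prop_outliers = n_outliers / n_sales,
    .groups = "drop"
  )

print(outliers_by_year)
```

Using `na.rm = TRUE` keeps a missing outlier flag from turning the whole year's count into `NA`; whether missing flags should instead be surfaced is a judgment call for the report.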


```{r}
make_triad_map("South")
```
Member Author

This should be set dynamically so that the first triad is the triad in question for a given modeling year, but dynamically setting tabsets can be annoying, and I didn't see this as a super urgent task. What do you think of me putting in a TODO?

Member

[Praise] I think it's fine to hardcode it like you've done here! Yeah, we'll have to scroll down for other tris, but that doesn't seem like too big a deal.
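
If the tab order is ever made dynamic, one possible approach is to compute the triad order up front and generate the tabs from it. The sketch below only covers the ordering step. The `order_triads()` helper and the set of triad names are assumptions for illustration; only `make_triad_map("South")` appears in this PR.

```r
# Hypothetical helper: put the triad in question first, keep the rest in order.
# The default triad names are an assumption, not taken from the report.
order_triads <- function(current_triad,
                         all_triads = c("City", "North", "South")) {
  stopifnot(current_triad %in% all_triads)
  c(current_triad, setdiff(all_triads, current_triad))
}

triad_order <- order_triads("South")
print(triad_order)
```

Each element of `triad_order` could then be passed to `make_triad_map()` inside a loop that emits one tab per triad.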

) %>%
# This data isn't used in the training data and as such we remove it for
# further analysis
filter(!ind_pin_is_multicard)
Member Author

From an outlier perspective, I think we should filter these out, since they aren't included in the training data for the model.

@wagnerlmichael
Member Author

Looking at the performance report that was generated from this branch (2026-01-30-nervous-tristan), the topline stats read "Contained NA outliers (NA of the total sales)". We discussed the source of this out of band, and I believe @jeancochrane will implement a fix in the upstream data models.

@wagnerlmichael wagnerlmichael marked this pull request as ready for review January 30, 2026 21:17
Member

@jeancochrane jeancochrane left a comment

Awesome work!


stages:
ingest:
cmd: Rscript pipeline/00-ingest.R
deps:
Member

[Nitpick, required] There's a lot of concurrent action going on with this lockfile right now, and I don't think we actually want to persist all of these changes, so I would recommend switching this back to the state of the lockfile on the main branch and we'll update it in a dedicated PR if we need to.

@jeancochrane
Member

Also, remember to take the [WIP] out of the PR title before you merge!

@wagnerlmichael wagnerlmichael changed the title from "[WIP] Rework outlier section of performance report to accommodate new spec and simplify" to "Rework outlier section of performance report to accommodate new spec and simplify" Jan 30, 2026
Member

@jeancochrane jeancochrane left a comment

Almost done! I'm just kinda curious what's going on with those regexes in the outlier reason parsing conditions. They make the conditions hard to read, so we should strip them out if possible.

In the meantime, I think this is basically ready to go, so I'm going to go ahead and merge my data architecture PR, and you should flip the default.vw_pin_sale reference back to the prod table.

# Review Non arms length + algorithm price
str_detect(
sv_outlier_reason,
regex("Review:\\s*Non-Arms-Length", ignore_case = TRUE)
Member

[Question, non-blocking] What's up with the whitespace regex here and in the rest of the rules below? IIRC Review: Non-Arms-Length should always be one string, no variable whitespace.

Member Author

I was in Excel parsing mode. I'll switch to an exact match.
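
For reference, here is a minimal sketch of the difference between the whitespace-tolerant regex and an exact literal match using `stringr::fixed()`. The example reason strings are hypothetical and are not taken from the real `sv_outlier_reason` values; this is not necessarily the form the final commit used.

```r
library(stringr)

# Hypothetical outlier reason strings, for illustration only
reasons <- c(
  "Review: Non-Arms-Length",
  "review: non-arms-length, Algorithm: High Price",
  "Algorithm: Low Price"
)

# Original style: regex that tolerates variable whitespace after the colon
regex_match <- str_detect(
  reasons,
  regex("Review:\\s*Non-Arms-Length", ignore_case = TRUE)
)

# Exact literal match: no regex metacharacters to read past
fixed_match <- str_detect(
  reasons,
  fixed("Review: Non-Arms-Length", ignore_case = TRUE)
)

print(regex_match)
print(fixed_match)
```

The two agree as long as the reason is always written with a single space after the colon. The literal version would miss a value with different spacing, which is acceptable only if the upstream strings are guaranteed to be consistent.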

Member

@jeancochrane jeancochrane left a comment

I think this is basically ready, pending the switch back to the prod table reference! One question below about whether we really need the inner regex() calls in our str_detect() conditions, but it doesn't really matter either way.

wagnerlmichael and others added 2 commits February 3, 2026 12:10
Co-authored-by: Jean Cochrane <jeancochrane@users.noreply.github.com>
@jeancochrane jeancochrane merged commit a2d6ec6 into master Feb 3, 2026
5 checks passed
@jeancochrane jeancochrane deleted the rework-outlier-section-of-performance-report-to-simplify-and-accommodate-new-spec branch February 3, 2026 21:22
jeancochrane added a commit that referenced this pull request Feb 4, 2026
This PR reruns the ingest, pushes input data to DVC, and updates the
lockfile to point to it so that we can run models using the new reviewed
sales that we added in #427.

I tested this by running a model in Batch.