Add `outlier_reason` and `source_is_outlier` columns to `default.vw_pin_sale` #967

wagnerlmichael · 2026-01-13T17:38:34Z

This PR adds two columns:

outlier_reason - An array column that contains all of the analyst and algorithmic outlier reasons.
source_is_outlier - A column that indicates whether the is_outlier column used the analyst override or the fallback statistical model. This column should make analytics comparing the two outlier sources easier, and should also make it simpler to integrate the manual overrides into the model outliers report.

Sample query for inspection:

SELECT * FROM "z_ci_966_update_defaultvw_pin_sale_with_a_holistic_outlier_reason_field_default"."vw_pin_sale"
WHERE is_arms_length IS NOT NULL LIMIT 50;

wagnerlmichael · 2026-01-16T15:39:00Z

dbt/models/default/default.vw_pin_sale.sql

+            OR flag_override.has_class_change IS NOT NULL
+            OR flag_override.has_characteristic_change IS NOT NULL
+            OR flag_override.requires_field_check IS NOT NULL
+            THEN 'analyst'


In is_outlier and source_is_outlier, all of the null checking strikes me as perhaps a little brittle. One idea is that we could add a test to make sure that there is at least one non-null field among the columns in sale.flag_override that drive is_outlier.
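
A minimal sketch of what such a test could check, written against an in-memory SQLite table rather than the real warehouse. The table layout, the `doc_no` key, and the exact set of columns that drive is_outlier are assumptions made for illustration; only the review column names come from this thread. Like a dbt singular test, the query returns the offending rows, so an empty result means the check passes.

```python
import sqlite3

# Hypothetical stand-in for sale.flag_override. doc_no is an assumed key and
# the column list is an assumption about which fields drive is_outlier.
DDL = """
CREATE TABLE flag_override (
    doc_no TEXT PRIMARY KEY,
    is_arms_length INTEGER,
    is_flip INTEGER,
    has_class_change INTEGER,
    has_characteristic_change INTEGER,
    requires_field_check INTEGER
)
"""

# Selects override rows where every review field is null. The test passes
# when this query returns no rows.
ALL_NULL_ROWS = """
SELECT doc_no
FROM flag_override
WHERE is_arms_length IS NULL
    AND is_flip IS NULL
    AND has_class_change IS NULL
    AND has_characteristic_change IS NULL
    AND requires_field_check IS NULL
ORDER BY doc_no
"""


def find_empty_overrides(rows):
    """Load rows into a scratch database and return doc_nos with no review data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    conn.executemany("INSERT INTO flag_override VALUES (?, ?, ?, ?, ?, ?)", rows)
    failing = [row[0] for row in conn.execute(ALL_NULL_ROWS)]
    conn.close()
    return failing
```

With a guarantee like this in place, downstream logic could treat the presence of an override row as meaningful instead of re-checking each field.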


jeancochrane

I think your point about the null-checking logic being kind of brittle here is a very good one, and to me indicates a smell that we might be missing a layer of abstraction. What do you think about either a CTE or a view that is dedicated solely to combining our sales val algorithm with our analyst overrides, such that we can reduce duplication of brittle null-checking logic?

I'm imagining a schema like this:

# Algorithm fields
sv_is_outlier
sv_is_ptax_outlier
sv_is_heuristic_outlier
sv_outlier_reason1
sv_outlier_reason2
sv_outlier_reason3
sv_run_id
sv_version

# Analyst fields
ovrd_is_outlier
ovrd_is_arms_length
ovrd_is_flip
ovrd_has_class_change
ovrd_has_characteristic_change
ovrd_requires_field_check
ovrd_work_drawer

# Combined fields
has_sv_flag
has_flag_override

# Maybe we include this field, or maybe we leave it up to `default.vw_pin_sale`
is_outlier

I think the key changes here would be:

Clearly prefixing the override fields, per your + Tim's feedback
Encapsulating the logic that determines whether a sale has been flagged and/or reviewed in the has_sv_flag and has_flag_override columns, which would then allow downstream columns like is_outlier, source_is_outlier, and outlier_reason to more clearly define their conditionals based on the presence or absence of certain types of flagging, not the presence or absence of specific null fields

This logic could either live as a CTE in default.vw_pin_sale or as its own dedicated view in the sale database (e.g. sale.vw_flag). I'm agnostic as to which direction to go, though I'll note that one advantage of a view is that we would have the option of defining tests on it that should theoretically run quite fast, because each test would have to scan a lot less data than any given test on default.vw_pin_sale. An advantage of a CTE, on the other hand, is that it's one less file a reader has to skip to when they're trying to understand the structure of default.vw_pin_sale.
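
To make the idea concrete, here is a rough sketch of such a view, using SQLite in memory as a stand-in for the warehouse. The table layouts, the `doc_no` join key, and the exact definitions of has_sv_flag and has_flag_override are assumptions for illustration, and only a subset of the proposed columns is included. The point is that the null checks live in exactly one place.

```python
import sqlite3

# Simplified, hypothetical versions of the sale tables plus the proposed view.
SCHEMA = """
CREATE TABLE sale (doc_no TEXT PRIMARY KEY);
CREATE TABLE flag (
    doc_no TEXT PRIMARY KEY,
    sv_is_outlier INTEGER,
    sv_run_id TEXT
);
CREATE TABLE flag_override (
    doc_no TEXT PRIMARY KEY,
    is_arms_length INTEGER,
    is_flip INTEGER,
    requires_field_check INTEGER
);

CREATE VIEW vw_flag AS
SELECT
    sale.doc_no AS doc_no,
    flag.sv_is_outlier AS sv_is_outlier,
    flag.sv_run_id AS sv_run_id,
    ovrd.is_arms_length AS ovrd_is_arms_length,
    ovrd.is_flip AS ovrd_is_flip,
    ovrd.requires_field_check AS ovrd_requires_field_check,
    -- Assumed definition: the algorithm produced a determination
    (flag.sv_is_outlier IS NOT NULL) AS has_sv_flag,
    -- Assumed definition: an analyst filled in at least one review field
    (
        ovrd.is_arms_length IS NOT NULL
        OR ovrd.is_flip IS NOT NULL
        OR ovrd.requires_field_check IS NOT NULL
    ) AS has_flag_override
FROM sale
LEFT JOIN flag ON sale.doc_no = flag.doc_no
LEFT JOIN flag_override AS ovrd ON sale.doc_no = ovrd.doc_no;
"""


def flag_presence(sales, flags, overrides):
    """Return (doc_no, has_sv_flag, has_flag_override) for every sale."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO sale VALUES (?)", sales)
    conn.executemany("INSERT INTO flag VALUES (?, ?, ?)", flags)
    conn.executemany("INSERT INTO flag_override VALUES (?, ?, ?, ?)", overrides)
    result = conn.execute(
        "SELECT doc_no, has_sv_flag, has_flag_override FROM vw_flag ORDER BY doc_no"
    ).fetchall()
    conn.close()
    return result
```

Tests defined on a view like this would only need to scan the flag tables, which is the speed advantage mentioned above.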

Curious what you think about all this! I've kept my review pretty high-level because I think this will be a sizeable change if we decide to undertake it.

jeancochrane · 2026-01-16T18:00:13Z

dbt/models/default/default.vw_pin_sale.sql

+            OR flag_override.has_class_change IS NOT NULL
+            OR flag_override.has_characteristic_change IS NOT NULL
+            OR flag_override.requires_field_check IS NOT NULL
+            THEN 'analyst'


[Praise] This is a really good point! I think tests are a good idea, although I also think we may be missing a layer of abstraction -- see my comment on the PR body for details. If we decide to move in the direction I suggest, we can talk about tests that would make sense for that data model.

TimCookCountyDS · 2026-01-16T19:33:41Z

Quick question re:

# Algorithm fields
sv_is_outlier
sv_is_ptax_outlier
sv_is_heuristic_outlier
sv_outlier_reason1
sv_outlier_reason2
sv_outlier_reason3
sv_run_id
sv_version

# Analyst fields
ovrd_is_outlier
ovrd_is_arms_length
ovrd_is_flip
ovrd_has_class_change
ovrd_has_characteristic_change
ovrd_requires_field_check
ovrd_work_drawer

# Combined fields
has_sv_flag
has_flag_override

# Maybe we include this field, or maybe we leave it up to `default.vw_pin_sale`
is_outlier

In this case would the "combined final judgment (our human review + sv_algorithm flagged)" as to whether something is an outlier be reflected by the "is_outlier" column?

(Also, I would think a view rather than a CTE might be a little more helpful, both in terms of having this mapped out clearly and for the purposes of later comparing any other outlier-flagging approaches or analysis that we want to do. I will defer to your judgment though.)

jeancochrane · 2026-01-16T19:48:51Z

In this case would the "combined final judgment (our human review + sv_algorithm flagged)" as to whether something is an outlier be reflected by the "is_outlier" column?

Yup! Though I'm uncertain whether it makes sense for that logic to live in this proposed view or to continue to live in default.vw_pin_sale. I lean toward default.vw_pin_sale for two reasons (though my opinion is very weakly held):

1. The determination would rely on the logic that has_sv_flag and has_flag_override encapsulate, so the view would require an additional subquery in order to aggregate those columns up to is_outlier; this is technically feasible but makes the view slightly more complex.
2. It strikes me as plausible that at some point we may want to change the final is_outlier logic so that it considers the state of data outside sale.flag and sale.flag_override. If we were to do this, then suddenly the final is_outlier logic would be distinct from the is_outlier logic in the view, which means we would have yet another *_is_outlier column to maintain with its own distinct meaning.
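
For illustration, here is roughly how the final columns could be derived downstream once has_sv_flag and has_flag_override exist. This is a sketch run against SQLite with a plain table standing in for the proposed view. The precedence (analyst override first, algorithm as the fallback) follows the PR description, while the `'algorithm'` label, the `doc_no` key, and the use of ovrd_is_outlier as the analyst determination are assumptions.

```python
import sqlite3

# Plain table standing in for the proposed view, reduced to the columns
# this sketch needs.
SETUP = """
CREATE TABLE vw_flag (
    doc_no TEXT PRIMARY KEY,
    sv_is_outlier INTEGER,
    ovrd_is_outlier INTEGER,
    has_sv_flag INTEGER NOT NULL,
    has_flag_override INTEGER NOT NULL
)
"""

# Conditionals depend on the presence of each kind of flag, not on null
# checks against individual override fields.
FINAL_COLUMNS = """
SELECT
    doc_no,
    CASE
        WHEN has_flag_override THEN ovrd_is_outlier
        WHEN has_sv_flag THEN sv_is_outlier
    END AS is_outlier,
    CASE
        WHEN has_flag_override THEN 'analyst'
        WHEN has_sv_flag THEN 'algorithm'  -- hypothetical label
    END AS source_is_outlier
FROM vw_flag
ORDER BY doc_no
"""


def final_outlier_columns(rows):
    """Return (doc_no, is_outlier, source_is_outlier) for each input row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SETUP)
    conn.executemany("INSERT INTO vw_flag VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(FINAL_COLUMNS).fetchall()
    conn.close()
    return result
```

Keeping these two CASE expressions in default.vw_pin_sale leaves room to bring in data from outside sale.flag and sale.flag_override later without changing the meaning of the view's columns.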

TimCookCountyDS · 2026-01-16T20:39:48Z

In this case would the "combined final judgment (our human review + sv_algorithm flagged)" as to whether something is an outlier be reflected by the "is_outlier" column?

Yup! Though I'm uncertain whether it makes sense for that logic to live in this proposed view or to continue to live in default.vw_pin_sale. I lean toward default.vw_pin_sale for two reasons (though my opinion is very weakly held):

1. The determination would rely on the logic that has_sv_flag and has_flag_override encapsulate, so the view would require an additional subquery in order to aggregate those columns up to is_outlier; this is technically feasible but makes the view slightly more complex.

2. It strikes me as plausible that at some point we may want to change the final is_outlier logic so that it considers the state of data outside sale.flag and sale.flag_override. If we were to do this, then suddenly the final is_outlier logic would be distinct from the is_outlier logic in the view, which means we would have yet another *_is_outlier column to maintain with its own distinct meaning.

These both make sense to me, especially point number 2, with regard to future scaling/experiments.

wagnerlmichael · 2026-01-16T21:25:04Z

Thanks all for all of the thoughts. I'm gonna move forward with the view!

jeancochrane · 2026-02-03T19:46:40Z

@wagnerlmichael I think this PR is superseded by #977, right?

wagnerlmichael · 2026-02-03T19:52:56Z

@wagnerlmichael I think this PR is superseded by #977, right?

Yep, I think this one can be closed out

First pass

988642b

wagnerlmichael linked an issue Jan 13, 2026 that may be closed by this pull request

Update default.vw_pin_sale with a holistic outlier_reason field #966

Open

wagnerlmichael added 8 commits January 13, 2026 22:17

Edit comments

13f850e

Add prefix and remove snake case

d66b74e

Make arms length negative

2b1362b

Place analyst reasons before sv reasons

aae4e6d

Comments and linting

db05fa8

Add source_is_outlier

9d16c80

Add docs

408d606

Adjust comment

eaf6287

wagnerlmichael changed the title ~~WIP~~ Add outlier_reason and source_is_outlier columns to default.vw_pin_sale Jan 15, 2026

wagnerlmichael commented Jan 16, 2026


Tweak comment

530e5c0

wagnerlmichael marked this pull request as ready for review January 16, 2026 16:56

wagnerlmichael requested a review from a team as a code owner January 16, 2026 16:56

jeancochrane reviewed Jan 16, 2026


wagnerlmichael mentioned this pull request Jan 21, 2026

Tweak is_outlier logic in default.vw_pin_sale based on new understanding of sale review #974

Closed

jeancochrane mentioned this pull request Jan 23, 2026

Refactor outlier columns in default.vw_pin_sale to incorporate human review #977

Merged

jeancochrane closed this Feb 3, 2026
