Refactor outlier columns in `default.vw_pin_sale` to incorporate human review #977

jeancochrane · 2026-01-23T20:57:13Z

Overview

This PR builds off of #967, suggesting a refactored data model that factors out the logic that combines our algorithmic sale flags with information from our human reviewers in order to produce a final outlier determination with clear reasons.

I went a little bit further with the data model than I expected -- I hope it's not too confusing! Though big refactors can be risky, this one feels appropriate because this data model is still very new. I'd be happy to walk through the changes on a call if it's too much to review from scratch.

See ccao-data/model-res-avm#423 for the corresponding res model PR.

Data model changes

New models that this PR introduces:

sale.vw_flag: View that pulls the most recent version of each algorithmically flagged sale from sale.flag
sale.vw_outlier: View that combines algorithmic sale flags with human-reviewed sale attributes in order to produce a final outlier determination and corresponding reasons

Changes to existing models:

Renamed sale.flag_override to sale.flag_review for clarity (we are not directly using the information to "override" sales val flags, we are just incorporating that information into our final decision, so "review" seems more neutral)
Added new column default.vw_pin_sale.is_outlier reflecting the final decision on outlier status based on sale.vw_outlier (which pulls from sale.flag and sale.flag_review)
Added new column default.vw_pin_sale.outlier_reason reflecting a human-readable string with the reason behind the sale's outlier status (also pulled from sale.vw_outlier)
Renamed audit trail columns in default.vw_pin_sale that come directly from sale.flag and sale.flag_review to use the prefixes flag_* and review_*, to make the provenance of each column more obvious, e.g. is_arms_length ➡️ review_is_arms_length
- Thanks to Michael and Tim for this idea!
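
As a rough illustration of what "most recent version of each algorithmically flagged sale" means for `sale.vw_flag`, here is a minimal Python sketch of the same idea. The key names `sale_id` and `version` are assumptions made for the example; the actual columns in `sale.flag` and the SQL used by the view are not shown in this thread.

```python
from typing import Dict, Iterable


def latest_flags(rows: Iterable[dict]) -> Dict[str, dict]:
    """Keep only the highest-version flag row for each sale.

    `sale_id` and `version` are hypothetical field names used for
    illustration; they are not confirmed by the PR description.
    """
    latest: Dict[str, dict] = {}
    for row in rows:
        key = row["sale_id"]
        # Replace the stored row only when a newer version shows up
        if key not in latest or row["version"] > latest[key]["version"]:
            latest[key] = row
    return latest


# Made-up rows: sale A was flagged twice, sale B once
rows = [
    {"sale_id": "A", "version": 1, "flag_is_outlier": True},
    {"sale_id": "A", "version": 2, "flag_is_outlier": False},
    {"sale_id": "B", "version": 1, "flag_is_outlier": True},
]
result = latest_flags(rows)
```

In this sketch, sale A resolves to its version 2 row, so an older flag no longer contributes to the outlier decision.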

Open questions

Should we update default.vw_pin_sale_combined? I haven't done so yet because I'm not sure of the status of it. Are we still using it? I almost wonder if we should remove that view to reduce complexity, now that we are getting new sales on a more regular schedule.

dbt/models/default/schema/default.vw_pin_sale.yml

dbt/models/default/default.vw_pin_sale.sql

dbt/models/sale/schema.yml

wagnerlmichael

This looks great. I left a few comments and questions. Many of them are me thinking out loud about outlier_reason.

Regarding the dbt unit tests, I saw you linked the docs, but I still think it would be useful to connect out of band to do a bit of knowledge sharing.

Regarding the default.vw_pin_sale_combined, I think that is a good question. I'm curious to see what Billy and Nicole would think. I'm also not exactly sure what conversations have been like about the regular sale ingest cadence.

I took a look at how much is_outlier disagrees with flag_is_outlier for sales where has_review = True:

SELECT
    is_outlier,
    flag_is_outlier,
    COUNT(*) AS row_count
FROM "z_ci_jeancochrane_fixup_is_outlier_sale"."vw_outlier"
WHERE has_review = true
GROUP BY
    is_outlier,
    flag_is_outlier
ORDER BY
    is_outlier,
    flag_is_outlier;

There is some disagreement (roughly 10%), but so far the total number of outliers is almost unchanged.
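
To make the two observations above concrete, here is a small Python sketch showing how a cross-tab like the one the query returns can have a noticeable disagreement rate while leaving the total outlier count unchanged. The counts are invented for illustration and are not the actual query results.

```python
from typing import Dict, Tuple

# Keys are (is_outlier, flag_is_outlier); values are row counts
CrossTab = Dict[Tuple[bool, bool], int]


def disagreement_rate(counts: CrossTab) -> float:
    """Share of reviewed sales where the final decision differs from the flag."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    disagree = sum(n for (final, flag), n in counts.items() if final != flag)
    return disagree / total


def net_outlier_change(counts: CrossTab) -> int:
    """Final outlier count minus algorithmic outlier count."""
    final_total = sum(n for (final, _), n in counts.items() if final)
    flag_total = sum(n for (_, flag), n in counts.items() if flag)
    return final_total - flag_total


# Invented counts, NOT the real output of the query above
example: CrossTab = {
    (False, False): 80,
    (False, True): 5,
    (True, False): 5,
    (True, True): 10,
}
rate = disagreement_rate(example)
change = net_outlier_change(example)
```

With these made-up numbers, 10% of rows disagree, but the flips in each direction cancel out, so the net change in outliers is zero.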

dbt/models/sale/docs.md

dbt/models/sale/sale.vw_flag.sql

dbt/models/default/schema/default.vw_pin_sale.yml

dbt/models/sale/sale.vw_outlier.sql

dbt/models/sale/schema.yml

wagnerlmichael · 2026-01-26T20:09:35Z

dbt/models/sale/sale.vw_outlier.sql

+                    -- for the market, or if a non-arm's-length sale is close
+                    -- to market price, then the information from that sale is
+                    -- still useful for our valuation models
+                    WHEN has_flag AND flag_is_outlier
+                        THEN
+                        CASE
+                            WHEN review_is_flip
+                                THEN
+                                'Review: Flip'
+                            WHEN NOT review_is_arms_length
+                                THEN
+                                'Review: Non-Arms-Length'
+                            ELSE


[Question]: Could it make sense to switch these to something like "Review: Flip, Algorithm: $algorithm_reason" ? Since those are the operative conditions that produce the outlier?

Perhaps that would be confusing for a down stream data user?

Ah, that's an interesting idea, I like it. Since the only really relevant part of the algorithmic process (currently) is price, maybe we do something like Review: Flip, Algorithm: High Price? Or something like Review + Algorithm: High Price Flip?

I've spent maybe a little too much time trying to think about which one of these is better and I'm still pretty split. I think the second one is more concise and tells a bit more of a story.

The first one however is a bit more modular, and perhaps in the future its structure lends itself to more easily incorporate classifications that depend on a mix of both review and algorithm reasons. I'm good with either one!

> The first one however is a bit more modular, and perhaps in the future its structure lends itself to more easily incorporate classifications that depend on a mix of both review and algorithm reasons.

Yeah, I was thinking that too! I think modularity is more important than concision in this case, so I'll move forward with that.

Done in b6562bc.
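
For readers following along, the modular format discussed in this thread could be sketched in Python roughly as below. This is only an illustration of the `Review: ..., Algorithm: ...` shape; the function name and arguments are hypothetical, and the exact strings produced in b6562bc are not shown here.

```python
from typing import Optional


def build_outlier_reason(
    review_reason: Optional[str], algorithm_reason: Optional[str]
) -> Optional[str]:
    """Join review and algorithm components into one readable reason.

    Each component is optional, so new combinations can be added
    later without changing the overall structure of the string.
    """
    parts = []
    if review_reason:
        parts.append(f"Review: {review_reason}")
    if algorithm_reason:
        parts.append(f"Algorithm: {algorithm_reason}")
    # No components means there is no reason to report
    return ", ".join(parts) if parts else None


combined = build_outlier_reason("Flip", "High Price")
review_only = build_outlier_reason("Non-Arms-Length", None)
```

Because each component keeps its own prefix, a consumer can still match on the leading `Review: Flip` part regardless of which algorithmic reason follows it.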

jeancochrane · 2026-01-27T23:35:50Z

@wagnerlmichael I took a stab at another refactor in 4b0f4fe to reorient the data model for better support for archiving flag/review state in the res and condo models (see ccao-data/model-res-avm#423). Key changes include:

Switching back to flag_outlier_reason{N} from the proposed array field flag_outlier_reasons
- My reasoning here is that downstream consumers are already using the *_outlier_reason{N} schema, and while it doesn't feel ideal to me, there's not any need to change it right now when we might as well keep it as-is and preserve backwards-compatibility -- however, if you disagree and you feel strongly that we should switch to an array structure, let me know and we can reconsider
Adding a new review_json field that is a JSON object storing the raw state of the review findings
Fixing docs to reflect these changes
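
To illustrate the trade-off described above, here is a hedged Python sketch of how an array of reasons maps onto numbered `flag_outlier_reason{N}` fields, alongside a `review_json`-style serialization. The width of three columns and the keys inside the review object are assumptions for the example, not details confirmed by 4b0f4fe.

```python
import json
from typing import Dict, List, Optional


def to_numbered_columns(
    reasons: List[str], prefix: str = "flag_outlier_reason", width: int = 3
) -> Dict[str, Optional[str]]:
    """Spread a list of reasons across fixed, numbered columns.

    `width=3` is an assumed column count; extra reasons are dropped
    and missing ones are filled with None.
    """
    kept = list(reasons[:width])
    padded = kept + [None] * (width - len(kept))
    return {f"{prefix}{i}": value for i, value in enumerate(padded, start=1)}


def to_review_json(review: dict) -> str:
    """Serialize raw review findings into a single JSON string."""
    return json.dumps(review, sort_keys=True)


columns = to_numbered_columns(["High price"])
# Hypothetical review payload; the real keys may differ
review_json = to_review_json({"is_flip": True, "is_arms_length": False})
```

The numbered layout keeps the schema that downstream consumers already read, while the JSON field preserves the raw review state without requiring a new column for every attribute.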

Take a look and let me know what you think!

wagnerlmichael

Looks great, awesome work on this new view! I think it is going to be nice to work with. One small confirmation where a comment might be helpful, but I don't feel particularly strongly about it.

wagnerlmichael · 2026-01-28T15:18:22Z

dbt/models/sale/sale.vw_outlier.sql

+            OR outlier_reason LIKE 'Review: Non-Arms-Length%'
+            OR outlier_reason LIKE 'Review: Flip%'


Although a "Non-arms-length" value doesn't necessarily lead to an outlier, this string match works because outlier_reason only contains Review: Non-Arms-Length if it was properly paired with the price outlier and therefore determined to be an outlier. Is that right?

That's right! It felt a little bit risky to document this reasoning at this point in the query, since I can easily imagine us tweaking the logic up above in the outlier_reason CTE, e.g. to decide to expand the types of algorithmic flags that would determine an outlier for a flip/non-arms-length sale, while forgetting to update the explanatory comment down here. As a compromise, I beefed up the comment in this section to point readers to the comments on the outlier_reason CTE for details on why each reason is or is not an outlier in af0dcfc.
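
The two `LIKE` conditions discussed above amount to a prefix match on `outlier_reason`. A minimal Python sketch of just those two conditions follows; it deliberately ignores the other branches of the real `is_outlier` expression, which are not shown in this excerpt, and it treats a missing reason as "not an outlier", which is an assumption about null handling.

```python
from typing import Optional

# Prefixes taken from the two LIKE patterns in the diff excerpt
REVIEW_OUTLIER_PREFIXES = ("Review: Non-Arms-Length", "Review: Flip")


def matches_review_outlier(outlier_reason: Optional[str]) -> bool:
    """Mimic `outlier_reason LIKE '<prefix>%'` for the two review prefixes."""
    if outlier_reason is None:
        # Assumed behavior: no reason means no match
        return False
    return outlier_reason.startswith(REVIEW_OUTLIER_PREFIXES)


flip_match = matches_review_outlier("Review: Flip, Algorithm: High Price")
other_match = matches_review_outlier("Some other reason")
```

The match is only safe because the reason string is built upstream so that these prefixes appear exclusively on sales already judged to be outliers, which is why the comment now points readers back to the `outlier_reason` CTE.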

jeancochrane · 2026-01-28T17:56:03Z

@wagnerlmichael This should be ready for a final round of review! My commits today starting with e0da4d9 are exclusively dedicated to cleaning up docs to reflect the final data model.

wagnerlmichael · 2026-01-28T18:43:54Z

> @wagnerlmichael This should be ready for a final round of review! My commits today starting with e0da4d9 are exclusively dedicated to cleaning up docs to reflect the final data model.

This looks good to me! Thanks for beefing up the docs.

jeancochrane · 2026-02-02T23:04:11Z

I forgot about the Core Team review requirement here, so we'll need @wrridgeway to take a quick look at this before we merge to make sure we didn't do anything obviously bad.

wrridgeway

From my end, everything looks pretty solid here; I can't find anything that sticks out as wrong. I'm assuming all the unit tests for sale.vw_outlier would catch any unexpected results produced by the conditional logic.

…and simplify (#427)

This PR reworks the performance report to work with our [new sales val data model additions](ccao-data/data-architecture#977) and adds two things to the report:

- outlier numbers (raw and proportion) per year
- outlier proportion maps per nbhd incorporated from our geography group testing

It also reorganizes the outlier reasons that are displayed.

Co-authored-by: Jean Cochrane <jean@jeancochrane.com>
Co-authored-by: Jean Cochrane <jeancochrane@users.noreply.github.com>

wrridgeway · 2026-02-05T16:28:10Z

dbt/models/reporting/reporting.vw_market_tracker.sql

@@ -66,14 +66,14 @@ SELECT
    vps.sale_price,
    vps.sale_date,
    vps.sale_filter_is_outlier,


@jeancochrane should this become is_outlier?

sale_filter_is_outlier is now just an alias for is_outlier, in order to support backwards-compatibility for downstream consumers that are still relying on sale_filter_is_outlier (like the market tracker, I believe). It would probably be helpful in the long term to switch downstream consumers to is_outlier and remove this legacy field, but I don't think it's an urgent task.

data-architecture/dbt/models/default/default.vw_pin_sale.sql

Line 266 in c64b2af

COALESCE(outlier.is_outlier, FALSE) AS sale_filter_is_outlier,

data-architecture/dbt/models/default/default.vw_pin_sale.sql

Line 315 in c64b2af

COALESCE(outlier.is_outlier, FALSE) AS is_outlier,
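
The two permalinked lines show both columns being derived from the same expression. As a sketch of that behavior in Python, assuming a sale with no matching row in the outlier view should come out as `False` rather than null:

```python
from typing import Any, Dict, Optional


def coalesce(*values: Any) -> Any:
    """Return the first value that is not None, like SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


def project_outlier_columns(outlier_row: Optional[dict]) -> Dict[str, bool]:
    """Emit the new column and its legacy alias from one source value.

    `outlier_row` stands in for the joined row from the outlier view;
    None represents a sale with no match in that view.
    """
    raw = outlier_row.get("is_outlier") if outlier_row else None
    is_outlier = coalesce(raw, False)
    return {
        # Legacy name kept for downstream consumers
        "sale_filter_is_outlier": is_outlier,
        "is_outlier": is_outlier,
    }


unmatched = project_outlier_columns(None)
matched = project_outlier_columns({"is_outlier": True})
```

Since both names come from a single value, they cannot drift apart, which is what makes it low-risk to migrate consumers to `is_outlier` gradually.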

wagnerlmichael and others added 20 commits January 13, 2026 17:38

First pass

988642b

Edit comments

13f850e

Add prefix and remove snake case

d66b74e

Make arms length negative

2b1362b

Place analyst reasons before sv reasons

aae4e6d

Comments and linting

db05fa8

Add source_is_outlier

9d16c80

Add docs

408d606

Adjust comment

eaf6287

Tweak comment

530e5c0

Refactor default.vw_pin_sale.is_outlier and outlier_reason logic

634097c

Move is_outlier and outlier_reason logic from `default.vw_pin_sale` to `sale.vw_outlier`

36e49d1

Define sale.vw_flag view to record most recent flag for each reviewed sale

7d4ade5

Temporarily disable main branch restriction for build_and_test_dbt PR workflow

7b45684

Rename sale.flag_override table to sale.flag_review

4957812

WIP document new sale views

2a7f67e

Document new outlier columns and views

8f3615b

Fix syntax errors in sale.vw_flag definition

9fe8905

Add data tests for sale.vw_outlier

e14b451

Fix a few more spots with is_outlier

cf1b1a3

jeancochrane mentioned this pull request Jan 24, 2026

Integrate new sale outlier_reason column into ingest and analysis ccao-data/model-res-avm#423

Closed

jeancochrane added 3 commits January 23, 2026 18:32

Fixup market tracker and vw_ias_salesval_upload with vw_pin_sale changes

27c75d6

Merge branch 'master' into jeancochrane/fixup-is-outlier

e9bbaf1

Revert "Temporarily disable main branch restriction for `build_and_test_dbt` PR workflow"

abb4c08

This reverts commit 7b45684.

jeancochrane changed the base branch from 966-update-defaultvw_pin_sale-with-a-holistic-outlier_reason-field to master January 24, 2026 00:38

jeancochrane added 3 commits January 23, 2026 19:20

Avoid errors in reporting.vw_market_tracker

786e5b9

Fix a couple docs

d1d8371

Fix error with CARDINALITY call in vw_market_tracker

642737b

jeancochrane commented Jan 24, 2026

jeancochrane marked this pull request as ready for review January 24, 2026 02:19

jeancochrane requested review from TimCookCountyDS and wagnerlmichael January 24, 2026 02:19

jeancochrane added 2 commits January 26, 2026 12:11

Fix small logic bug and add unit tests to sale.vw_outlier

34e3b37

Fix unit test indentation for yamllint

e512f5a

jeancochrane commented Jan 26, 2026

wagnerlmichael approved these changes Jan 26, 2026

jeancochrane added 2 commits January 27, 2026 15:28

Add algorithm flags to flip/non-arms-length reviewed sale outlier reasons

b6562bc

Refactor outlier reasons for more intuitive use in downstream consumers

4b0f4fe

jeancochrane requested a review from wagnerlmichael January 27, 2026 23:36

wagnerlmichael approved these changes Jan 28, 2026

jeancochrane added 4 commits January 28, 2026 10:36

Factor out review_json docs to shared_columns and add it to `default.vw_pin_sale`

e0da4d9

Clarify comment on is_outlier logic in sale.vw_outlier

af0dcfc

Document price outlier component of flip/non-arms-length `outlier_reason` values

79dd34d

Document sv_* columns on model.training_data, for extra clarity

4608468

wagnerlmichael mentioned this pull request Jan 30, 2026

Rework outlier section of performance report to accommodate new spec and simplify ccao-data/model-res-avm#427

Merged

Ensure has_flag, has_review, and is_outlier are never null in `default.vw_pin_sale`

220bcbd

jeancochrane requested a review from wrridgeway February 2, 2026 23:03

jeancochrane mentioned this pull request Feb 3, 2026

Add outlier_reason and source_is_outlier columns to default.vw_pin_sale #967

Closed

wrridgeway approved these changes Feb 3, 2026

jeancochrane merged commit ec33672 into master Feb 3, 2026
8 checks passed

jeancochrane deleted the jeancochrane/fixup-is-outlier branch February 3, 2026 21:12

wagnerlmichael mentioned this pull request Feb 4, 2026

Update outlier section of performance report for sales val changes ccao-data/model-condo-avm#131

Merged

jeancochrane mentioned this pull request Feb 4, 2026

Tweak is_outlier logic in default.vw_pin_sale based on new understanding of sale review #974

Closed

wrridgeway reviewed Feb 5, 2026

wrridgeway mentioned this pull request Feb 6, 2026

Housing Market Tracker: switch to is_outlier, add 2025 sales #984

Closed
