Skip to content

Tweak is_outlier logic in default.vw_pin_sale based on new understanding of sale review #974

@jeancochrane

Description

@jeancochrane

Today we discussed the sale review process with Valuations and we learned a lot about how they think about the various columns that we load into our sale.flag_override table. Here's a quick summary of what we learned:

  • is_flip doesn't necessarily indicate whether or not the sale is representative of the market. Sometimes a flip can push a property into a state that is more representative of the market, e.g. rehabbing an old home and introducing high end finishes in a neighborhood where that is considered to be normal. As such, is_flip requires some additional context in order to be useful in determining whether to exclude a sale from our training set.
  • The "yes_major" value in has_characteristic_change can indicate either incorrect square footage or changes to interior condition that are not representative of the market, e.g. high end finishes in a neighborhood where that is unusual. If Valuations fills out this column consistently, then it will sometimes encode information about property condition that will never make its way into iasWorld in a way that would be useful to modeling (since we ignore property condition due to its sparsity).
  • Valuations uses requires_field_check and work_drawer to signal next steps to Data Integrity, not to encode information about the sale or its characteristics. In theory, these two columns should not provide any information that is useful to us beyond what is already present in is_arms_length, class_change, characteristic_change, and is_flip.

As a reminder, the column default.vw_pin_sale.is_outlier combines information from our sales validation model with information from the sale review process in order to determine whether to exclude a sale from our modeling pipelines. Based on what we've learned, I think the is_outlier logic should use the following sequence of conditionals:

  1. If has_class_change is true or characteristic_change is "yes_major", is_outlier should be true.
  2. If is_arms_length is false or is_flip is true and sv_is_outlier is true, is_outlier should be true.
  3. If is_arms_length is false or is_flip is true and sv_outlier is false, is_outlier should be false.
  4. If is_arms_length is false or is_flip is true and sv_outlier is null, is_outlier should be false (default to inclusion).
  5. If all analyst review flags are not null but indicate no problems with the sale, is_outlier should be false.
  6. If all analyst review flags are null and sv_is_outlier is not null, is_outlier should take the value of sv_is_outlier.
  7. If all analyst review flags are null and sv_is_outlier is null, is_outlier should be null

Here's a matrix that visualizes these conditions, in case that's easier to understand. The column values follow this pattern:

  • ✅ = True
  • ❌ = False
  • ⬜ = Null
  • Empty cell: The value does not matter for the condition
condition # has class change major char change is arms length is flip SV outlier is_outlier
1
1
2
3
4
2
3
4
5
6
6
7

If we decide to move forward with these changes, I think we can fold them into #970 rather than open a separate PR. I just wanted to open a dedicated issue so that we could preserve this discussion in an easily searchable way.

Metadata

Metadata

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions