Today we discussed the sale review process with Valuations and we learned a lot about how they think about the various columns that we load into our sale.flag_override table. Here's a quick summary of what we learned:
- `is_flip` doesn't necessarily indicate whether or not the sale is representative of the market. Sometimes a flip can push a property into a state that is more representative of the market, e.g. rehabbing an old home and introducing high-end finishes in a neighborhood where that is considered to be normal. As such, `is_flip` requires some additional context in order to be useful in determining whether to exclude a sale from our training set.
- The `"yes_major"` value in `has_characteristic_change` can indicate either incorrect square footage or changes to interior condition that are not representative of the market, e.g. high-end finishes in a neighborhood where that is unusual. If Valuations fills out this column consistently, then it will sometimes encode information about property condition that will never make its way into iasWorld in a way that would be useful to modeling (since we ignore property condition due to its sparsity).
- Valuations uses `requires_field_check` and `work_drawer` to signal next steps to Data Integrity, not to encode information about the sale or its characteristics. In theory, these two columns should not provide any information that is useful to us beyond what is already present in `is_arms_length`, `has_class_change`, `has_characteristic_change`, and `is_flip`.
As a reminder, the column default.vw_pin_sale.is_outlier combines information from our sales validation model with information from the sale review process in order to determine whether to exclude a sale from our modeling pipelines. Based on what we've learned, I think the is_outlier logic should use the following sequence of conditionals:
1. If `has_class_change` is true or `has_characteristic_change` is `"yes_major"`, `is_outlier` should be true.
2. If either `is_arms_length` is false or `is_flip` is true, and `sv_is_outlier` is true, `is_outlier` should be true.
3. If either `is_arms_length` is false or `is_flip` is true, and `sv_is_outlier` is false, `is_outlier` should be false.
4. If either `is_arms_length` is false or `is_flip` is true, and `sv_is_outlier` is null, `is_outlier` should be false (default to inclusion).
5. If all analyst review flags are not null but indicate no problems with the sale, `is_outlier` should be false.
6. If all analyst review flags are null and `sv_is_outlier` is not null, `is_outlier` should take the value of `sv_is_outlier`.
7. If all analyst review flags are null and `sv_is_outlier` is null, `is_outlier` should be null.
Here's a matrix that visualizes these conditions, in case that's easier to understand. The column values follow this pattern:
- ✅ = True
- ❌ = False
- ⬜ = Null
- Empty cell: The value does not matter for the condition
| condition # | has class change | major char change | is arms length | is flip | SV outlier | is_outlier |
|---|---|---|---|---|---|---|
| 1 | ✅ | | | | | ✅ |
| 1 | | ✅ | | | | ✅ |
| 2 | | | ❌ | | ✅ | ✅ |
| 3 | | | ❌ | | ❌ | ❌ |
| 4 | | | ❌ | | ⬜ | ❌ |
| 2 | | | | ✅ | ✅ | ✅ |
| 3 | | | | ✅ | ❌ | ❌ |
| 4 | | | | ✅ | ⬜ | ❌ |
| 5 | ❌ | ❌ | ✅ | ❌ | | ❌ |
| 6 | ⬜ | ⬜ | ⬜ | ⬜ | ❌ | ❌ |
| 6 | ⬜ | ⬜ | ⬜ | ⬜ | ✅ | ✅ |
| 7 | ⬜ | ⬜ | ⬜ | ⬜ | ⬜ | ⬜ |
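
For anyone who prefers code to tables, here is a rough Python sketch of the same sequence of conditionals. It is only an illustration of the proposed logic, not the actual view definition: the function name `resolve_is_outlier` is hypothetical, `None` stands in for null, and the `"no"` value used for `has_characteristic_change` in the usage example is an assumed placeholder for any non-major value. The conditions above also don't say what should happen when only some of the review flags are filled in and none of them signals a problem, so the sketch falls back to `sv_is_outlier` in that case, which is an assumption.

```python
from typing import Optional


def resolve_is_outlier(
    has_class_change: Optional[bool],
    has_characteristic_change: Optional[str],
    is_arms_length: Optional[bool],
    is_flip: Optional[bool],
    sv_is_outlier: Optional[bool],
) -> Optional[bool]:
    """Hypothetical helper mirroring the proposed is_outlier conditions."""
    # Condition 1: a class change or a major characteristic change
    # always excludes the sale.
    if has_class_change is True or has_characteristic_change == "yes_major":
        return True

    # Conditions 2-4: a non-arms-length sale or a flip defers to the sales
    # validation model, and a null model result defaults to inclusion.
    if is_arms_length is False or is_flip is True:
        return sv_is_outlier is True

    review_flags = (
        has_class_change,
        has_characteristic_change,
        is_arms_length,
        is_flip,
    )

    # Condition 5: every flag is filled in and none of the checks above
    # fired, so the analyst found no problems with the sale.
    if all(flag is not None for flag in review_flags):
        return False

    # Conditions 6-7: no analyst review at all, so use the model result,
    # which may itself be null.
    if all(flag is None for flag in review_flags):
        return sv_is_outlier

    # ASSUMPTION: partially filled flags with no problem signal are not
    # covered by the conditions above; fall back to the model result.
    return sv_is_outlier
```

For example, `resolve_is_outlier(False, "no", True, False, True)` returns `False` (condition 5: the analyst review overrides the model), while `resolve_is_outlier(None, None, None, None, True)` returns `True` (condition 6).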
If we decide to move forward with these changes, I think we can fold them into #970 rather than open a separate PR. I just wanted to open a dedicated issue so that we could preserve this discussion in an easily searchable way.