Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new filters for ingesting sales #27

Closed
wrridgeway opened this issue Oct 25, 2023 · 4 comments
Closed

Add new filters for ingesting sales #27

wrridgeway opened this issue Oct 25, 2023 · 4 comments

Comments

@wrridgeway
Copy link
Member

wrridgeway commented Oct 25, 2023

default.vw_pin_sale will soon be unfiltered by default and we need to use certain conditions to make sure we don't ingest unwanted sales:

AND NOT sale.sale_filter_is_outlier
AND NOT sale.sale_filter_deed_type
AND NOT sale.sale_filter_less_than_10k
AND NOT sale.sale_filter_same_sale_within_365
@stevevance
Copy link

Hey, I'm curious about how you determine "arms length...

sale.sale_filter_is_outlier
What constitutes an outlier?

sale.sale_filter_deed_type
What deed types are you excluding?

sale.sale_filter_less_than_10k
Why did you pick $10,000? I've been using $101.

sale.sale_filter_same_sale_within_365
I get that two sales within 365 days is a clue but why is it a "hard" filtering rule?

@dfsnow
Copy link
Member

dfsnow commented Nov 2, 2023

Hi @stevevance!

"Arms-length" in this case is our best conservative estimate based on a variety of heuristics. Almost all of the work we've done on this front can be found in the ccao-data/model-sales-val repo. Note that this is a WIP effort and is by no means finalized.

sale.sale_filter_is_outlier
What constitutes an outlier?

This field is created by ccao-data/model-sales-val. It combines heuristics and statistical measures (N standard deviations from some group mean). The heuristics are things you'd probably expect (matching last name + extreme value, institutional buyer/seller + extreme value, certain PTAX-203 form answers, etc.).

sale.sale_filter_deed_type
What deed types are you excluding?

In this case, quit claims, executor deeds, and beneficiary deeds. You can see the exact SQL view definition if you're curious: https://github.com/ccao-data/data-architecture/blob/master/aws-athena/views/default-vw_pin_sale.sql#L120C16-L120C16

sale.sale_filter_less_than_10k
Why did you pick $10,000? I've been using $101.

These are the filters applied for residential modeling (so sales of improvement + land). We find that sales below $10K are rarely arms-length, but I'm happy to discuss further. Note that most of these are booted anyways by sale.sale_filter_is_outlier.

sale.sale_filter_same_sale_within_365
I get that two sales within 365 days is a clue but why is it a "hard" filtering rule?

This is somewhat confusingly named; it's actually detecting sales with the same price and PIN that have occurred within the prior year. Usually these are just data errors, hence the filtering. Note that we keep the earlier sale when we detect multiple with the same price.

@dfsnow
Copy link
Member

dfsnow commented Dec 5, 2023

@wrridgeway This is closed now that #61 is merged right?

@wrridgeway
Copy link
Member Author

yes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants