Assigning a state to a record, randomly, based on probability of the record being from particular states (versus distributing portions of records to states)

**I am moving Dan's comments on this point to here. Here is his initial comment (https://github.com/open-source-economics/taxdata/issues/138#issuecomment-353685059).**

It is possible to assign a single state to each record in an unbiased manner. The way I have done this is to calculate the probability of a record being in each of the 50 states, and assign it to one of those states in proportion to those probabilities. That is, if a record has high state income tax, the procedure will show high probabilities for New York, California, etc., and low (but nonzero) probabilities for Florida and Texas. The computer will then select New York or California with high probability and Florida or Texas with low probability. In expectation the resulting totals will be the same as in the "long" format, but with some unbiased error. I have done this and find that state-level aggregates match nearly as well as summing over all possible states. If desired, one could take 2 draws per record, or any other number. It would not be necessary to multiply the workload by 51.
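To make the idea concrete, here is a minimal sketch of the single-draw approach compared with the "long" format. It is not Dan's actual code. The four-state list, the record weights, the probabilities, and the helper names (`assign_state`, `long_format_totals`, `sampled_totals`) are all made up for illustration; a real run would cover every state and take the probabilities from a fitted model.

```python
import random

# Hypothetical subset of states, for illustration only.
STATES = ["NY", "CA", "FL", "TX"]

def assign_state(probs, rng):
    """Draw one state for a record in proportion to its probabilities."""
    return rng.choices(STATES, weights=probs, k=1)[0]

def long_format_totals(records):
    """Expected weighted totals: each record is spread over every state."""
    totals = dict.fromkeys(STATES, 0.0)
    for weight, probs in records:
        for state, p in zip(STATES, probs):
            totals[state] += weight * p
    return totals

def sampled_totals(records, rng, draws=1):
    """Weighted totals when each record gets `draws` randomly drawn states."""
    totals = dict.fromkeys(STATES, 0.0)
    for weight, probs in records:
        for _ in range(draws):
            # Split the record's weight evenly across its draws.
            totals[assign_state(probs, rng)] += weight / draws
    return totals

rng = random.Random(0)  # fixed seed so the run is repeatable

# Made-up records: (sample weight, probabilities over STATES).
# The first group looks like high-state-income-tax filers, the second does not.
records = [(100.0, [0.45, 0.40, 0.05, 0.10]) for _ in range(5000)]
records += [(100.0, [0.10, 0.15, 0.35, 0.40]) for _ in range(5000)]

expected = long_format_totals(records)
sampled = sampled_totals(records, rng)
for state in STATES:
    print(state, round(expected[state]), round(sampled[state]))
```

Each record is processed once per draw rather than once per state, so the workload scales with `draws` instead of with the number of states. Passing `draws=2` (or more) to `sampled_totals` should shrink the sampling error at the cost of proportionally more rows.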
