Assigning a state to a record, randomly, based on probability of the record being from particular states (versus distributing portions of records to states) #2

@donboyd5

I am moving Dan's comments on this point here. His initial comment (PSLmodels/taxdata#138 (comment)) follows.

It is possible to assign a single state to each record in an unbiased manner. The way I have done this is to calculate the probability of a record being in each of the 50 states and assign it to one of those states in proportion to those probabilities. That is, if a record has high state income tax, the procedure will show high probabilities for New York, California, etc., and low (but nonzero) probabilities for Florida and Texas. The computer will then select New York or California with high probability and Florida or Texas with low probability. In expectation, the resulting totals will be the same as in the "long" format, but with some unbiased error. I have done this and find that state-level aggregates match nearly as well as summing over all possible states. If desired, one could take 2 draws, or any other number. It would not be necessary to multiply the workload by 51.
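
To make the idea concrete, here is a minimal sketch of the random assignment, not Dan's actual code. The function name `assign_states`, the made-up probabilities and weights, the seed, and the use of 51 categories are all assumptions for illustration. It draws one state per record from that record's probability row, then compares the weighted state totals with the "long" approach of spreading every record across all states.

```python
import numpy as np

def assign_states(probs, rng, n_draws=1):
    """Draw n_draws state indices per record, in proportion to each row of probs.

    Hypothetical helper: probs is an (n_records, n_states) array of
    nonnegative values; each row is normalized to sum to 1.
    """
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum(axis=1, keepdims=True)
    cdf = np.cumsum(probs, axis=1)
    cdf[:, -1] = 1.0  # guard against floating-point round-off
    u = rng.random((probs.shape[0], n_draws))
    # Inverse-CDF sampling: count how many cumulative values each uniform
    # draw has reached; that count is the index of the selected state.
    return (u[:, :, None] >= cdf[:, None, :]).sum(axis=2)

rng = np.random.default_rng(0)  # arbitrary seed
n_records, n_states = 20_000, 51

# Made-up inputs: in practice the probabilities would come from a model
# and the weights from the microdata file.
probs = rng.dirichlet(np.ones(n_states), size=n_records)
weights = rng.uniform(50, 150, size=n_records)

# One draw per record.
states = assign_states(probs, rng)[:, 0]
drawn_totals = np.bincount(states, weights=weights, minlength=n_states)

# "Long" format: every record contributes weight * probability to every state.
long_totals = weights @ probs

rel_err = np.abs(drawn_totals - long_totals) / long_totals
print("largest relative difference across states:", rel_err.max())
```

With `n_draws` greater than 1, each drawn copy of a record would carry `weight / n_draws`, which should shrink the sampling error while keeping the file far smaller than the full long format.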
