I am moving Dan's comments on this point here. This is his initial comment (PSLmodels/taxdata#138 (comment)).
It is possible to assign a single state to each record in an unbiased manner. The way I have done this is to calculate the probability of a record being in each of the 50 states, and assign it to one of those states in proportion to those probabilities. That is, if a record has high state income tax, the procedure will show high probabilities for New York, California, etc. and low (but nonzero) probabilities for Florida and Texas. The computer will then select New York or California with high probability and Florida or Texas with low probability. In expectation, the resulting totals will be the same as in the "long" format, but with some unbiased error. I have done this and find that state-level aggregates match nearly as well as summing over all possible states. If desired, one could take 2 draws, or any other number. It would not be necessary to multiply the workload by 51.
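To make the idea concrete, here is a minimal sketch of the sampling step in Python with NumPy. It is not Dan's actual code. It assumes the per-record state probabilities are already available as a matrix `probs` with one row per record and one column per state; how they are estimated is not covered here. The function names `draw_states`, `state_totals_long`, and `state_totals_sampled` are hypothetical and chosen only for illustration.

```python
import numpy as np


def draw_states(probs, rng, n_draws=1):
    """Draw state indices for each record in proportion to its probabilities.

    probs: array of shape (n_records, n_states); each row is assumed to sum to 1.
    Returns an integer array of shape (n_records, n_draws).
    """
    cdf = np.cumsum(probs, axis=1)
    cdf[:, -1] = 1.0  # guard against floating-point rounding in the last column
    u = rng.random((probs.shape[0], n_draws))  # uniform draws in [0, 1)
    # The drawn index is the number of CDF entries at or below the uniform draw.
    return (u[:, :, None] >= cdf[:, None, :]).sum(axis=2)


def state_totals_long(probs, weights, values):
    """'Long' format: every record contributes to every state, scaled by probability."""
    return (probs * (weights * values)[:, None]).sum(axis=0)


def state_totals_sampled(states, weights, values, n_states):
    """Sampled format: each record contributes only to its drawn state(s).

    With several draws per record, the record's weight is split evenly across
    draws so that the national total is unchanged.
    """
    n_draws = states.shape[1]
    per_draw = np.repeat(weights * values / n_draws, n_draws)
    return np.bincount(states.ravel(), weights=per_draw, minlength=n_states)
```

With one draw the file keeps its original number of records, and with `n_draws=2` it doubles, rather than growing by a factor of 51. Comparing `state_totals_sampled` against `state_totals_long` on the same inputs gives a direct check of how much sampling error the draws introduce.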