Ken Haase edited this page Oct 24, 2024 · 2 revisions

Welcome to the S-DHC Release wiki!

This repository contains source code for the PHSafe disclosure avoidance application. The Census Bureau used PHSafe to protect individual 2020 Census responses in the tabulation and publication of the Scalable Demographic and Housing Characteristics File (S-DHC).

PH-SAFE combines 2020 Census response information about households and the individuals within them, and infuses the resulting statistics with noise to produce privacy-protected tabulations.

Because information about very large households can be highly disclosive, households above a certain size are truncated: members beyond the size threshold are removed.
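A minimal sketch of the truncation step, assuming each household is represented as a list of person records and that the first `threshold` members in record order are the ones kept. The `truncate_households` helper and the threshold value used here are hypothetical; this page does not state the actual threshold or how PHSafe chooses which members to retain.

```python
from typing import Dict, List


def truncate_households(
    households: Dict[str, List[dict]], threshold: int
) -> Dict[str, List[dict]]:
    """Keep at most `threshold` person records per household (hypothetical helper)."""
    if threshold < 1:
        raise ValueError("threshold must be a positive integer")
    # Assumption: members are retained in their existing record order.
    # The real PHSafe selection rule may differ.
    return {hh_id: members[:threshold] for hh_id, members in households.items()}


households = {
    "H1": [{"sex": "F"}, {"sex": "M"}],
    "H2": [{"sex": "F"} for _ in range(5)],
}
truncated = truncate_households(households, threshold=3)
print({hh: len(members) for hh, members in truncated.items()})  # {'H1': 2, 'H2': 3}
```

Households at or below the threshold pass through unchanged; only the larger ones lose records.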

The resulting truncated data is used to generate a preliminary tabulation of counts and ratios for characteristics (sex, race) of household occupants. Noise is then infused into the innermost detail cells of the preliminary tables to generate the final output of the PH-SAFE algorithm.
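The noise-infusion step can be sketched as follows. This is a minimal illustration, not PHSafe's implementation: the (sex, race) cell layout, the `noisy_cell_counts` helper, and the choice of two-sided geometric (discrete Laplace) noise governed by a privacy parameter `epsilon` are all assumptions. The actual mechanism and privacy-loss accounting come from Tumult's platforms and are not described on this page; `sensitivity` is simply left as a parameter.

```python
import math
import random
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical innermost detail cell: a (sex, race) pair.
Cell = Tuple[str, str]


def _geometric(q: float, rng: random.Random) -> int:
    """Sample X in {0, 1, 2, ...} with P(X >= k) = q**k (inverse transform)."""
    u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is defined
    return math.floor(math.log(u) / math.log(q))


def two_sided_geometric(epsilon: float, sensitivity: float, rng: random.Random) -> int:
    """Discrete Laplace noise: P(Z = k) is proportional to q**abs(k), q = exp(-epsilon/sensitivity)."""
    q = math.exp(-epsilon / sensitivity)
    return _geometric(q, rng) - _geometric(q, rng)


def noisy_cell_counts(
    persons: Iterable[dict],
    cells: List[Cell],
    epsilon: float,
    sensitivity: float,
    seed: int = 0,
) -> Dict[Cell, int]:
    """Tabulate exact counts per innermost cell, then add independent noise to each."""
    rng = random.Random(seed)
    exact = Counter((p["sex"], p["race"]) for p in persons)
    # Every listed cell is noised, including cells whose exact count is zero,
    # so the output does not reveal which cells are empty.
    return {
        cell: exact[cell] + two_sided_geometric(epsilon, sensitivity, rng)
        for cell in cells
    }


cells = [(sex, race) for sex in ("F", "M") for race in ("A", "B")]
persons = [
    {"sex": "F", "race": "A"},
    {"sex": "F", "race": "A"},
    {"sex": "M", "race": "B"},
]
print(noisy_cell_counts(persons, cells, epsilon=1.0, sensitivity=1.0, seed=7))
```

Because the noise is symmetric and integer-valued, noisy counts stay integral but can fall below zero, which is one of the artifacts the post-processing stage has to handle.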

The resulting protected table is then statistically post-processed to improve accuracy (removing certain illogical results of the noise infusion, such as negative counts or ratios with a zero denominator) and to produce credible intervals for the resulting statistics.
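The two illogical outcomes named above can be illustrated with a simple rule: clamp negative noisy counts to zero, and treat a ratio as undefined when its clamped denominator is zero. This is only a hypothetical sketch with invented helper names (`clamp_count`, `safe_ratio`). It is not the R code in SDHC_Model_Based_Estimates, which, judging by its name and the credible intervals it produces, takes a model-based approach that this sketch does not attempt.

```python
from typing import Optional


def clamp_count(noisy_count: int) -> int:
    """Replace an impossible negative count with zero."""
    return max(noisy_count, 0)


def safe_ratio(noisy_numerator: int, noisy_denominator: int) -> Optional[float]:
    """Ratio of clamped counts, or None when the denominator is zero."""
    numerator = clamp_count(noisy_numerator)
    denominator = clamp_count(noisy_denominator)
    if denominator == 0:
        return None  # undefined: no valid denominator after clamping
    return numerator / denominator


print(safe_ratio(9, 3))   # 3.0
print(safe_ratio(-2, 5))  # 0.0
print(safe_ratio(7, -1))  # None
```

Naive clamping like this can bias small cells upward, which is one reason to prefer more careful statistical post-processing that also quantifies uncertainty.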

The PH-SAFE code itself is in the phsafe directory of this repository. PH-SAFE was built on Tumult's "Analytics" and "Core" platforms, whose source is in the tumult subdirectory. It also uses customized CEF (Census Edited File) readers implemented by MITRE, which are included in the mitre subdirectory. All of these components are written in Python, and the latest versions of the Tumult platforms can be found at https://tmlt.dev/. The post-processing code, written in R, is in the SDHC_Model_Based_Estimates subdirectory.
