makepath/census-parquet

census-parquet

Python tools for creating and maintaining Parquet files from US 2020 Census Data.

Installation

The data download shell scripts depend on wget, so install it first.

To install the census-parquet package, use:

pip install census-parquet

This also installs the required Python dependencies:

  1. click
  2. dask
  3. dask_geopandas
  4. geopandas
  5. numpy
  6. openpyxl
  7. pandas
  8. pyarrow
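
Before running anything, it can help to confirm that the prerequisites are actually present. The following is a minimal sketch, not part of census-parquet itself: the helper name missing_prerequisites is hypothetical, and it assumes the import names match the package names in the list above. It uses only the standard library to report any dependency that cannot be imported and whether wget is on the PATH.

```python
import importlib.util
import shutil

# Import names taken from the dependency list above (assumed to match
# the installed package names).
REQUIRED_MODULES = [
    "click", "dask", "dask_geopandas", "geopandas",
    "numpy", "openpyxl", "pandas", "pyarrow",
]


def missing_prerequisites(modules=REQUIRED_MODULES, tools=("wget",)):
    """Return the modules and command-line tools that cannot be found.

    Hypothetical helper for illustration; census-parquet does not ship it.
    """
    # find_spec returns None when a top-level module is not installed.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    # shutil.which returns None when an executable is not on the PATH.
    missing += [tool for tool in tools if shutil.which(tool) is None]
    return missing


if __name__ == "__main__":
    missing = missing_prerequisites()
    print("missing:", ", ".join(missing) if missing else "nothing")
```

An empty result means the download scripts and the processing scripts should at least be able to start.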

Usage

To run the full census-parquet pipeline, use:

run_census_parquet

This runs the following scripts in order:

  1. download_boundaries.sh - Downloads the Census Boundary data needed by process_boundaries.py.
  2. download_population_stats.sh - Downloads the population statistics data needed by process_blocks.py.
  3. download_blocks.sh - Downloads the Census Block data needed by process_blocks.py.
  4. process_boundaries.py - Processes the Census Boundary data and writes Parquet files to a boundary_outputs folder.
  5. process_blocks.py - Processes the Census Block data and writes Parquet files. The final combined Parquet file is named tl_2020_FULL_tabblock20.parquet.
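
The ordering above matters because each processing script reads what an earlier download script fetched. The sketch below illustrates that ordering only; it is not the actual implementation of run_census_parquet. The script_dir argument, the function names, and the choice to run shell scripts through bash are all assumptions made for the example.

```python
import subprocess
import sys
from pathlib import Path

# Step order as listed above: all downloads first, then processing.
STEPS = [
    "download_boundaries.sh",
    "download_population_stats.sh",
    "download_blocks.sh",
    "process_boundaries.py",
    "process_blocks.py",
]


def build_commands(script_dir):
    """Return one command line per step, in order.

    script_dir is a hypothetical folder containing the scripts. Shell
    scripts are run with bash (an assumption) and Python scripts with
    the current interpreter.
    """
    commands = []
    for name in STEPS:
        runner = "bash" if name.endswith(".sh") else sys.executable
        commands.append([runner, str(Path(script_dir) / name)])
    return commands


def run_pipeline(script_dir):
    """Run every step in order, stopping at the first failure."""
    for command in build_commands(script_dir):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on missing or partial inputs.
        subprocess.run(command, check=True)
```

Stopping at the first failed step is a deliberate choice in this sketch: if a download fails, running the matching process script would only fail later with a less obvious error.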