census-parquet

Python tools for creating and maintaining Parquet files from US 2020 Census Data.

Installation

The data download shell scripts require wget, so install it first.

To install the census-parquet package, run:

pip install census-parquet

This also installs the required Python dependencies, which are:

  1. click
  2. dask
  3. dask_geopandas
  4. geopandas
  5. numpy
  6. openpyxl
  7. pandas
  8. pyarrow
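
Since pip does not install wget, it can be handy to confirm that everything is in place before starting the downloads. The following is a minimal sketch, not part of census-parquet: the helper name `missing_prerequisites` is hypothetical, and it assumes each dependency is importable under the name listed above.

```python
import importlib.util
import shutil

# Module names as listed in this README (assumed to match their import names).
REQUIRED_MODULES = [
    "click", "dask", "dask_geopandas", "geopandas",
    "numpy", "openpyxl", "pandas", "pyarrow",
]


def missing_prerequisites(modules=REQUIRED_MODULES, tools=("wget",)):
    """Return a list describing every module or command-line tool not found."""
    missing = [
        f"python module: {name}"
        for name in modules
        if importlib.util.find_spec(name) is None  # None means not importable
    ]
    missing += [
        f"command-line tool: {tool}"
        for tool in tools
        if shutil.which(tool) is None  # None means not on PATH
    ]
    return missing


problems = missing_prerequisites()
if problems:
    print("Missing prerequisites:")
    for item in problems:
        print("  -", item)
else:
    print("All prerequisites found.")
```

An empty list means both the Python dependencies and wget were found in the current environment.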

Usage

To run the census-parquet code, use:

run_census_parquet

This runs the following scripts in order:

  1. download_boundaries.sh - Downloads the Census boundary data needed by process_boundaries.py.
  2. download_population_stats.sh - Downloads the population statistics data needed by process_blocks.py.
  3. download_blocks.sh - Downloads the Census block data needed by process_blocks.py.
  4. process_boundaries.py - Processes the Census boundary data and creates Parquet files, which are written to a boundary_outputs folder.
  5. process_blocks.py - Processes the Census block data and creates Parquet files. The final combined Parquet file is named tl_2020_FULL_tabblock20.parquet.
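
Once the scripts above have finished, the combined block file can be loaded for inspection. The following is a rough sketch, not part of census-parquet: the helper name `load_blocks` is hypothetical, the location of the file relative to the working directory is an assumption, and it assumes the file carries geometry metadata that `geopandas.read_parquet` can read. If it does not, `pandas.read_parquet` may be the better choice.

```python
from pathlib import Path

# File name taken from this README; its directory is an assumption.
BLOCKS_FILE = "tl_2020_FULL_tabblock20.parquet"


def load_blocks(output_dir="."):
    """Load the combined Census block Parquet file as a GeoDataFrame."""
    path = Path(output_dir) / BLOCKS_FILE
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found; run run_census_parquet first "
            "or pass the directory that holds the output"
        )
    # Imported here so the path check works even without geopandas installed.
    import geopandas as gpd

    return gpd.read_parquet(path)
```

For example, `blocks = load_blocks()` followed by `print(blocks.head())` would show the first few rows. The full file covers every 2020 Census block, so loading it all at once may need a lot of memory; dask_geopandas, which is already a dependency, offers a lazy `read_parquet` as an alternative.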