Skip to content

USASpending Database

Peter B edited this page Sep 18, 2024 · 5 revisions

The USASpending Database is a public database of US Government contracts that is released once a month. The database is published as a PostgreSQL backup and contains dozens of different tables and views. The following steps are required to use the data:

  1. Download the data (PostgreSQL backup)
  2. Create a new database in which to load the backup
  3. Identify items to restore
  4. Restore the backup

Step 1: Download Data

The USASpending database is released once a month. The database is timestamped with the 8th of the month and released a few days later. For example, usaspending-db_20240908.zip is timestamped to September 8, 2024. The backup is huge and takes up to 8 hours to download. Here are the steps:

mkdir -p /data/downloads/spending/2024-09-08
cd /data/downloads/spending/2024-09-08
wget https://files.usaspending.gov/database_download/usaspending-db_20240908.zip
unzip usaspending-db_20240908.zip

Step 2: Create Database

In order to use the data (PostgreSQL backup), you must have a PostgreSQL database of your own. The following instructions assume that you have a PostgreSQL database installed and running locally.

Create Tablespace

This step is optional but recommended. Ideally, you want to create a folder on a relatively fast drive to store the database.

sudo mkdir -p /data/postgresql/usaspending
sudo chown postgres.postgres /data/postgresql/usaspending
sudo -i -u postgres psql -c "CREATE TABLESPACE usaspending LOCATION '/data/postgresql/usaspending';"

Create Roles

This step is required in order to restore most of the tables and views in the database.

sudo -i -u postgres psql -c "CREATE ROLE api_user WITH CREATEDB CREATEROLE;"
sudo -i -u postgres psql -c "CREATE ROLE root WITH CREATEDB CREATEROLE;"
sudo -i -u postgres psql -c "CREATE ROLE etl_user WITH CREATEDB CREATEROLE;"

Create database

There are a number of different ways to create a database in PostgreSQL. This is one example using the command line:

sudo -i -u postgres psql -c "create database usaspending;"

Here's another example using the command line. In this case, we are create a database called usaspending_20240908 using a specific tablespace called usaspending:

sudo -i -u postgres psql -c "create database usaspending_20240908 tablespace usaspending;"

Step 3: Identify Items to Restore

Once the data is downloaded we need to create a list of database items to restore (tables, views, indexes, etc).

The pg_restore command is used to list all the items in the backup. Example:

pg_restore --list /data/downloads/spending/2024-09-08

The output can be redirected to a file which will be used to restore the database. Example:

pg_restore --list /data/downloads/spending/2024-09-08 > /data/downloads/spending/restore_2024-09-08.list

Note that you can prune the items you wish to restore. For example, the USASpending database has lots of materialized views that can take some time to restore. You can create a list that excludes these items using the sed command like this:

pg_restore --list /data/downloads/spending/2024-09-08 | sed '/MATERIALIZED VIEW DATA/d' > /data/downloads/spending/restore_2024-09-08.list

Alternatively, you can restore just a single table like this:

pg_restore --list /data/downloads/spending/2024-09-08 | grep source_procurement_transaction | grep TABLE > /data/downloads/spending/restore_2024-09-08.list

Note when cherry picking tables, you must ensure that you create the corresponding schema or the restore process will fail. Example:

sudo -i -u postgres psql -d usaspending_20240908 -c "create schema raw;"

Step 4: Restore the Backup

Once we have identified what we want to restore, we can run the pg_restore command to restore the backup.

In the following example, we will use the list created above and load the backup into the usaspending_20240908 database:

sudo -i -u postgres pg_restore \
--jobs 16 \
--dbname postgresql://localhost:5432/usaspending_20240908 \
--verbose \
--exit-on-error \
--use-list /data/downloads/spending/restore_2024-09-08.list \
/data/downloads/spending/2024-09-08