Skip to content

Rust libraries and binaries for working with USDA Branded Food Products CSV

Notifications You must be signed in to change notification settings

littlebunch/bfpd-rs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

66 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

bfpd-rs

Rust PostgreSQL and MariaDB libraries for processing USDA Branded Food Products data. The Cargo workspace also includes a CLI utility for loading CSV and REST API and GraphQL servers.

What's here

The repository is organized as a Cargo workspace.

./mariadb -- MariadDB library
./pg -- PostgreSQL library
./graphql -- graphql server
./restapi -- REST server
./ingest-csv -- cli utility for importing the USDA csv files into the database
./data/pg -- Diesel migration scripts to create the PostgreSQL database and schema.rs
./data/mariadb -- Diesel migration scripts to create the MariaDB database and schema.rs

How to Set-up the Database

You can choose either PostgreSQL (13 recommended) or MariaDB 10. In either case, it's assumed you have a running instance of one or the other.

Step 1. Create an empty schema

For PostgreSQL:

createdb bfpd

For MariaDB:

mysql -u user -p -e"create schema bfpd;"

Step 2: Load the data

At this point, you have a couple of options: 1) download a dump of a recent version of the Branded Food Products database from https://go.littlebunch.com/posgresql for a PostgreSQL dump or https://go.littlebunch.com/mariadb for a MariaDB dump and restore to your local instance or 2) build the database from the ground-up by importing the USDA csv files using the provided ingest-csv command line utility.

Using a dump file

If you are using the first option, download the dump files and restore to your database:

For postgreSQL:

dropdb bfpd5 && createdb bfpd5 # only if database exists
xz -d -c < [downloaded.sql.xz] | psql -U [user] bfpd5

For MariaDB:

mysql -u [user] -p -e'drop database bfpd5' && mysql -u [user] -p -e'create database bfpd5' # only if database exists
xz -d -c [download.sql.xz] | mysql -u [user] -p bfpd5

Using the ingest-csv utility

  1. Build the binary Instructions for building the ingest-csv executable are provided in the ingest-csv/README.md.

  2. Download and unzip the latest csv from the FDC website into a directory of your choice. You will need the Branded Foods and Supporting data for All Downloads zip files:

    wget https://fdc.nal.usda.gov/fdc-datasets/FoodData_Central_branded_food_csv_2021-10-28.zip
    wget https://fdc.nal.usda.gov/fdc-datasets/FoodData_Central_Supporting_Data_csv_2021-10-28.zip
  3. Use the Diesel migration scripts in the data directory to create an empty database

    For PostgreSQL:

    psql -U user -W bfpd < data/pg/up.sql

    For MariaDB:

    mysql -u user -p bfpd < data/mariadb/up.sql

    Note: You can use the up.sql and down.sql scripts to create a diesel migration. This is probably more trouble than it's worth unless you need to change the schema or just want to learn a bit more about diesel migrations.

  4. Load the data by pointing the program to the full path containing the csv

    ./target/release/ingest-cvs -p /path/to/csv/

The load takes about 30 minutes to 2+ hours depending on your hardware. Note: you need to set a DATABASE_URL variable as described in Step 4 in the ingest-csv README.

Step 3 Publish the data

You can use the provided graphql or REST binaries to publish the data. Instructions for building and running the graphql server are provided in the graphql/README.md. Instructions for the REST api are provided in the restapi/README.md.