This project, in partnership with RAFI-USA, shows concentration in the poultry packing industry.
The dashboard is displayed on RAFI's site. Visit the site for more detail on the project and its background; this README focuses on the technical details of the project.
The data pipeline for this project does the following:
- Joins records from FSIS inspections with historical business data provided by NETS.
- Calculates 60-mile road-distance isochrones from each plant in the FSIS records that meets our filtering criteria.
- Creates GeoJSONs for areas with access to one, two, or three or more poultry integrators (see the sketch after this list).
- Filters the poultry barns identified by a computer vision model trained by Microsoft to reduce the number of false positives.
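As an illustration of the capture-area step, areas reachable by only a single integrator can be derived by overlaying the per-plant isochrones. The sketch below is minimal and uses hypothetical file and column names (`parent_corp` in particular is an assumption); the real logic lives in the pipeline scripts.

```python
import geopandas as gpd

# Load the per-plant isochrones; the path and "parent_corp" column are
# assumptions -- check the pipeline code for the real names.
iso = gpd.read_file("data/clean/isochrones.geojson")

# One footprint per integrator, so nearby plants owned by the same
# company are not double-counted.
corps = iso.dissolve(by="parent_corp")

# A region is "captured" when exactly one integrator reaches it: each
# footprint minus the union of every other integrator's footprint.
records = []
for corp, row in corps.iterrows():
    others = corps.drop(index=corp).unary_union
    records.append({"parent_corp": corp, "geometry": row.geometry.difference(others)})

one_corp = gpd.GeoDataFrame(records, crs=corps.crs)
one_corp.to_file("data/clean/single_capture.geojson", driver="GeoJSON")
```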
The pipeline runs in Docker. If you use VS Code, the repo is set up to run in a dev container, so build the container the way you normally would. Otherwise, build the Docker image from the Dockerfile in the root of the directory.
If you are using the dev container, make sure that you change the `PLATFORM` variable in `devcontainer.json` to match your chip architecture:

```json
"args": {
    "PLATFORM": "linux/arm64/v8" // Change this to "linux/amd64" on WSL and "linux/arm64/v8" on M1
}
```
Download the following files into the appropriate locations. Note that permission is required to access the DSI Google Drive.
- Example FSIS data is located in the DSI Google Drive: MPI Directory by Establishment Name | Establishment Demographic Data
  - Save both files to `data/raw/`.
  - You can also download new data from the FSIS Inspection site; just update the filepaths config file (`pipeline/rafi/config_filepaths.yaml`).
- NETS data is located in the DSI Google Drive. Download this to `data/raw/` and save it in a directory called `nets`.
- Download the raw barns predictions for the entire USA from the DSI Google Drive and save to `data/raw/`.
- Barn filtering shapefiles: Download the zip of all of the shapefiles from Google Drive and extract to `data/shapefiles`. The sources for these shapefiles are listed in `pipeline/rafi/config_geo_filters.yaml`.
If you are using different files (particularly for the FSIS data), just update the filenames in `pipeline/rafi/config_filepaths.yaml`. Make sure the files are in the expected folder.
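For reference, a pipeline step might resolve its inputs from that config roughly like this (a minimal sketch; the key name below is hypothetical, so check `pipeline/rafi/config_filepaths.yaml` for the real ones):

```python
import yaml
from pathlib import Path

with open("pipeline/rafi/config_filepaths.yaml") as f:
    paths = yaml.safe_load(f)

# "fsis_demographic" is a hypothetical key; use the actual keys from the config.
fsis_path = Path("data/raw") / paths["fsis_demographic"]
assert fsis_path.exists(), f"Missing expected input file: {fsis_path}"
```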
The pipeline uses Mapbox to calculate driving distances from the plants and expects a Mapbox API key in a `.env` file saved to the root of the directory:

```
MAPBOX_API=yOuRmApBoXaPiKey
```
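For context, a 60-mile driving-distance polygon for one plant can be requested from Mapbox's Isochrone API along these lines (a sketch only; the exact parameters the pipeline uses may differ):

```python
import os

import requests
from dotenv import load_dotenv

load_dotenv()  # reads MAPBOX_API from the .env file in the repo root

def fetch_isochrone(lon: float, lat: float, miles: float = 60.0) -> dict:
    """Request a driving-distance isochrone polygon around one plant."""
    meters = int(miles * 1609.34)  # Mapbox caps contours_meters at 100,000 m
    url = f"https://api.mapbox.com/isochrone/v1/mapbox/driving/{lon},{lat}"
    resp = requests.get(url, params={
        "contours_meters": meters,
        "polygons": "true",
        "access_token": os.environ["MAPBOX_API"],
    })
    resp.raise_for_status()
    return resp.json()  # GeoJSON FeatureCollection
```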
After all of the files and API keys are in place, run the pipeline:

```
python pipeline/pipeline_v2.py
```

Cleaned data files will be output in a run folder in `data/clean/`. To update the files displayed on the dashboard, follow the instructions in Updating the Dashboard Data.
Note: You can also run each step of the pipeline independently. Just make sure that the input files are available as expected in `__main__` for each script.
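Each step script follows roughly this shape (a hypothetical sketch; the actual input paths and function names vary per script):

```python
from pathlib import Path

def run(raw_path: Path) -> None:
    ...  # the step's actual processing logic

if __name__ == "__main__":
    raw_path = Path("data/raw/example_input.csv")  # hypothetical input file
    if not raw_path.exists():
        raise FileNotFoundError(f"Expected input file at {raw_path}")
    run(raw_path)
```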
There is old code in the `pipeline/pipeline_v1/` directory. This includes a previous version of the pipeline that used Infogroup business data (rather than NETS data). It is saved for reference in case we want to use Infogroup again in a future version of the pipeline.
The dashboard is a Next.js project. To run it locally, do not use the dev container!
Install packages:

```
npm install
```
The dashboard needs Mapbox credentials and service account credentials for Google Cloud. It expects a `.env.local` file in `dashboard/` with a Mapbox API key and a base64-encoded Google service account JSON (with permissions to access Cloud Storage buckets):

```
NEXT_PUBLIC_MAPBOX_ACCESS_TOKEN=yOuRmApBoXaPiKey
GOOGLE_APPLICATION_CREDENTIALS_BASE64=<base64-encoded-service-account.json>
```
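One way to produce the base64 string (the file name `service-account.json` is an assumption; use whatever your downloaded key is called):

```python
import base64
from pathlib import Path

# Encode the service account key for the .env.local file.
encoded = base64.b64encode(Path("service-account.json").read_bytes()).decode()
print(f"GOOGLE_APPLICATION_CREDENTIALS_BASE64={encoded}")
```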
Run the development server:

```
npm run dev
```
Open http://localhost:3000 with your browser to see the result.
The dashboard is deployed via Vercel and is hosted on RAFI's site in an iframe. Any update to the `main` branch of this repo will update the production deployment of the dashboard.
If you rerun the pipeline, you need to update the data files in Google Cloud Storage as well as the files packaged with the Vercel deployment from GitHub.
The dashboard pulls data from Google Cloud Storage via an API. Upload the following files to the root of the `rafi-poultry` storage bucket in the `rafi-usa` project in the DSI account:

- `barns.geojson.gz`
- `plants.geojson`
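A minimal upload sketch using the `google-cloud-storage` client (the local run-folder path is hypothetical; credentials come from your service account):

```python
from google.cloud import storage

run_folder = "data/clean/2024-01-01"  # hypothetical pipeline run folder

client = storage.Client(project="rafi-usa")
bucket = client.bucket("rafi-poultry")
for name in ("barns.geojson.gz", "plants.geojson"):
    # Blobs land at the bucket root, which is where the dashboard expects them.
    bucket.blob(name).upload_from_filename(f"{run_folder}/{name}")
```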
The dashboard loads the isochrones file showing captured areas from `dashboard/public/data/v2/isochrones.geojson.gz`.
The dashboard loads data in `lib/data.js`. This loads the packaged data and the Google Cloud Storage data via API calls. Data is managed in `lib/state.js` and `lib/useMapData.js`. Both the NETS data and farmer locations are sensitive, so those data files are processed behind API routes located in `api/`.
The dashboard consists primarily of a map component and a summary stats component. The map logic lives in `components/DeckGLMap.js` and `components/ControlPanel.js`, and the summary stats logic lives in `components/SummaryStats.js` and its sub-components.