There are two Django applications in the BGA Public Salaries Database project: the public user interface (`payroll`) and the private interface for uploading data (`data_import`).
`payroll` is a fairly straightforward application. It contains the core models – Employer, Position, Person, Job, and Salary. The models are normalized to this degree because an employer has many positions, more than one person can hold the same position, and a person in the same position can have many different salaries over the years.
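To make those relationships concrete, here is a minimal sketch of how the models might relate. The field names (including the self-referential `parent` on Employer) are illustrative assumptions, not the project's actual schema:

```python
from django.db import models


class Employer(models.Model):
    name = models.CharField(max_length=255)
    # Assumption: departments point at a parent employer; units are top-level.
    parent = models.ForeignKey('self', null=True, on_delete=models.CASCADE)


class Position(models.Model):
    # An employer has many positions.
    employer = models.ForeignKey(Employer, on_delete=models.CASCADE)
    title = models.CharField(max_length=255)


class Person(models.Model):
    first_name = models.CharField(max_length=255)
    last_name = models.CharField(max_length=255)


class Job(models.Model):
    # More than one person can hold the same position over time.
    person = models.ForeignKey(Person, on_delete=models.CASCADE)
    position = models.ForeignKey(Position, on_delete=models.CASCADE)


class Salary(models.Model):
    # A person in the same position can have many salaries across years.
    job = models.ForeignKey(Job, on_delete=models.CASCADE)
    amount = models.DecimalField(max_digits=12, decimal_places=2)
    year = models.IntegerField()  # assumed field for the reporting year
```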
The `payroll` app defines the homepage and detail views for the Employer proxy models Unit and Department, as well as the Person model. (Proxying Employer into Unit and Department means that we have a cleaner way to differentiate Python logic between the two types of Employer, while still leveraging the same underlying database table.)
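A minimal sketch of that proxy pattern, building on the model sketch above. The manager and the parent-based heuristic for telling units from departments are assumptions for illustration:

```python
from django.db import models


class UnitManager(models.Manager):
    def get_queryset(self):
        # Assumption: top-level employers (no parent) are units.
        return super().get_queryset().filter(parent__isnull=True)


class Unit(Employer):
    objects = UnitManager()

    class Meta:
        proxy = True  # reuses the Employer table; only Python logic differs


class Department(Employer):
    class Meta:
        proxy = True
```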
🚨 Note that this project uses the `jinja2` templating engine for application views. 🚨
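Wiring Django to the Jinja2 backend typically looks something like this in `settings.py`; the template directory and the `environment` dotted path are assumptions here:

```python
TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.jinja2.Jinja2',
        'DIRS': [BASE_DIR / 'templates'],
        'APP_DIRS': True,
        'OPTIONS': {
            # Hypothetical module exposing a jinja2.Environment factory
            'environment': 'bga_database.jinja2.environment',
        },
    },
]
```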
In order to reduce load time on first visit, the `payroll` app separates template loading from most database queries: rather than gathering data during page render, it performs the database queries asynchronously via AJAX calls to an API implemented with the Django REST Framework.
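A sketch of what one such endpoint might look like: the page renders first, then requests its data from a DRF view like this. The view name, URL parameters, and field lookups are illustrative assumptions that follow the model sketch above:

```python
from rest_framework.response import Response
from rest_framework.views import APIView

from payroll.models import Salary


class EmployerSalaryView(APIView):
    """Hypothetical endpoint the frontend polls after the page renders."""

    def get(self, request, pk):
        year = request.query_params.get('year')
        salaries = Salary.objects.filter(
            job__position__employer_id=pk, year=year
        ).values('job__person__last_name', 'amount')
        return Response({'year': year, 'salaries': list(salaries)})
```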
Finally, the `payroll` application also exposes Django admin views to edit employer name and classification.
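An admin registration limited to those two fields might look like the following; `taxonomy` stands in for the classification field and is an assumption:

```python
from django.contrib import admin

from payroll.models import Employer


@admin.register(Employer)
class EmployerAdmin(admin.ModelAdmin):
    # Assumption: 'taxonomy' represents the employer classification.
    list_display = ('name', 'taxonomy')
    fields = ('name', 'taxonomy')
```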
Search is a bit more complicated. `payroll` uses the Solr search engine with custom Python adapters to index and search Employer and Person payroll records.
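The custom adapters aren't shown here, but indexing and querying Solr from Python with the `pysolr` client generally looks like this; the core URL and document fields are assumptions:

```python
import pysolr

# Assumption: a local Solr core named 'bga'.
solr = pysolr.Solr('http://localhost:8983/solr/bga', timeout=10)

# Index a record. The document shape is illustrative.
solr.add([{
    'id': 'person.123',
    'name': 'Jane Doe',
    'entity_type': 'person',
}])

# Search indexed payroll records, filtered to one entity type.
results = solr.search('name:doe', fq='entity_type:person', rows=20)
for doc in results:
    print(doc['id'], doc['name'])
```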
We use Django's database cache backend to cache `payroll` views. More specifically:

- The index and entity pages are cached in their entirety.
- Database operations to gather display data for a given year are also fairly intensive, so API views are cached as well.
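The database cache backend and per-view caching combine roughly like this; the cache table name and timeout are assumptions:

```python
# settings.py
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.db.DatabaseCache',
        'LOCATION': 'django_cache',  # table created via `manage.py createcachetable`
    }
}

# views.py
from django.views.decorators.cache import cache_page


@cache_page(60 * 60 * 24)  # assumed TTL: cache the rendered page for a day
def index(request):
    ...
```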
The `data_import` application has more moving parts: it defines models to contain and operate on uploaded data – namely Upload, RespondingAgency, StandardizedFile, and SourceFile – as well as the views to perform those operations.

A StandardizedFile is a data file that follows a standard format for import into the database.
`data_import` defines a user interface to upload and interactively import data from standardized data files. The interactive import has a number of moving parts:
- The import itself is a "finite state machine" governed by `django-fsm`. In other words, for each standardized file, there is a series of steps and instructions for moving from Step A to Step B and so on. The steps ("states") and their transitions are defined on the StandardizedFile model.
- Each transition in the state machine refers to a series of tasks. These tasks can take a long time, so we use `celery` to queue and run tasks asynchronously, i.e., in the background. (See the sketch after this list.)
- Each delayed task leverages an instance of ImportUtility. This class mostly defines methods to run SQL queries that transform the flat, standardized data into instances of the `payroll` models.
- The import is interactive because there are several occasions during the import process where we ask the user to review entities in the incoming data, e.g., when it contains a responding agency or employer that we haven't seen before. We use Redis, an in-memory data store, and a Python library called `saferedisqueue` to queue records for review (see the queue sketch below). Custom queue logic is defined in `queues.py`, the queues are populated by methods on the ImportUtility class, and the review routes are defined in `views.py`.
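A condensed sketch of how `django-fsm` and `celery` might fit together on the StandardizedFile model. The state names, transition, and task body are illustrative, not the project's actual steps:

```python
from celery import shared_task
from django.db import models
from django_fsm import FSMField, transition


class StandardizedFile(models.Model):
    status = FSMField(default='uploaded')

    @transition(field=status, source='uploaded', target='copied')
    def copy_to_database(self):
        # Queue the long-running work instead of blocking the request.
        copy_to_database_task.delay(self.pk)


@shared_task
def copy_to_database_task(s_file_id):
    # Hypothetical task body: each delayed task would use an ImportUtility
    # instance to run the SQL that loads the flat file into payroll models.
    ...
```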
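And a sketch of the review queue pattern with `saferedisqueue`; the queue name, record shape, and producer/consumer split are assumptions:

```python
import json

from saferedisqueue import SafeRedisQueue

# Assumption: one Redis-backed queue per review type.
queue = SafeRedisQueue(name='responding-agency-review')

# ImportUtility-style producer: enqueue an unseen agency for review.
queue.put(json.dumps({'name': 'Office of the Mayor'}))

# Review-view-style consumer: get() returns a (uid, payload) pair;
# ack() removes the item once the reviewer resolves it.
uid, raw = queue.get(timeout=1)
if uid is not None:
    record = json.loads(raw)
    queue.ack(uid)
```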
A SourceFile is a raw file received from an agency in response to a FOIA request by the BGA. `data_import` exposes an interface for uploading source files via the Django admin interface.
Standardized files and source files are tied together by (1) RespondingAgency and (2) data year. The core `payroll` models all have a `source_file` method that leverages this relationship to retrieve the source file for a given year.
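A sketch of how such a method might traverse that relationship; the field names (`responding_agency`, `reporting_year`) are assumptions based on the description above:

```python
class Employer(models.Model):
    ...

    def source_file(self, year):
        """Return the raw FOIA response behind this employer's data for a year."""
        # Assumption: SourceFile links to a RespondingAgency and a data year,
        # and the employer knows its responding agency.
        return SourceFile.objects.get(
            responding_agency=self.responding_agency,
            reporting_year=year,
        )
```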
Both the `payroll` and `data_import` applications have tests. These can be found in the `tests/` directory at the root of the project. In general, these tests follow the guidance set out in the DataMade testing guidelines. `data_import/test_tasks.py`, in particular, organizes tests into `TestX` classes to minimize redundant code.
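Grouping related tests in a class lets them share fixtures and helpers instead of repeating setup. An illustrative pytest example, not the project's actual tests:

```python
import pytest


class TestRespondingAgency:
    """Tests in one class can share class-scoped fixtures and helpers."""

    @pytest.fixture
    def agency_name(self):
        # Hypothetical shared fixture, defined once for every test below.
        return 'Office of the Mayor'

    def test_name_is_preserved(self, agency_name):
        assert agency_name == 'Office of the Mayor'

    def test_name_is_nonempty(self, agency_name):
        assert agency_name
```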