
Running auxiliary programs

Mike Hoffert edited this page Apr 8, 2015 · 4 revisions
Deprecation warning
The CSV downloader, parser, and coordinate fetcher are officially deprecated. They were used for the initial fetching of data, but with the data source now unavailable, they are no longer useful. An admin interface has been provided for modifying the existing data and adding new data.

Because our data came from an external site (previously located at orii.health.gov.sk.ca), we have several auxiliary programs to assist in retrieving and handling this data. However, at the time of writing, our data source has been taken down, so the future of these tools is uncertain.

## CSV auto-downloader

This downloads all the CSV files from the source website. Each location has a single CSV file that details all the inspection data for that location. Because it relies on browser automation (it drives a fully automated instance of Firefox), it is quite slow. Ideally, it won't have to be run frequently.

It downloads all the files to a location specified in the source file. There are almost 5,000 files, taking up almost 100 MB. However, the total amount of data transferred is much larger, since the program must visit the web pages that the CSVs are downloaded from.

All CSV files are in the repo and are up to date as of the time the site was taken down.

Run it with `./activator "run-main org.junit.runner.JUnitCore downloader.Download"`.

## CSV parser

This program reads in all the CSV files that the auto-downloader downloaded and emits an SQL file that can be run on the database. The file consists of a few `COPY` statements that populate the entire tables. The output is placed in `database/sql/statements.sql`, which is used by `database/createDatabase.sh`.
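For orientation, a `COPY` in the generated file bulk-loads rows in PostgreSQL's tab-separated text format. The fragment below is purely illustrative; the table name, columns, and data are hypothetical, not taken from the project's actual schema:

```sql
-- Illustrative only: real table/column names come from database/sql.
COPY location (name, address, city) FROM stdin;
Joe's Diner	123 Main St	Saskatoon
\N	456 Elm St	Regina
\.
```

Note that `\N` marks a null column and `\.` terminates the data block.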

See the source code to configure the directory that the CSV files are read from and the output file location (neither should normally need to be changed, though).

Note that the parser also has a translation system that can change the name, address, or city of any location. This is used to fix bad fields in the CSV files. The translations are described in `database/translation.json`. The format of that file is simply an array of objects, where each object specifies the name, address, and city of the location to change, plus a replacement object with the same fields that will be used instead. Note that `\N` is used to represent null columns.
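A minimal sketch of what one entry in `database/translation.json` might look like, based on the description above. The `replacement` key name is an assumption (check the parser source for the exact field names), and since `\N` is not a valid JSON escape, a null column presumably appears as `\\N`:

```json
[
  {
    "name": "JOES DINER",
    "address": "123 MAIN ST",
    "city": "\\N",
    "replacement": {
      "name": "Joe's Diner",
      "address": "123 Main St",
      "city": "Saskatoon"
    }
  }
]
```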

The `statements.sql` file is kept up to date in the repo, so you do not have to run this program unless you make changes to the CSV files or the translation file.

Run it with `./activator "run-main csvParser.Main"`.

## Coordinate fetcher

The CSV files have addresses, postal codes, and cities, but no coordinates, which we need to efficiently plot all our locations on a map. Coordinates are stored in a separate table (named `coordinate`) that maps the address and city to a pair of coordinates. Note that not all locations have coordinates (and some don't even have addresses or cities).
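The table's shape implied above could be sketched as follows. This DDL is a hypothetical reconstruction for illustration; the column names and types are assumptions, and the real definition lives in the project's `database` directory:

```sql
-- Hypothetical DDL; the actual schema may differ.
CREATE TABLE coordinate (
    address   text,
    city      text,
    latitude  double precision,
    longitude double precision,
    PRIMARY KEY (address, city)
);
```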

The coordinate fetcher populates the `coordinate` table with coordinates for locations that are not already in it. This prevents it from wasting time geocoding locations whose coordinates are already known.
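The "only fetch what's missing" step can be expressed as an anti-join. A hedged sketch, assuming a `location` table with `address` and `city` columns matching those in `coordinate` (actual table and column names may differ):

```sql
-- Hypothetical query: locations that still need geocoding.
SELECT DISTINCT l.address, l.city
FROM location l
LEFT JOIN coordinate c
  ON c.address = l.address AND c.city = l.city
WHERE c.address IS NULL       -- no coordinates fetched yet
  AND l.address IS NOT NULL   -- some locations lack an address entirely
  AND l.city IS NOT NULL;
```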

The program uses Bing Maps for geocoding and must be given a Bing Maps API key.
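As a rough illustration of the kind of request involved, here is a Python sketch that builds a query URL for Bing Maps' Locations REST endpoint. The endpoint is Bing's documented geocoding URL, but everything else (the key, the address, the query shape) is a placeholder; the actual fetcher is part of the Scala project and may form its requests differently:

```python
from urllib.parse import urlencode

# Bing Maps Locations REST endpoint for geocoding free-form queries.
BING_GEOCODE_ENDPOINT = "http://dev.virtualearth.net/REST/v1/Locations"

def geocode_url(address: str, city: str, api_key: str) -> str:
    """Build a Bing Maps geocoding request URL for a Saskatchewan address.

    Only constructs the URL; performing the HTTP request and parsing the
    coordinates out of the JSON response is left to the caller.
    """
    params = {
        "query": f"{address}, {city}, SK, Canada",
        "maxResults": 1,
        "key": api_key,  # placeholder: supply a real Bing Maps API key
    }
    return f"{BING_GEOCODE_ENDPOINT}?{urlencode(params)}"

# Example with a placeholder key:
url = geocode_url("123 Main St", "Saskatoon", "YOUR_BING_MAPS_KEY")
```

The returned URL can then be fetched with any HTTP client; the rate at which requests complete is what dominates the run time mentioned below.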

On reasonably fast internet, the program takes about 10 minutes to run.

Run it with `./activator "test-only RunCoordinate"`.
