openblotter

Scrapes, stores and displays Pittsburgh Bureau of Police incident data.

Disclaimer


Openblotter scrapes Pittsburgh Bureau of Police logs every morning and displays criminal incidents. We strive to provide the best analysis of the city's data, but we are limited by the accuracy of the logs' contents and the number of incidents the police provide. Any decisions based on this data should be confirmed using additional resources. As the city says, "The City of Pittsburgh has provided this information as a service. The City assumes no responsibility for the use of information posted on this site." Blame Tim Condello, Mark Howe, Andrew McGill, Andy Somerville and Open Pittsburgh for creating this application.

Background

Every morning (usually), the Pittsburgh police department publishes a PDF of the previous day's incidents and arrests here: http://communitysafety.pittsburghpa.gov/Blotter.aspx. Trouble is, they're posted in tough-to-read PDFs.

Openblotter scrapes these PDFs, inserts relevant information into a PostgreSQL database and serves it all up on a spiffy map.

Setup

Requirements

  • Apache or other web server
  • Python enabled on that web server
  • A PostgreSQL database

Python dependencies

  • psycopg2
  • pdfminer

Installation

These instructions assume you're using an httpd/Apache web server program.

  1. Download the repository and store it in /var/www/html.
  2. Run sql/initialize.sql in PostgreSQL to set up the incident and incidentdescription tables using the schema shown below.
  3. Add your PostgreSQL login credentials to py/contants.py.
  4. Install the required libraries:
     • sudo pip install psycopg2 (Don't forget psycopg2's dependencies, python-dev and libpq-dev. Check notes here.)
     • sudo pip install pdfminer
  5. Set up a cron job to run py/parser.py at regular intervals. (Example: 00 09,11,13,18 * * * /usr/bin/python /var/www/html/blotter/py/parser.py, which assumes the repository lives in /var/www/html/blotter.)
  6. Profit!
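
Before scheduling the parser, it can help to confirm that both libraries from step 4 are importable by the interpreter cron will use. The following is a minimal sketch and not part of the repository; the function name `missing_modules` is hypothetical, and it only checks that the modules can be found, not that they are a compatible version.

```python
import importlib.util

# Import names of the two libraries installed in step 4.
REQUIRED_MODULES = ["psycopg2", "pdfminer"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python modules: " + ", ".join(missing))
    else:
        print("All required Python modules are importable.")
```

Run it with the same interpreter path used in the cron entry, since a different Python installation may have a different set of packages.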

Errors and logs

Openblotter maintains an error log of misread (and therefore excluded) entries at txt/errors.txt.

Each pre-scraped PDF is stored as pdf/YYYYMMDD.pdf.

Each just-converted text file is stored as txt/YYYYMMDD.txt.
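
The naming convention above can be sketched as a small helper. This is an illustration only: the function name `archive_paths` is hypothetical, and whether the date in the filename is the publication date or the incident date is an assumption left to the caller.

```python
from datetime import date


def archive_paths(day):
    """Return (pdf_path, txt_path) for a blotter date, relative to the repository root."""
    stamp = day.strftime("%Y%m%d")  # zero-padded YYYYMMDD
    return "pdf/{0}.pdf".format(stamp), "txt/{0}.txt".format(stamp)


pdf_path, txt_path = archive_paths(date(2014, 3, 7))
print(pdf_path)  # pdf/20140307.pdf
print(txt_path)  # txt/20140307.txt
```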

Database schema

Openblotter's schema includes two tables: incident, which stores metadata (time, location, neighborhood) about a given event, and incidentdescription, which lists the various crimes associated with each event.

incident

| Field | Type | Purpose |
| --- | --- | --- |
| incidentid | serial integer | Unique ID associated with each incident |
| incidenttype | character | Type of incident: `Arrest` or `Offense 2.0` |
| incidentnumber | integer | ID assigned to incident by police |
| incidentdate | date | Date of incident |
| incidenttime | time without time zone | Time of incident |
| address | character | Address of incident (as reported by police, not geocoded) |
| neighborhood | character | Neighborhood of incident (as reported by police, not geocoded) |
| lat | numeric | Latitude of incident (geocoded from address and neighborhood) |
| lng | numeric | Longitude of incident (geocoded from address and neighborhood) |
| Zone | character | Police zone responding to incident (not always the same as the zone where the incident took place) |
| age | smallint | Age of suspect (if `incidenttype` is `Arrest`) |
| Gender | character | Gender of suspect (if `incidenttype` is `Arrest`) |
| geom | geometry(Point, 4326) | Geometry of incident, derived from `lat`/`lng` |
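
The README does not say how `geom` is populated, so the following is only a sketch of one way to derive a point in SRID 4326 from `lat` and `lng`. It builds an EWKT string that PostGIS can cast to a geometry; the helper name `point_ewkt` is hypothetical. Note the coordinate order: PostGIS points take X (longitude) first, then Y (latitude).

```python
def point_ewkt(lat, lng, srid=4326):
    """Build an EWKT point string; PostGIS expects X (longitude) before Y (latitude)."""
    if not -90 <= lat <= 90:
        raise ValueError("latitude out of range: {0}".format(lat))
    if not -180 <= lng <= 180:
        raise ValueError("longitude out of range: {0}".format(lng))
    return "SRID={0};POINT({1} {2})".format(srid, lng, lat)


# Illustrative coordinates near downtown Pittsburgh.
print(point_ewkt(40.4406, -79.9959))  # SRID=4326;POINT(-79.9959 40.4406)
```

The same result can be produced inside the database with `ST_SetSRID(ST_MakePoint(lng, lat), 4326)`, which avoids passing text through the application.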

incidentdescription

| Field | Type | Purpose |
| --- | --- | --- |
| incidentdescriptionid | serial integer | Unique ID associated with each incident charge |
| incidentid | integer | Unique ID associated with each incident; links to `incident` table |
| section | character | Section of the criminal statute for this charge |
| description | character | Text description of charge |
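
Because one incident can carry several charges, reading the data usually means joining the two tables on `incidentid`. The sketch below demonstrates that join with Python's built-in `sqlite3` module as a stand-in for PostgreSQL, so it runs without a database server. Only a few columns are reproduced, the sample rows are made up, and the helper name `charges_for` is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; only a few columns are reproduced.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE incident (
        incidentid INTEGER PRIMARY KEY,
        incidenttype TEXT,
        neighborhood TEXT
    );
    CREATE TABLE incidentdescription (
        incidentdescriptionid INTEGER PRIMARY KEY,
        incidentid INTEGER REFERENCES incident (incidentid),
        section TEXT,
        description TEXT
    );
""")

# Made-up sample rows: one incident carrying two charges.
conn.execute("INSERT INTO incident VALUES (1, 'Arrest', 'Example Neighborhood')")
conn.executemany(
    "INSERT INTO incidentdescription VALUES (?, ?, ?, ?)",
    [(1, 1, "0000", "Example charge A"), (2, 1, "0001", "Example charge B")],
)


def charges_for(connection, incident_id):
    """Return (section, description) pairs for one incident, ordered by charge ID."""
    cursor = connection.execute(
        """
        SELECT d.section, d.description
        FROM incident AS i
        JOIN incidentdescription AS d ON d.incidentid = i.incidentid
        WHERE i.incidentid = ?
        ORDER BY d.incidentdescriptionid
        """,
        (incident_id,),
    )
    return cursor.fetchall()


print(charges_for(conn, 1))
```

Against the real PostgreSQL database the same `SELECT` applies, with psycopg2's `%s` placeholders in place of SQLite's `?`.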