This project is part of the Database-Management-Systems course. We propose a NoSQL database solution to manage 311 Incidents data openly published by the city of Chicago. The database can be accessed through our REST API service. Ouρ database design is simple yet efficient for all the queries we were asked to handle in this project. We experimented with various indexes which led us to performance gain in many cases.
- Download dataset from Kaggle
- Run import_request.py to import the requests into the database
- Run import_citizen.py to import the citizens into the database
- Create python virtual environment
virtualenv -p python3 myenv
- Activate it
source myenv/bin/activate
- Install project dependencies listed in requirements.txt file
pip3 install -r ./requirements.txt
- Run the flask application from the root directory of the project folder
export FLASK_APP=flaskapp flask run
Development server should be up and running on http://127.0.0.1:5000/
Our NoSQL database consists of two collections, request & citizen, which hold request and citizen documents respectively. Based on the principals of NoSQL databases, we settled for a non-normalized design. Thus, we avoided inefficient lookup queries between multiple collections.
More specifically, all types of requests are stored in the request collection. The schemaless design allows us to store requests with any number and any type of fields. The citizen collection contains documents of the following form:
- id
- address
- username
- phone
- upvotes[]
The relationship between citizens and requests (i.e. incidents) is modelled by storing an array of requests in each citizen document. This array contains the ObjectId's of the requests that the user has upvoted. Lastly, each request's location is stored as a GeoJSON Point in order to leverage MongoDB's Geospatial query operators.
The following time measurements were reported by Postman. We ensured that MongoDB query optimizer uses the defined indexes by using the explain() function.
Find the total requests per type that were created within a specified time range and sort them in descending order.
- No index: 2660ms
- Index(creation_date): 761ms
Find the number of total requests per day for a specified request type and time range.
- No index: 2430ms
- Index(creation_date): 657ms
- Index(request_type): 614ms
- CompoundIndex(creation_date,request_type): 110ms
- CompoundIndex(request_type,creation_date): 73ms
Find the three most common service requests per zipcode for a specified day.
- No index: 2020ms
- Index(zip_code): 2020ms
- Index(creation_date): 7ms
- CompoundIndex(creation_date,zip_code): 7ms
Find the three least common wards with regard to a given service request type.
- No index: 2350ms
- Index(request_type): 347ms
Find the average completion time per service request for a specified date range.
- No index: 3030ms
- Index(creation_date): 1120ms
- CompoundIndex(creation_date,request_type): 1110ms
Find the most common service request in a specified bounding box for a specified day. You are encouraged to use GeoJSON objects and Geospatial Query Operators.
- No index: 2060ms
- Index(2dsphere): 2080ms
- Index(creation_date): 5ms
Find the fifty most upvoted service requests for a specified day.
- No index: 33.31sec
- Index(creation_date): 31.34sec
- Index(upvotes) + Index(creation_date): 26ms
Find the fifty most active citizens, with regard to the total number of upvotes.
- No index: 101ms
- Index(upvotes): 96ms
Find the top fifty citizens, with regard to the total number of wards for which they have upvoted an incidents.
- No index: 30sec
- Index(upvotes): 30sec
Find all incident ids for which the same telephone number has been used for more than one names.
- No index: 680ms
- Index(upvotes): 680ms
- Index(phone): 680ms
- CompoundIndex(upvotes,phone): 600ms
- CompoundIndex(phone,upvotes): 612ms
Find all the wards in which a given name has cast a vote for an incident taking place in it.
- No index: 18ms
- Index(username): 15ms
Here you can find screenshots with query results as shown in postman.
Queries implemented in this project can be found in queries.sql