The core of this project is based on analyzing data from the UK Food Standards Agency. This data includes food hygiene ratings of various establishments across the UK.
- Database Setup: Load the establishments data into a MongoDB database and perform basic setup tasks.
- Database Update: Insert new data, modify existing entries, and remove unwanted data as required by the project.
- Exploratory Data Analysis: Query and analyze the data to answer specific questions regarding hygiene ratings, establishment locations, and other key factors.
In this part, we work with MongoDB and the provided dataset `establishments.json` to create a database called `uk_food` and a collection called `establishments`. We perform the following tasks:
- Importing Data: Import the provided JSON file into MongoDB using the `mongoimport` command.
- Setting up the Database: Create a database named `uk_food` and a collection named `establishments`.
- Database Exploration: Use PyMongo to interact with the database, and Pretty Print (`pprint`) to format the output.
- List all databases in MongoDB to confirm `uk_food` exists.
- List all collections in the `uk_food` database to ensure the `establishments` collection is present.
- Use `find_one()` to display one document from the collection.
- Assign the `establishments` collection to a variable for further use.
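
The setup and exploration steps above could be sketched roughly as follows. This assumes a MongoDB server running locally; the `mongoimport` invocation in the comment is one common form and not necessarily the exact command used in this project, and the helper name `explore` is hypothetical.

```python
from pprint import pprint

# One common way to import the file from a terminal (assumed invocation):
#   mongoimport --type json -d uk_food -c establishments --drop --jsonArray establishments.json


def explore(client, db_name="uk_food", coll_name="establishments"):
    """Confirm the database and collection exist, then return the collection.

    `client` is expected to be a pymongo.MongoClient (or any object with the
    same list_database_names / item-access interface).
    """
    # Confirm the database was created by the import
    db_names = client.list_database_names()
    if db_name not in db_names:
        raise LookupError(f"database {db_name!r} not found in {db_names}")

    db = client[db_name]

    # Confirm the collection is present
    coll_names = db.list_collection_names()
    if coll_name not in coll_names:
        raise LookupError(f"collection {coll_name!r} not found in {coll_names}")

    # Keep the collection in a variable for later queries
    establishments = db[coll_name]

    # Show one document to check the shape of the data
    pprint(establishments.find_one())
    return establishments
```

With a server running on the default port, this might be called as `establishments = explore(MongoClient(port=27017))` after `from pymongo import MongoClient`.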
In this part, we update the database as requested by the magazine editors:
- Insert a New Restaurant: Add data for a new halal restaurant named "Penang Flavours" located in Greenwich.
- Business Type Update: Find the `BusinessTypeID` for "Restaurant/Cafe/Canteen" and update the new restaurant with this ID.
- Remove Data: Remove all establishments located in Dover from the database.
- Data Type Correction: Correct data types for fields like `latitude`, `longitude`, and `RatingValue` so that coordinates are stored as decimal numbers and rating values as integers.
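
As a rough illustration of the insert, business type update, and removal tasks above, the helpers below take the `establishments` collection as an argument. The helper names are hypothetical, and the field names `BusinessName` and `LocalAuthorityName` are assumptions about the dataset's schema (only `BusinessType`, `BusinessTypeID`, and `RatingValue` are named in this README). For brevity the lookup uses `find_one()` rather than `find()`.

```python
def add_restaurant(establishments, restaurant):
    """Insert the new restaurant document and return its generated _id."""
    return establishments.insert_one(restaurant).inserted_id


def set_business_type_id(establishments, name, business_type):
    """Copy the BusinessTypeID used by existing documents of this type."""
    # Find an existing document of this type that already has an ID,
    # returning only the two fields we need
    match = establishments.find_one(
        {"BusinessType": business_type, "BusinessTypeID": {"$nin": ["", None]}},
        {"BusinessType": 1, "BusinessTypeID": 1, "_id": 0},
    )
    if match is None:
        raise LookupError(f"no BusinessTypeID found for {business_type!r}")

    # Assumed field name: BusinessName
    establishments.update_one(
        {"BusinessName": name},
        {"$set": {"BusinessTypeID": match["BusinessTypeID"]}},
    )
    return match["BusinessTypeID"]


def remove_by_local_authority(establishments, authority="Dover"):
    """Delete every establishment in one area; return counts before and after."""
    query = {"LocalAuthorityName": authority}  # assumed field name
    before = establishments.count_documents(query)
    establishments.delete_many(query)
    after = establishments.count_documents(query)
    return before, after
```

A call might look like `add_restaurant(establishments, {"BusinessName": "Penang Flavours", "BusinessType": "Restaurant/Cafe/Canteen", "BusinessTypeID": ""})`, where the document shown is a trimmed placeholder rather than the full record.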
- Use `insert_one()` to add "Penang Flavours" to the database.
- Use `find()` to query and find the `BusinessTypeID` for the given `BusinessType`.
- Use `count_documents()` to check the number of documents in Dover before and after removal.
- Use `update_many()` to convert latitude and longitude to decimal numbers and `RatingValue` to integers.
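
The type conversion with `update_many()` might be written as a pipeline-style update, sketched below. This form needs MongoDB 4.2 or later. The nested field paths `geocode.latitude` and `geocode.longitude` are assumptions about the schema, "decimal numbers" is interpreted here as doubles, and the helper names are hypothetical. `$convert` is used with `onError` so that non-numeric values (for example a text rating) become `null` instead of aborting the update.

```python
def convert_field(field, to):
    """Build a $set entry that converts `field` in place to the BSON type `to`."""
    return {
        field: {
            "$convert": {
                "input": f"${field}",
                "to": to,
                "onError": None,  # unparseable values become null
                "onNull": None,   # missing/null values stay null
            }
        }
    }


def build_type_fixes(lat="geocode.latitude", lon="geocode.longitude",
                     rating="RatingValue"):
    """Return an update pipeline: coordinates to double, rating to int."""
    fields = {}
    fields.update(convert_field(lat, "double"))
    fields.update(convert_field(lon, "double"))
    fields.update(convert_field(rating, "int"))
    return [{"$set": fields}]


def fix_types(establishments):
    """Apply the conversions to every document; return how many were modified."""
    result = establishments.update_many({}, build_type_fixes())
    return result.modified_count
```

Passing a list as the second argument to `update_many()` is what tells MongoDB to treat the update as an aggregation pipeline, which is why each field can be computed from its own current value.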
In this part, we analyze the data to answer several questions that will help the magazine editors make decisions:
- Hygiene Score of 20: Find all establishments with a hygiene score of 20 and display the results.
- Establishments in London with a RatingValue >= 4: Find establishments in London with a rating value of 4 or higher.
- Top 5 Establishments with RatingValue of 5: Find the five establishments with a RatingValue of 5 nearest to the new "Penang Flavours" restaurant, sorted by lowest hygiene score.
- Establishments with Hygiene Score of 0: Group establishments by Local Authority area that have a hygiene score of 0, and sort them by the count.
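
The first three questions above could translate into query documents along these lines. The field names `scores.Hygiene`, `LocalAuthorityName`, and `geocode.latitude`/`geocode.longitude` are assumptions about the schema, matching London by a regex on the local authority name is one possible interpretation, and "nearest" is approximated by a small latitude/longitude box whose size (`degree_search`) is an arbitrary illustrative choice. The restaurant's coordinates are passed in as parameters.

```python
def hygiene_score_query(score=20):
    """Match establishments with the given hygiene score."""
    return {"scores.Hygiene": score}


def london_rating_query(min_rating=4):
    """Match establishments whose local authority mentions London."""
    return {
        "LocalAuthorityName": {"$regex": "London"},
        "RatingValue": {"$gte": min_rating},
    }


def nearby_top_rated_query(latitude, longitude, degree_search=0.01):
    """Match 5-rated establishments inside a box around the given point."""
    return {
        "RatingValue": 5,
        "geocode.latitude": {
            "$gte": latitude - degree_search,
            "$lte": latitude + degree_search,
        },
        "geocode.longitude": {
            "$gte": longitude - degree_search,
            "$lte": longitude + degree_search,
        },
    }


def top_five_nearby(establishments, latitude, longitude, degree_search=0.01):
    """Return up to five nearby 5-rated establishments, lowest hygiene score first."""
    query = nearby_top_rated_query(latitude, longitude, degree_search)
    cursor = establishments.find(query).sort("scores.Hygiene", 1).limit(5)
    return list(cursor)
```

These queries rely on the coordinates and `RatingValue` already being stored as numbers; on string values the range comparisons would not behave as intended.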
- Use `count_documents()` to get the number of documents that match a query.
- Use `pprint()` to display the first document of the result.
- Use Pandas to convert the results into a DataFrame and display the first 10 rows.
- Perform aggregation using MongoDB's `aggregate()` method to group and sort the data based on the hygiene score.
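
A minimal sketch of the reporting and aggregation steps above follows. The field names `scores.Hygiene` and `LocalAuthorityName` are assumptions about the schema, and the helper names are hypothetical. The pipeline matches hygiene scores of 0, groups by local authority, and sorts by count in descending order.

```python
from pprint import pprint


def hygiene_zero_pipeline():
    """Group establishments with a hygiene score of 0 by local authority."""
    return [
        {"$match": {"scores.Hygiene": 0}},
        {"$group": {"_id": "$LocalAuthorityName", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},  # largest groups first
    ]


def run_pipeline(establishments, pipeline):
    """Run an aggregation and collect the results into a list."""
    return list(establishments.aggregate(pipeline))


def summarize(establishments, query, rows=10):
    """Print the match count, the first match, and a DataFrame preview."""
    # Imported here so the helpers above also work without pandas installed
    import pandas as pd

    print("Matching documents:", establishments.count_documents(query))

    results = list(establishments.find(query))
    if results:
        pprint(results[0])

    df = pd.DataFrame(results)
    print(df.head(rows))
    return df
```

For example, `summarize(establishments, {"scores.Hygiene": 20})` would report on the first question, and `run_pipeline(establishments, hygiene_zero_pipeline())` would return one `{"_id": ..., "count": ...}` document per local authority.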
- MongoDB: A NoSQL database for storing and querying the establishment data.
- PyMongo: Python driver for interacting with MongoDB.
- Pandas: Used for data manipulation and analysis.
- Jupyter Notebook: Interactive environment for coding and analysis.
- Clone the Repository: run `git clone https://github.com/your-username/nosql-challenge.git`, then `cd nosql-challenge`.