Author: Rodrigo Anderson - Software Engineer
The Pump Sensor Data API is a FastAPI-based project designed to interact with pump sensor data from the Pump Sensor Data dataset on Kaggle. This case study focuses on practicing API building and handling sensor data. The API provides two main endpoints: one for filtering sensor data based on predefined criteria, and one for receiving that data and organizing it into a specific format.
- Database Format and File Management:
The original dataset for this project was obtained in CSV format from Kaggle's Pump Sensor Data project. To facilitate data management and improve query performance, the data was converted into an SQLite database. SQLite offers a lightweight, file-based database solution that doesn't require a separate server process, making it a convenient choice for this project.
However, the SQLite database file (sensor_data.db) exceeded GitHub's file size limits. Although it's generally not considered best practice to include a database file in a repository, doing so keeps the project easy to set up and run. To address the size limitation, the database file was compressed into a ZIP archive (sensor_data.db.zip), which is included in the repository. To use the database, you have two options:
- Unzip the Provided File: Simply unzip sensor_data.db.zip to create the sensor_data.db file at the root of the project. This is the quickest way to get started.
- Recreate the Database (Optional): If you prefer to recreate the database yourself, follow these steps:
i. Download the sensor.csv file from Kaggle and place it in the root directory of the project.
ii. Inside a virtual environment (conda, Python venv, etc.) with pandas installed, run:
python create_db.py
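For reference, here is a minimal sketch of what a CSV-to-SQLite conversion script like create_db.py might contain; the actual script in the repository may differ, and the table name sensor_data is an assumption:
# Hedged sketch of a CSV-to-SQLite conversion; not necessarily the repository's create_db.py.
import sqlite3

import pandas as pd

# Read the Kaggle CSV from the project root.
df = pd.read_csv("sensor.csv")

# Write it into a local SQLite file; "sensor_data" is an assumed table name.
with sqlite3.connect("sensor_data.db") as conn:
    df.to_sql("sensor_data", conn, if_exists="replace", index=False)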
- Data Filtering Constraints:
During the initial stages of the project, the idea was to filter data from April 2018 for sensors 07 and 47, with values greater than 20 and less than 30. However, applying these constraints resulted in an empty table.
This outcome was not aligned with the project's objectives, so a decision was made to modify the constraints to include more data. The updated constraints are:
- Include values greater than 20 and less than 30 for sensor_47.
- Include values greater than 10 and less than 30 for sensor_07.
The new constraints return a meaningful, non-empty result set. These adjustments keep the project aligned with its original intent while ensuring that the filtered data is representative.
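For illustration, the updated constraints expressed as a pandas filter might look like the sketch below; the project may apply them in SQL instead, and the column names (timestamp, sensor_07, sensor_47) follow the Kaggle dataset:
# Hedged illustration of the updated filtering constraints; the API's actual query may differ.
import pandas as pd

def apply_constraints(df: pd.DataFrame) -> pd.DataFrame:
    # Keep only April 2018 rows where both sensors fall inside their (exclusive) ranges.
    in_april_2018 = (df["timestamp"] >= "2018-04-01") & (df["timestamp"] < "2018-05-01")
    sensor_07_ok = (df["sensor_07"] > 10) & (df["sensor_07"] < 30)
    sensor_47_ok = (df["sensor_47"] > 20) & (df["sensor_47"] < 30)
    return df.loc[in_april_2018 & sensor_07_ok & sensor_47_ok]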
- GET Endpoint: Filters the sensor data from April 2018 for sensors 07 and 47 according to the constraints described in the disclaimers above.
- POST Endpoint: Receives the filtered data, organizes it into a pandas DataFrame, and prints the DataFrame to the terminal (a hedged sketch of both endpoints follows below).
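The sketch below illustrates how the two endpoints could be implemented with FastAPI, pandas, and sqlite3; the route paths and JSON shape match the examples later in this document, but the model names, table name, and query are assumptions rather than the repository's actual code:
# Hedged sketch of the two endpoints; the repository's actual implementation may differ.
import sqlite3
from typing import List

import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SensorReading(BaseModel):
    name: str
    value: float

class DataPoint(BaseModel):
    timestamp: str
    machine_status: str
    sensors: List[SensorReading]

class Payload(BaseModel):
    data: List[DataPoint]

@app.get("/data")
def get_data():
    # Filter April 2018 rows where both sensors satisfy the updated constraints.
    query = """
        SELECT timestamp, machine_status, sensor_07, sensor_47
        FROM sensor_data
        WHERE timestamp >= '2018-04-01' AND timestamp < '2018-05-01'
          AND sensor_07 > 10 AND sensor_07 < 30
          AND sensor_47 > 20 AND sensor_47 < 30
    """
    with sqlite3.connect("sensor_data.db") as conn:
        df = pd.read_sql_query(query, conn)
    # Reshape each row into the nested timestamp/sensors structure used by the API.
    return {
        "data": [
            {
                "timestamp": str(row["timestamp"]),
                "machine_status": row["machine_status"],
                "sensors": [
                    {"name": "sensor_07", "value": float(row["sensor_07"])},
                    {"name": "sensor_47", "value": float(row["sensor_47"])},
                ],
            }
            for _, row in df.iterrows()
        ]
    }

@app.post("/receiveData")
def receive_data(payload: Payload):
    # Flatten the nested payload into one row per timestamp/sensor pair.
    rows = [
        {
            "timestamp": point.timestamp,
            "machine_status": point.machine_status,
            "sensor": reading.name,
            "value": reading.value,
        }
        for point in payload.data
        for reading in point.sensors
    ]
    df = pd.DataFrame(rows)
    print(df)  # the POST endpoint prints the DataFrame to the terminal
    return {"rows_received": len(df)}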
- Clone the repository to your local machine.
- Ensure you have Docker and Docker Compose installed.
- Ensure that the sensor_data.db.zip file is present in the root directory of the project, and unzip it to extract sensor_data.db. Alternatively, you can download the sensor.csv file from Kaggle, place it in the root directory, and run the following command in a Python environment containing the pandas library:
python create_db.py
- Run the following command to build and start the containers for the production server:
docker-compose up app
This will start the server locally, and you can access the API and its documentation at http://0.0.0.0:5001/docs.
- If you want to run the tests, you can execute the following command:
docker-compose up test
This will run the tests inside the Docker container and display the results in the console.
- Clone the repository to your local machine.
- Unzip the sensor_data.db.zip file.
- On Linux:
# Install unzip utility
sudo apt-get update && sudo apt-get install -y unzip
# Unzip the sensor_data.db.zip file (must be run from the project root folder)
unzip sensor_data.db.zip
- Install Poetry.
- On Linux:
# Install poetry
curl -sSL https://install.python-poetry.org | python -
- Run the following commands:
poetry install
make run
- If you want to run the tests, after completing the Poetry setup, you can execute the following command:
make test
This will run the tests locally and display the results in the console.
GET Endpoint: http://0.0.0.0:5001/data
CURL Example:
curl -X 'GET' \
'http://0.0.0.0:5001/data' \
-H 'accept: application/json'
POST Endpoint: http://0.0.0.0:5001/receiveData
CURL Example:
curl -X 'POST' \
'http://0.0.0.0:5001/receiveData' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"data": [
{
"timestamp": "2018-04-18 04:41:00",
"machine_status": "RECOVERING",
"sensors": [
{
"name": "sensor_07",
"value": 11.37153
},
{
"name": "sensor_47",
"value": 29.513890000000004
}
]
},
{
"timestamp": "2018-04-18 04:42:00",
"machine_status": "RECOVERING",
"sensors": [
{
"name": "sensor_07",
"value": 11.32089
},
{
"name": "sensor_47",
"value": 29.513890000000004
}
]
},
{
"timestamp": "2018-04-18 04:43:00",
"machine_status": "RECOVERING",
"sensors": [
{
"name": "sensor_07",
"value": 11.32089
},
{
"name": "sensor_47",
"value": 29.22454
}
]
},
{
"timestamp": "2018-04-18 04:44:00",
"machine_status": "RECOVERING",
"sensors": [
{
"name": "sensor_07",
"value": 11.32813
},
{
"name": "sensor_47",
"value": 29.224536895752
}
]
}
]
}'
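If you prefer to exercise both endpoints from Python, the sketch below uses the requests library (not part of the project) and assumes the GET response already matches the payload shape that /receiveData expects:
# Hypothetical end-to-end check; assumes the server is running locally on port 5001.
import requests

BASE_URL = "http://0.0.0.0:5001"

# Fetch the filtered sensor data from the GET endpoint.
filtered = requests.get(f"{BASE_URL}/data", timeout=30).json()

# Forward it unchanged to the POST endpoint.
response = requests.post(f"{BASE_URL}/receiveData", json=filtered, timeout=30)
print(response.status_code, response.json())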
The application uses FastAPI, and you can access the automatically generated Swagger UI at http://0.0.0.0:5001/docs.
Here you can explore the API's endpoints, data structures, and more. The main page provides an overview of the available API methods.
You can also execute API requests directly from the Swagger UI. Simply click on the endpoint you want to try, fill in any required parameters, and hit the "Execute" button.