Wine-Prediction classifies the wine label based on the following features:
- fixed acidity
- volatile acidity
- citric acid
- residual sugar
- chlorides
- free sulfur dioxide
- total sulfur dioxide
- density
- pH
- sulphates
- alcohol
- quality
- label
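As a minimal sketch of how these columns might be handled in code, the snippet below separates model inputs from the target. It assumes that `label` is the value being predicted and that every other listed column is an input; the names `FEATURE_COLUMNS`, `TARGET_COLUMN`, and `split_row` are illustrative and not taken from the project.

```python
# Column names as listed above. Treating "label" as the target is an assumption.
FEATURE_COLUMNS = [
    "fixed acidity",
    "volatile acidity",
    "citric acid",
    "residual sugar",
    "chlorides",
    "free sulfur dioxide",
    "total sulfur dioxide",
    "density",
    "pH",
    "sulphates",
    "alcohol",
    "quality",
]
TARGET_COLUMN = "label"


def split_row(row):
    """Split one record (a dict) into (features, target).

    Raises KeyError if any expected column is absent.
    """
    missing = [c for c in FEATURE_COLUMNS + [TARGET_COLUMN] if c not in row]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    features = {c: row[c] for c in FEATURE_COLUMNS}
    return features, row[TARGET_COLUMN]
```

Failing fast on a missing column keeps malformed records from reaching the model.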
The dataset is taken from the UCI Machine Learning Repository.
This application is built to demonstrate a machine learning pipeline using the following widely used technologies:
- Flask
- Python
- Streamlit
- PostgreSQL
- Airflow 2.2
- Grafana
- Create a virtual environment with python3
python3 -m venv wine_prediction
- Activate the virtual environment:
cd wine_prediction
source bin/activate
- Install dependencies
pip install -r requirements.txt
- Create a database and add a .env file at api/.env. The template of .env is as follows:
DATABASE_NAME = YOUR_DATABASE
DATABASE_PORT = 5432
USER_NAME = YOUR_DATABASE_USER
USER_PASSWORD = YOUR_DATABASE_USER_PASSWORD
- Navigate to the root of the project
- Set environment variables
export FLASK_APP=app:create_app
export APP_SETTINGS="api.config.DevelopmentConfig"
- Run Flask
flask run
- Navigate to the /frontend directory of the application
- Run the Streamlit application as:
streamlit run run.py
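How the project itself loads api/.env is not shown here. As a rough sketch, a file in the template format above can be read with the standard library alone and checked for missing values before the API starts. The helpers `parse_env` and `missing_keys` are hypothetical names introduced for illustration.

```python
# Keys taken from the .env template in the setup steps.
REQUIRED_KEYS = ("DATABASE_NAME", "DATABASE_PORT", "USER_NAME", "USER_PASSWORD")


def parse_env(text):
    """Parse 'KEY = VALUE' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def missing_keys(settings):
    """Return the required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not settings.get(k)]


TEMPLATE = """
DATABASE_NAME = YOUR_DATABASE
DATABASE_PORT = 5432
USER_NAME = YOUR_DATABASE_USER
USER_PASSWORD = YOUR_DATABASE_USER_PASSWORD
"""

if __name__ == "__main__":
    config = parse_env(TEMPLATE)
    print(config["DATABASE_PORT"], missing_keys(config))
```

Checking for missing keys up front gives a clearer error than a failed database connection later.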
- Create a database and a database user, and grant that user all privileges on the database. Airflow will use it to store its logs. Run the following in the psql shell:
CREATE DATABASE wine_airflow;
CREATE USER airflow_user WITH ENCRYPTED PASSWORD 'airflow_pass';
GRANT ALL PRIVILEGES ON DATABASE wine_airflow TO airflow_user;
- Go to the root directory of the project and set the AIRFLOW_HOME environment variable:
export AIRFLOW_HOME=$PWD/airflow
- Initialize the database:
airflow db init
- Create a user (username: admin, password: admin) to access the Airflow web application, which runs on http://localhost:8080
airflow users create --username admin --firstname admin --lastname admin --role Admin --email admin@gmail.com --password admin
- Start the Airflow scheduler:
# Set the environment variable so that Airflow uses PostgreSQL as the database for its logs
export AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow_user:airflow_pass@localhost/wine_airflow
airflow scheduler
- Start the web server:
# Set the environment variable so that Airflow uses PostgreSQL as the database for its logs
export AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow_user:airflow_pass@localhost/wine_airflow
airflow webserver
Once the webserver is running, you can access the Airflow dashboard at http://localhost:8080.
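The connection string used for the scheduler and webserver follows the pattern `postgresql+psycopg2://user:password@host/database`. A small sketch, assuming you want to build it from separate values, is shown below. The function name `airflow_conn_uri` is hypothetical. Percent-encoding the credentials matters once a password contains characters such as `@` or `/`.

```python
from urllib.parse import quote


def airflow_conn_uri(user, password, host, database, port=None):
    """Build a SQLAlchemy-style PostgreSQL URI with percent-encoded credentials."""
    netloc = f"{quote(user, safe='')}:{quote(password, safe='')}@{host}"
    if port is not None:
        netloc += f":{port}"
    return f"postgresql+psycopg2://{netloc}/{database}"


if __name__ == "__main__":
    uri = airflow_conn_uri("airflow_user", "airflow_pass", "localhost", "wine_airflow")
    # Print a line that can be pasted into the shell before starting Airflow.
    print(f"export AIRFLOW__CORE__SQL_ALCHEMY_CONN={uri}")
```

With the example credentials from the steps above, the result matches the string exported there.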
Airflow has the following data ingestion pipeline:
When data validation fails, Airflow sends an email to the responsible member, which can be configured by adding the following
variables in Airflow. To test this scenario, enable the mimic_validation_fail
Airflow variable.
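The project's actual validation rules are not listed here. As a hedged illustration of the idea, the sketch below checks one record and lets a flag force a failure, which is what a switch like mimic_validation_fail is for. The rules (numeric, non-negative, pH at most 14) and the name `validate_row` are assumptions. In a DAG the flag would typically be read from an Airflow variable; here it is a plain argument so the snippet runs without Airflow.

```python
# Numeric columns from the feature list; the checks below are illustrative only.
NUMERIC_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide",
    "density", "pH", "sulphates", "alcohol",
]


def validate_row(row, mimic_validation_fail=False):
    """Return a list of problems found in one record; an empty list means valid."""
    if mimic_validation_fail:
        # Simulate a failure so the alerting path can be exercised.
        return ["validation failure simulated by mimic_validation_fail"]
    problems = []
    for col in NUMERIC_COLUMNS:
        value = row.get(col)
        if value is None:
            problems.append(f"{col}: missing")
            continue
        try:
            number = float(value)
        except (TypeError, ValueError):
            problems.append(f"{col}: not numeric ({value!r})")
            continue
        if number < 0:
            problems.append(f"{col}: negative value {number}")
        elif col == "pH" and number > 14:
            problems.append(f"pH: {number} is above 14")
    return problems
```

A non-empty result is the point at which a pipeline task would fail and trigger the email notification.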
A data drift report can be generated by running the Jupyter notebook at
/notebooks/data_drift_report.ipynb.
If there is drift in the data, the report will have the following format.
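The notebook's own drift method and report layout are not reproduced here. As a rough illustration of what detecting drift in a single feature can look like, the sketch below computes the two-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical distributions of a reference sample and a current sample. This is a swapped-in technique, not necessarily what the notebook uses, and the threshold of 0.3 and the names `ks_statistic` and `has_drift` are hypothetical.

```python
def ks_statistic(reference, current):
    """Largest absolute difference between the two empirical CDFs (0.0 to 1.0)."""
    ref, cur = sorted(reference), sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")
    n, m = len(ref), len(cur)
    i = j = 0
    gap = 0.0
    while i < n and j < m:
        x = min(ref[i], cur[j])
        # Advance both pointers past every value <= x, then compare the CDFs at x.
        while i < n and ref[i] <= x:
            i += 1
        while j < m and cur[j] <= x:
            j += 1
        gap = max(gap, abs(i / n - j / m))
    return gap


def has_drift(reference, current, threshold=0.3):
    """Flag drift when the KS statistic exceeds an illustrative threshold."""
    return ks_statistic(reference, current) > threshold
```

For example, `ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])` gives 0.5, so `has_drift` reports drift at the default threshold.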