Disaster Response ETL/ML Pipeline Project

An ETL and machine learning pipeline written in Python. This project is part of the Data Science Nanodegree Program by Udacity, in collaboration with Figure Eight.

The project is subdivided into:

  1. Data preprocessing and an ETL pipeline (including storing the cleaned data in a SQL database)
  2. Building a trained model through a machine learning pipeline, using GridSearchCV to tune the model hyperparameters (see the sketch after this list)
  3. Model deployment as a web app built with Flask.
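
As a rough illustration of step 2, here is a minimal sketch of a multi-output text-classification pipeline tuned with GridSearchCV. The estimator choices and parameter grid below are assumptions for illustration, not necessarily what models/train_classifier.py uses:

    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.multioutput import MultiOutputClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    # Build a multi-output text classification pipeline.
    pipeline = Pipeline([
        ("vect", CountVectorizer()),    # tokenize messages and count words
        ("tfidf", TfidfTransformer()),  # re-weight the counts with TF-IDF
        ("clf", MultiOutputClassifier(RandomForestClassifier())),
    ])

    # Small, illustrative hyperparameter grid for GridSearchCV.
    parameters = {
        "clf__estimator__n_estimators": [50, 100],
        "vect__ngram_range": [(1, 1), (1, 2)],
    }

    cv = GridSearchCV(pipeline, param_grid=parameters, cv=3, verbose=2)
    # cv.fit(X_train, y_train)  # X_train: messages, y_train: category labels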

Dependencies

  • Python 3+
  • NumPy==1.12.1
  • Pandas==2.0.15
  • Scikit-Learn 0.23+
  • NLTK (Natural Language Processing libraries) 3.2.5+
  • SQLAlchemy 2.3+
  • Flask 0.12+
  • Plotly==2.0.15
  • gunicorn 19.9+
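
One way to install the Python dependencies (versions are not pinned here; adjust to match the list above or a requirements.txt if the repository provides one):

    pip install numpy pandas scikit-learn nltk SQLAlchemy Flask plotly gunicorn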

Instructions for running the program from the main project directory (ETL_ML_Pipeline_Project)

- To run the ETL pipeline (which extracts the data, transforms it, and loads it into the database), type:

    python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
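
A minimal sketch of the extract-transform-load steps this command performs. The file paths come from the command above; the column layout and the table name "messages" are assumptions, not necessarily what data/process_data.py uses:

    import pandas as pd
    from sqlalchemy import create_engine

    # Extract: read the raw CSV files and merge them on the shared id column.
    messages = pd.read_csv("data/disaster_messages.csv")
    categories = pd.read_csv("data/disaster_categories.csv")
    df = messages.merge(categories, on="id")

    # Transform: split the "categories" string into one binary column per label.
    cats = df["categories"].str.split(";", expand=True)
    cats.columns = [c.split("-")[0] for c in cats.iloc[0]]
    for col in cats.columns:
        cats[col] = cats[col].str[-1].astype(int)
    df = pd.concat([df.drop(columns="categories"), cats], axis=1).drop_duplicates()

    # Load: write the cleaned data to a SQLite database with SQLAlchemy.
    engine = create_engine("sqlite:///data/DisasterResponse.db")
    df.to_sql("messages", engine, index=False, if_exists="replace")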

- To run the machine learning (ML) pipeline (which performs feature extraction, then trains, evaluates, and saves the model), type:

    python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
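
A rough sketch of the load-train-save flow behind this command. The table name "messages", the dropped columns, and the use of pickle are assumptions for illustration:

    import pickle
    import pandas as pd
    from sqlalchemy import create_engine
    from sklearn.model_selection import train_test_split

    # Load the cleaned data written by the ETL pipeline.
    engine = create_engine("sqlite:///data/DisasterResponse.db")
    df = pd.read_sql_table("messages", engine)
    X = df["message"]
    y = df.drop(columns=["id", "message", "original", "genre"])

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    # model = cv.fit(X_train, y_train)  # e.g. the GridSearchCV pipeline sketched earlier

    # Persist the trained model so the web app can load it later.
    # with open("models/classifier.pkl", "wb") as f:
    #     pickle.dump(model, f)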

- To run the app:

    Navigate to the app directory and type python run.py
    Then go to http://0.0.0.0:3001/
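
For reference, a minimal sketch of a Flask entry point bound to that host and port; the routes, template names, and loaded artifacts are assumptions, not necessarily what app/run.py contains:

    import pickle
    from flask import Flask, render_template, request

    app = Flask(__name__)

    # Load the model trained by the ML pipeline (path assumed from the command above).
    with open("models/classifier.pkl", "rb") as f:
        model = pickle.load(f)

    @app.route("/")
    def index():
        # Render the landing page with data visualisations.
        return render_template("master.html")

    @app.route("/go")
    def go():
        # Classify the user's message and show the predicted categories.
        query = request.args.get("query", "")
        labels = model.predict([query])[0]
        return render_template("go.html", query=query, labels=labels)

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=3001, debug=True)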
