ETL Pipeline (Docker - Airflow - Streamlit)


📖 Table of Contents

  1. Introduction 📌
  2. Description 📜
  3. Installation 🔧
  4. ETL Pipeline 📊
  5. Usage 🎮
  6. Completion 🏁

📌 Introduction

This project, part of the AI Bootcamp at BeCode.org in Gent, aims to create a pipeline covering everything from data cleaning to price prediction, with data visualization.

📜 Description

This project contains a series of ETL (Extract, Transform, Load) processes that run within a Docker Compose environment. The workflow is divided into two main parts: Airflow and Streamlit.

  • Airflow (a minimal DAG sketch follows this list)
    - Scrape data on houses and apartments for sale
    - Clean the data with pandas
    - Train the Machine Learning model
    - Transfer the data to an AWS S3 bucket
  • Streamlit
    - A few visualizations of the data
    - Price prediction
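
To make the Airflow part concrete, here is a minimal sketch of a DAG with the three chained steps described above. The dag_id, task names, and function bodies are illustrative assumptions, not the repo's actual code.

```python
# Hypothetical sketch of the scrape -> clean -> train pipeline as an
# Airflow DAG; all names here are assumptions, not the repo's actual code.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def scrape_data():
    """Scrape data on houses and apartments for sale (placeholder)."""


def clean_data():
    """Clean the scraped data with pandas (placeholder)."""


def train_model():
    """Train the ML model on the cleaned data (placeholder)."""


with DAG(
    dag_id="immo_eliza_pipeline",   # assumed name
    start_date=datetime(2024, 1, 1),
    schedule=None,                  # triggered manually from the Airflow UI
    catchup=False,
) as dag:
    scrape = PythonOperator(task_id="scrape", python_callable=scrape_data)
    clean = PythonOperator(task_id="clean", python_callable=clean_data)
    train = PythonOperator(task_id="train", python_callable=train_model)

    scrape >> clean >> train  # enforce the run order
```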

🔧 Installation

Required Python packages: pandas, numpy, scikit-learn, xgboost, seaborn, matplotlib, fastapi, pydantic, uvicorn, pickleshare, streamlit, boto3

  • Clone this repository.
  • Install the required modules with pip install -r requirements.txt

📊 ETL Pipeline

Airflow

  1. Airflow contains one DAG that triggers a pipeline consisting of scraping, cleaning, and training a model on the scraped data (see the DAG sketch in the Description above).

  2. Airflow uses an AWS S3 bucket to store and retrieve data, as sketched below.
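
The S3 exchange can be pictured with boto3 roughly as follows; the bucket name, object keys, and local paths are made-up examples, and credentials are assumed to come from the environment or the AWS config.

```python
# Sketch of the S3 round-trip with boto3; bucket, keys, and paths are
# example values, not the ones used in this repo.
import boto3

s3 = boto3.client("s3")  # credentials resolved from env vars / AWS config

# Upload the cleaned dataset produced by the DAG.
s3.upload_file("data/cleaned.csv", "immo-eliza-bucket", "cleaned.csv")

# Retrieve it later, e.g. from the Streamlit container.
s3.download_file("immo-eliza-bucket", "cleaned.csv", "data/cleaned.csv")
```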

Streamlit

  1. Visualizations to explore the data
  2. Price prediction (a minimal page sketch follows)
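
A prediction page along these lines can be built with a few Streamlit widgets. The sketch below assumes a pickled model at model/model.pkl and two input features; the real app's model path and inputs may differ.

```python
# Minimal sketch of a Streamlit prediction page; the model path and
# feature names are assumptions, not the app's actual ones.
import pickle

import pandas as pd
import streamlit as st

st.title("ImmoEliza price prediction")

# Load the model trained by the Airflow DAG (path is an assumption).
with open("model/model.pkl", "rb") as f:
    model = pickle.load(f)

living_area = st.number_input("Living area (m²)", min_value=10, value=100)
bedrooms = st.number_input("Number of bedrooms", min_value=0, value=2)

if st.button("Predict price"):
    features = pd.DataFrame([{"living_area": living_area, "bedrooms": bedrooms}])
    prediction = model.predict(features)[0]
    st.write(f"Estimated price: €{prediction:,.0f}")
```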

🎮 Usage

To run the project:

  • Clone the repository.
  • Open a terminal and change into the repository directory.
  • Download and install Docker Desktop on your machine.
  • Run docker-compose up -d to start Docker and the respective containers.
  • Open http://localhost:8080/ in your browser to view and access the Airflow UI and its logs (logs are also written to the logs folder created in the repo).
  • Open http://localhost:8501/ to open Streamlit for visualization and prediction.

🏁 Completion

Name - Mythili Palanisamy LinkedIn
Team type - solo
