
An end-to-end ELT data pipeline of the Brazilian olist e-commerce dataset using the modern data stack


lawal-hash/OlistELT


OlistELT Pipeline

[Image: pipeline]

Demo

[Image: pipeline]

Screenshots

[Image: pipeline]

[Image: pipeline]

Environment Variables

To run this project, you need two API keys and two .env files (one for PostgreSQL and one for Airflow).

KAGGLE API KEY

GCP SERVICE ACCOUNT KEY

Postgres env         Airflow env
POSTGRES_USER        _AIRFLOW_WWW_USER_USERNAME
POSTGRES_PASSWORD    _AIRFLOW_WWW_USER_PASSWORD
POSTGRES_DB
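
As a quick sanity check before starting any containers, the variables listed above can be verified with a small script. This is a minimal sketch, not part of the repository: the helper names `parse_env` and `missing_keys` are hypothetical, and the parser only handles plain `KEY=VALUE` lines (no quoting or `export` prefixes).

```python
# Variable names come from the table above; everything else is illustrative.
REQUIRED_KEYS = {
    "postgres": ["POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DB"],
    "airflow": ["_AIRFLOW_WWW_USER_USERNAME", "_AIRFLOW_WWW_USER_PASSWORD"],
}


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text, kind):
    """Return the required variables that are absent or empty in the .env text."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS[kind] if not values.get(key)]
```

For example, `missing_keys("POSTGRES_USER=olist\nPOSTGRES_PASSWORD=\n", "postgres")` returns `['POSTGRES_PASSWORD', 'POSTGRES_DB']`, because one variable is empty and the other is absent.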

Place the key files and .env files in the following locations:

postgre_docker_init/
├── ingestion/
│   ├── kaggle.json
│   └── .env
├── airflow/
│   ├── .env
│   └── config/
│       └── service-account.json
├── dbt/
│   └── service-account.json
└── dashboard/
    └── .streamlit/
        └── secrets.toml
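
The layout above can be checked programmatically before running anything. The following is a minimal sketch under the assumption that the paths are relative to the project root; the name `missing_files` is hypothetical and the script is not part of the repository.

```python
from pathlib import Path

# Relative paths copied from the layout above; adjust if your checkout differs.
REQUIRED_FILES = [
    "ingestion/kaggle.json",
    "ingestion/.env",
    "airflow/.env",
    "airflow/config/service-account.json",
    "dbt/service-account.json",
    "dashboard/.streamlit/secrets.toml",
]


def missing_files(root="."):
    """Return the required files that do not exist under root."""
    base = Path(root)
    return [rel for rel in REQUIRED_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All key and .env files are in place.")
```

Run it from the project root. It only reports which files are absent and does not validate their contents.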

Run Locally

Clone the project

  git clone https://github.com/lawal-hash/OlistELT.git

Go to the project directory

  cd OlistELT

Install dependencies

  pip install -r requirements.txt

Start the project in the following order

Folder       Command one     Command two
ingestion    cd ingestion    docker compose up
airflow      cd airflow      docker compose up
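
If you prefer to start both stacks from one place, the order above can be scripted. This is a sketch under one assumption not found in the original instructions: it passes `-d` (detached mode) to `docker compose up` so that the first stack does not block the second. The function name `start_stack` and the `runner` parameter are hypothetical.

```python
import subprocess
from pathlib import Path

# Start order from the table above: ingestion first, then airflow.
START_ORDER = ["ingestion", "airflow"]


def start_stack(root=".", runner=subprocess.run):
    """Run `docker compose up -d` in each folder, stopping on the first failure."""
    for folder in START_ORDER:
        # -d is an addition here; the original instructions run in the foreground.
        runner(
            ["docker", "compose", "up", "-d"],
            cwd=Path(root) / folder,
            check=True,
        )
```

Calling `start_stack()` from the project root runs the two commands in sequence. The `runner` argument exists only so the function can be exercised without Docker installed.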

Add the following connections from within the Airflow UI. Go to Admin -> Connections, then add the following:

Postgres

  • Connection Id = postgres_conn_id
  • Connection Type = Postgres
  • Host = xxx
  • Schema = xxx
  • Login = xxx
  • Password = xxx
  • Port = xxx

Google Cloud Platform

  • Connection Id = gcp_conn_id
  • Connection Type = Google Cloud
  • Project ID = xxx
  • Keyfile Path = /opt/airflow/config/service-account.json

The values shown as xxx are the same as those in the Postgres .env file.
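
As an alternative to the UI, Airflow can also read a connection from an environment variable named `AIRFLOW_CONN_<CONN_ID>` (upper case) that holds a connection URI. The sketch below builds such a URI for the Postgres connection only. The helper names are hypothetical and the values in the example are placeholders, not the project's real settings. The variable would need to be set in the environment of the Airflow containers, and connections defined this way do not appear in the Admin -> Connections list.

```python
from urllib.parse import quote


def postgres_conn_uri(login, password, host, port, schema):
    """Build a Postgres connection URI, percent-encoding login, password, and schema."""
    return (
        f"postgres://{quote(login, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(schema, safe='')}"
    )


def env_var_name(conn_id):
    """Airflow looks up connections in AIRFLOW_CONN_<CONN_ID in upper case>."""
    return f"AIRFLOW_CONN_{conn_id.upper()}"


if __name__ == "__main__":
    # Placeholder values only; substitute the ones from your Postgres .env file.
    uri = postgres_conn_uri("user", "p@ss/word", "localhost", 5432, "olist")
    print(f"{env_var_name('postgres_conn_id')}={uri}")
```

Percent-encoding matters when the password contains characters such as `@` or `/`, which would otherwise be read as URI delimiters.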

Change the DAG state from paused to active. Make sure the DAG run completes successfully before proceeding to the next step.

Folder    Command one         Command two
dbt       cd dbt/olist_dbt    chmod +x run.sh && ./run.sh

Start the dashboard server using the command below

Folder       Command one     Command two
dashboard    cd dashboard    streamlit run app.py

Authors
