Wilmar3752/ETL_scraper

ETL for car web scraping project

This project uses GitHub Actions to extract, transform, and load (ETL) web-scraped data into S3, ready to be used in analytics projects.

0. Setup

Create a virtual environment first, for example:

python3 -m venv myenv
source myenv/bin/activate

Then install the dependencies and run the initial load:

pip install -r requirements.txt
python src/initial_load.py

1. Initial load

Every data engineering project first needs an initial load of its data. The script src/initial_load.py, run in the setup step above, contains the code for it.

2. Daily cron

main.py is the main script. GitHub Actions runs it daily at 7:00, and the workflow is defined in .github/workflows/send-data.yaml
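As a rough illustration of what a daily schedule like this can look like, here is a minimal workflow sketch. It is not the contents of send-data.yaml: the workflow name, job name, Python version, secret names, and the location of main.py at the repository root are all assumptions. Note that GitHub Actions evaluates `schedule` cron expressions in UTC, so `0 7 * * *` means 7:00 UTC.

```yaml
# Hypothetical sketch of a daily workflow; the real send-data.yaml may differ.
name: send-data

on:
  schedule:
    - cron: "0 7 * * *"   # every day at 7:00 UTC
  workflow_dispatch:       # also allow manual runs from the Actions tab

jobs:
  etl:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"   # assumed version
      - run: pip install -r requirements.txt
      - run: python main.py        # assumed path to main.py
        env:
          # Assumed secret names, used by the script to write to S3
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
```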

3. Final data

Each day, we upload a CSV file with the processed data to S3.
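The daily upload could be sketched roughly as follows. This is not the project's actual code: the function names, the `processed/cars_<date>.csv` key layout, and the column names are hypothetical. The S3 client is passed in as a parameter so the logic can be tried without AWS credentials; in practice it would be a boto3 client, whose `put_object` call accepts `Bucket`, `Key`, and `Body`.

```python
import csv
import io
from datetime import date


def build_key(day: date, prefix: str = "processed") -> str:
    """Return a dated object key (hypothetical layout)."""
    return f"{prefix}/cars_{day.isoformat()}.csv"


def rows_to_csv(rows: list) -> str:
    """Serialize a list of row dicts to CSV text with a header line."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def upload_daily_csv(s3_client, bucket: str, rows: list, day: date) -> str:
    """Upload the day's rows as one CSV object and return its key."""
    key = build_key(day)
    body = rows_to_csv(rows).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return key


# Example usage with boto3 (third-party; bucket name is a placeholder):
#   import boto3
#   upload_daily_csv(boto3.client("s3"), "my-example-bucket", rows, date.today())
```

Writing one dated object per day keeps earlier files intact, so downstream analytics can read any single day or the whole history.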

About

ETL for predicting car prices in Colombia
