This project uses GitHub Actions to extract, transform, and load data obtained by web scraping into S3, ready to be consumed by analytics projects.
Create and activate a virtual environment first, for example:
python3 -m venv myenv
source myenv/bin/activate
Then run:
pip install -r requirements.txt
python src/initial_load.py
In any data engineering project, the first step is an initial load of the data; src/initial_load.py contains the code for that step.
main.py is the entry point that GitHub Actions runs daily at 7:00; the workflow definition lives in .github/workflows/send-data.yaml.
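The workflow file itself is not shown here, but a minimal sketch of a scheduled workflow of this kind might look as follows (job names, action versions, Python version, and secret names are assumptions, not the project's actual configuration):

```yaml
name: send-data
on:
  schedule:
    - cron: "0 7 * * *"   # daily at 07:00 (GitHub cron runs in UTC)
jobs:
  etl:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python main.py
        env:
          # AWS credentials would be stored as repository secrets
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
```

Note that GitHub's cron schedule is interpreted in UTC, so the 7:00 run time may differ from local time.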
Each day, a CSV file with the processed data is uploaded to S3.
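As a sketch of how such a daily upload might work (the function names, bucket, and key layout below are illustrative assumptions, not the project's actual code):

```python
import csv
import io
from datetime import date

def rows_to_csv_bytes(rows, fieldnames):
    """Serialize a list of dicts to CSV bytes, ready to upload to S3."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")

def upload_to_s3(body, bucket, key):
    """Upload raw bytes to s3://bucket/key (requires AWS credentials)."""
    import boto3  # imported lazily so the CSV helper works without AWS deps
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body)

# Example: one dated CSV per day, mirroring the daily workflow.
rows = [{"title": "Example item", "price": 9.99}]  # placeholder scraped data
body = rows_to_csv_bytes(rows, fieldnames=["title", "price"])
# upload_to_s3(body, "my-analytics-bucket", f"data/{date.today()}.csv")  # assumed bucket/key
```

Keeping serialization separate from the upload makes the CSV logic easy to test locally without AWS access.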