This project builds an automated ETL (Extract, Transform, Load) pipeline that fetches weather data from an API, processes it with Python and Pandas, writes the result as CSV, and loads it into an Amazon S3 bucket. Apache Airflow orchestrates and schedules the workflow, and EC2 instances host the Airflow scheduler and workers.
- Weather API: Source of weather data.
- Python: Used for data extraction and transformation.
- Apache Airflow: Orchestration tool for managing the ETL workflow.
- EC2 Instances: Hosts the Airflow scheduler and workers.
- Pandas: Python library used to process and store the data in CSV format.
- Amazon S3: Storage service where the processed data is stored.
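To make the transform step concrete, here is a minimal sketch of how a single API response might be flattened into a one-row DataFrame and serialized as CSV. The function names (`transform_weather`, `to_csv_text`), the chosen columns, and the sample values are illustrative assumptions, not part of this project's code; the payload is only shaped like an OpenWeatherMap current-weather response.

```python
import pandas as pd

# Hypothetical sample payload with made-up values, shaped like an
# OpenWeatherMap current-weather response (temperature in Kelvin by default).
SAMPLE_PAYLOAD = {
    "name": "Portland",
    "dt": 1700000000,
    "weather": [{"description": "light rain"}],
    "main": {"temp": 283.15, "humidity": 87},
    "wind": {"speed": 3.6},
}


def transform_weather(payload: dict) -> pd.DataFrame:
    """Flatten one weather response into a single-row DataFrame."""
    row = {
        "city": payload["name"],
        "description": payload["weather"][0]["description"],
        "temperature_k": payload["main"]["temp"],
        "humidity_pct": payload["main"]["humidity"],
        "wind_speed": payload["wind"]["speed"],
        # The API reports the observation time as a Unix timestamp in seconds.
        "observed_at": pd.to_datetime(payload["dt"], unit="s", utc=True),
    }
    return pd.DataFrame([row])


def to_csv_text(df: pd.DataFrame) -> str:
    """Serialize the DataFrame as CSV text without the index column."""
    return df.to_csv(index=False)
```

Calling `to_csv_text(transform_weather(SAMPLE_PAYLOAD))` yields a header line followed by one data row. In the full pipeline, an Airflow task could run this step and a later task could upload the CSV to S3, for example with boto3.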
Obtain Weather API Key:
- Sign up at OpenWeatherMap to get your API key.
- Follow the instructions to create an account and generate an API key.
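Once you have a key, the extract step might use it roughly as sketched below. This is a hedged example, not the project's actual code: the environment variable name `OPENWEATHER_API_KEY` and the helper names are hypothetical, and the endpoint is assumed to be OpenWeatherMap's current-weather endpoint, which takes the city as `q` and the key as `appid`.

```python
import json
import os
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: OpenWeatherMap current-weather API.
BASE_URL = "https://api.openweathermap.org/data/2.5/weather"


def load_api_key(env_var: str = "OPENWEATHER_API_KEY") -> str:
    """Read the API key from an environment variable (hypothetical name)."""
    return os.environ.get(env_var, "")


def build_weather_url(city: str, api_key: str) -> str:
    """Build the request URL for one city, URL-encoding the parameters."""
    if not api_key:
        raise ValueError("API key is missing")
    return f"{BASE_URL}?{urlencode({'q': city, 'appid': api_key})}"


def fetch_weather(city: str, api_key: str, timeout: float = 10.0) -> dict:
    """Call the API and return the decoded JSON response as a dict."""
    with urlopen(build_weather_url(city, api_key), timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_weather("Portland", load_api_key())` would perform the request. Reading the key from the environment keeps it out of source control; on an Airflow deployment, an Airflow Variable or Connection is a common alternative.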