This project implements an Apache Airflow DAG that scrapes and processes data on the world's largest football stadiums from Wikipedia. The pipeline extracts the data from a Wikipedia page, cleans it, stores it in a Postgres database, and runs SQL queries for further analysis. Key features include:
- Scrapes football stadium data using `BeautifulSoup` and `requests`.
- Cleans and transforms the data using `pandas`.
- Stores the data in a Postgres database with automatic table creation.
- Executes SQL queries for advanced analysis and saves the results to CSV.
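
The clean, store, and query-to-CSV steps listed above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: it swaps Postgres for an in-memory SQLite database and `pandas` for the standard library so it runs without any setup. The table name `stadiums`, the column names, the helper functions, and the placeholder rows are all assumptions made for the example.

```python
import csv
import io
import re
import sqlite3

# Hypothetical placeholder rows, NOT real stadium data.
# The capacity strings mimic scraped text with thousands
# separators and footnote markers.
RAW_ROWS = [
    ("Stadium A", "Country X", "50,000[1]"),
    ("Stadium B", "Country Y", "80,500"),
    ("Stadium C", "Country X", "30,000[2]"),
]


def clean_capacity(raw: str) -> int:
    """Strip footnote markers such as [1] and thousands separators."""
    without_notes = re.sub(r"\[[^\]]*\]", "", raw)
    return int(without_notes.replace(",", "").strip())


def load_rows(conn: sqlite3.Connection, rows) -> None:
    """Create the table if it is missing, then insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stadiums ("
        "name TEXT PRIMARY KEY, country TEXT, capacity INTEGER)"
    )
    # INSERT OR REPLACE is SQLite syntax; it keeps reruns idempotent here.
    conn.executemany(
        "INSERT OR REPLACE INTO stadiums VALUES (?, ?, ?)",
        [(name, country, clean_capacity(cap)) for name, country, cap in rows],
    )
    conn.commit()


def query_to_csv(conn: sqlite3.Connection, sql: str, out) -> None:
    """Run a query and write the header plus result rows as CSV."""
    cursor = conn.execute(sql)
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor.fetchall())


def run_demo() -> str:
    """Load the placeholder rows and return an aggregate query as CSV text."""
    conn = sqlite3.connect(":memory:")
    load_rows(conn, RAW_ROWS)
    buffer = io.StringIO()
    query_to_csv(
        conn,
        "SELECT country, SUM(capacity) AS total_capacity "
        "FROM stadiums GROUP BY country ORDER BY total_capacity DESC",
        buffer,
    )
    conn.close()
    return buffer.getvalue()


if __name__ == "__main__":
    print(run_demo())
```

In an Airflow deployment, each of these functions would typically run inside its own task, with the SQLite connection replaced by a Postgres connection and the `StringIO` buffer replaced by a file on disk.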