
wikipedia_stadium_data_pipeline_with_apache_airflow

This project implements an Apache Airflow DAG that scrapes data on the world's largest football stadiums from a Wikipedia page, cleans it, loads it into a Postgres database, and runs SQL queries for further analysis. The key features, sketched in the example DAG after the list below, include:

  • Scrapes football stadium data using BeautifulSoup and requests.
  • Cleans and transforms the data using pandas.
  • Stores the data in a Postgres database with automatic table creation.
  • Executes SQL queries for advanced analysis and saves the results to CSV.
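
The sketch below shows how these steps could be wired together with Airflow's TaskFlow API. It is a minimal illustration, not the project's actual DAG: the Wikipedia article, the assumed column layout, the Postgres connection id (`postgres_default`), the table name (`stadiums`), the analysis query, and the CSV output path are all illustrative assumptions, and the decorator arguments assume Airflow 2.x.

```python
# Minimal sketch of the pipeline; names, URL, and paths are illustrative assumptions.
from datetime import datetime

import pandas as pd
import requests
from airflow.decorators import dag, task
from airflow.providers.postgres.hooks.postgres import PostgresHook
from bs4 import BeautifulSoup

# Assumed source article; the real DAG may target a different page or table.
WIKI_URL = "https://en.wikipedia.org/wiki/List_of_association_football_stadiums_by_capacity"


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False, tags=["wikipedia", "stadiums"])
def wikipedia_stadium_data_pipeline():

    @task
    def extract() -> list[list[str]]:
        # Scrape the first wikitable on the page with requests + BeautifulSoup.
        html = requests.get(WIKI_URL, timeout=30).text
        table = BeautifulSoup(html, "html.parser").find("table", class_="wikitable")
        rows = []
        for tr in table.find_all("tr")[1:]:
            cells = [td.get_text(strip=True) for td in tr.find_all(["td", "th"])]
            if cells:
                rows.append(cells)
        return rows

    @task
    def transform(rows: list[list[str]]) -> list[dict]:
        # Clean with pandas: assumed five-column layout, strip footnote markers
        # and thousands separators, cast capacity to int.
        df = pd.DataFrame(rows, columns=["rank", "stadium", "capacity", "city", "country"])
        df["capacity"] = df["capacity"].str.replace(r"\[.*\]|,", "", regex=True).astype(int)
        return df.to_dict("records")

    @task
    def load(records: list[dict]) -> None:
        # to_sql creates the target table automatically if it does not exist yet.
        engine = PostgresHook(postgres_conn_id="postgres_default").get_sqlalchemy_engine()
        pd.DataFrame(records).to_sql("stadiums", engine, if_exists="replace", index=False)

    @task
    def analyze() -> None:
        # Example analysis query saved to CSV; the real project may run different SQL.
        hook = PostgresHook(postgres_conn_id="postgres_default")
        result = hook.get_pandas_df(
            "SELECT country, MAX(capacity) AS largest_capacity "
            "FROM stadiums GROUP BY country ORDER BY largest_capacity DESC"
        )
        result.to_csv("/tmp/largest_stadium_per_country.csv", index=False)

    loaded = load(transform(extract()))
    loaded >> analyze()


wikipedia_stadium_data_pipeline()
```

Returning plain Python lists and dicts between tasks keeps the intermediate data JSON-serializable for Airflow's default XCom backend; for larger datasets a custom XCom backend or staging storage would be more appropriate.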