wikipedia_stadium_data_pipeline_with_apache_airflow

This project implements an Apache Airflow DAG that scrapes and processes data on the world's largest football stadiums from Wikipedia. The pipeline extracts the data from a Wikipedia page, cleans it, stores it in a Postgres database, and runs SQL queries on it for further analysis. Key features:

Scrapes football stadium data using BeautifulSoup and requests.
Cleans and transforms the data using pandas.
Stores the data in a Postgres database with automatic table creation.
Executes SQL queries for advanced analysis and saves the results to CSV.
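The four features above map onto the usual extract, transform, load and query stages. The sketch below is a minimal, stdlib-only illustration of those stages, not the project's code: it swaps requests + BeautifulSoup for Python's built-in `html.parser`, pandas for plain functions, and Postgres for an in-memory `sqlite3` database so that it runs without external services. The class and function names, the `stadiums` table, and the (rank, stadium, capacity) column order are hypothetical; the real Wikipedia table has its own columns.

```python
import csv
import io
import re
import sqlite3
from html.parser import HTMLParser


class WikitableParser(HTMLParser):
    """Extract: collect cell text from the first <table class="wikitable">.

    Hypothetical stand-in for requests + BeautifulSoup; nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.rows = []         # one list of cell strings per <tr>
        self._in_table = False
        self._done = False
        self._row = None       # cells of the current <tr>
        self._cell = None      # text fragments of the current <td>/<th>

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and "wikitable" in classes:
            self._in_table = True
        elif self._in_table and tag == "tr":
            self._row = []
        elif self._in_table and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if not self._in_table:
            return
        if tag in ("td", "th") and self._cell is not None:
            # Join text fragments (links, <sup> footnotes) and collapse whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            self._in_table = False
            self._done = True  # only the first wikitable is parsed

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_wikitable(html: str) -> list:
    parser = WikitableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


FOOTNOTE = re.compile(r"\[[^\]]*\]")  # matches "[1]", "[a]", "[note 2]"


def clean_row(row):
    """Transform: strip footnote markers and parse numbers.

    Assumes the hypothetical column order (rank, stadium, capacity).
    """
    rank, stadium, capacity = (FOOTNOTE.sub("", cell).strip() for cell in row[:3])
    return int(rank), stadium, int(capacity.replace(",", ""))


def load(conn, rows):
    """Load: create the table if it does not exist, then insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stadiums "
        "(rank INTEGER, stadium TEXT, capacity INTEGER)"
    )
    conn.executemany("INSERT INTO stadiums VALUES (?, ?, ?)", rows)
    conn.commit()


def query_to_csv(conn, sql, params=()):
    """Analyse: run a SQL query and return its result set as CSV text."""
    cur = conn.execute(sql, params)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])  # header from column names
    writer.writerows(cur)
    return buf.getvalue()
```

Chained together, `[clean_row(r) for r in parse_wikitable(html)[1:]]` skips the header row before loading. In an Airflow DAG each stage could be wrapped as its own task, though how this project actually splits its tasks is not shown here.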

Repository layout:
.astro
.github/workflows
dags
include
tests/dags
.dockerignore
.gitignore
Dockerfile
README.md
packages.txt
requirements.txt

Repository: Undisputed-jay/wikipedia_stadium_data_pipeline_with_apache_airflow