An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
This project demonstrates how to build and automate an ETL pipeline written in Python and schedule it with the open-source Apache Airflow orchestration tool on an AWS EC2 instance.
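For orientation, a skeletal Airflow DAG for a daily ETL run of this shape is sketched below; the three step callables are placeholders, not the GoodReads project's actual code.

```python
# Minimal sketch of a daily ETL DAG; the step functions are
# placeholders, not the GoodReads project's implementation.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull raw data from the source")


def transform():
    print("clean and reshape the raw data")


def load():
    print("write the result to the warehouse")


with DAG(
    dag_id="etl_pipeline",
    start_date=datetime(2020, 3, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task
```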
This repository is no longer maintained.
The script automates the collection and insertion of KPIs related to transaction time and storage usage in a Data Warehouse, using Apache Airflow. It calculates the time elapsed since the last transaction and the percentage of storage usage, recording this data periodically in specific tables.
An Airflow DAG transformation framework.
Analysing live tweets from Twitter by building a big data pipeline and scheduling it with Airflow (also using Kafka for tweet ingestion, Cassandra for storing parsed tweets, and Spark for analysis).
This is an ELT data pipeline set up to track the activities of an e-commerce website based on orders, reviews, deliveries, and shipment dates. The project uses technologies such as Airflow, AWS RDS (Postgres), and Python.
Build a data warehouse from scratch, including full loads, daily incremental loads, schema design, and SCD Types 1 and 2.
DAG factory.
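That description names a common Airflow pattern: a factory that generates one DAG per configuration entry and registers each in globals() so the scheduler can discover it. The sketch below is an illustrative example with hypothetical pipeline names and schedules, not this repository's code.

```python
# Illustrative DAG-factory pattern: one DAG per config entry,
# registered in globals() for scheduler discovery.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical per-pipeline configuration.
PIPELINES = {
    "sales": "@daily",
    "inventory": "@hourly",
}


def build_dag(name: str, schedule: str) -> DAG:
    with DAG(
        dag_id=f"{name}_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule_interval=schedule,
        catchup=False,
    ) as dag:
        BashOperator(task_id="run", bash_command=f"echo running {name}")
    return dag


for name, schedule in PIPELINES.items():
    globals()[f"{name}_pipeline"] = build_dag(name, schedule)
```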
This project demonstrates end-to-end ETL pipelines for HR and people analytics. It uses Oracle Database and Apache Airflow to extract, transform, and load data, and generate reports on workforce metrics.
Apache Airflow demo project that sets up three DAGs to show how to pass parameters from a DAG to a triggered DAG.
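In Airflow 2.x this is typically done through TriggerDagRunOperator's conf argument, which the target DAG reads from dag_run.conf. The sketch below is a generic illustration of that mechanism, not the demo project's exact DAGs.

```python
# Generic sketch: a parent DAG passes parameters to a child DAG via conf.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

with DAG(
    dag_id="parent_dag",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
) as parent:
    TriggerDagRunOperator(
        task_id="trigger_child",
        trigger_dag_id="child_dag",
        conf={"run_date": "{{ ds }}", "source": "parent_dag"},
    )


def read_conf(**context):
    # The triggered DAG reads whatever the parent put in conf.
    conf = context["dag_run"].conf or {}
    print(f"received: {conf}")


with DAG(
    dag_id="child_dag",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
) as child:
    PythonOperator(task_id="read_conf", python_callable=read_conf)
```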
Orchestrate data pipelines using Airflow.
This project contains a comprehensive collection of Apache Airflow DAGs designed for learning Airflow concepts from basics to advanced levels. The project includes 25 different DAGs covering various operators, patterns, and production scenarios, all deployed and tested using Astronomer Cloud.
Clean up old Airflow log files with a script or Airflow DAG. Frees disk space by deleting rotated logs, removing old files, and cleaning up empty directories.
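A minimal version of such a cleanup can be a single BashOperator running find; the log path and 30-day retention window below are assumptions, not the repository's defaults.

```python
# Sketch of a log-cleanup DAG: delete log files older than N days,
# then prune empty directories. Path and retention are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

LOG_DIR = "${AIRFLOW_HOME}/logs"   # default log location; adjust as needed
RETENTION_DAYS = 30                # illustrative retention window

with DAG(
    dag_id="cleanup_airflow_logs",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    BashOperator(
        task_id="delete_old_logs",
        bash_command=(
            f"find {LOG_DIR} -type f -mtime +{RETENTION_DAYS} -delete && "
            f"find {LOG_DIR} -type d -empty -delete"
        ),
    )
```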
This project focuses on implementing an ETL pipeline using Apache Airflow to efficiently extract data from Reddit, transform it as needed, and load it into an AWS S3 bucket. The use of Airflow allows for robust orchestration of the data workflow, ensuring that each step of the ETL process is executed in a reliable and repeatable manner.
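One way such a pipeline could be wired in Airflow is outlined below, using the Amazon provider's S3Hook to land the extracted data in S3; the Reddit extraction is stubbed out, and the connection ID, bucket name, and key layout are hypothetical.

```python
# Hypothetical outline of a Reddit-to-S3 ETL task; the extraction
# logic is stubbed, and the bucket/key names are placeholders.
import json
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook


def extract_and_load(**context):
    # Stand-in for a real Reddit API call (e.g. via praw).
    posts = [{"id": "abc123", "title": "example post"}]
    hook = S3Hook(aws_conn_id="aws_default")
    hook.load_string(
        string_data=json.dumps(posts),
        key=f"reddit/{context['ds']}/posts.json",  # partitioned by run date
        bucket_name="my-reddit-bucket",            # placeholder bucket
        replace=True,
    )


with DAG(
    dag_id="reddit_to_s3",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="extract_and_load", python_callable=extract_and_load)
```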
Data Engineering Projects on data modelling, data warehousing, data lake development, orchestration and analysis
1T Data "Data architect (DevOps)". Assignment 2024-08-21 6.8
Creation of a near-real-time data processing pipeline for Pinterest posts.