Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Updated Nov 27, 2025 - Python
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
The leading data integration platform for ETL / ELT data pipelines from APIs, databases, and files to data warehouses, data lakes, and data lakehouses. Available both self-hosted and cloud-hosted.
An orchestration platform for the development, production, and observation of data assets.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Fancy stream processing made operationally mundane
Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.
Flink CDC is a streaming data integration tool
Data pipelines for cloud config and security data. Build cloud asset inventory, CSPM, FinOps, and vulnerability management solutions. Extract from AWS, Azure, GCP, and 70+ cloud and SaaS sources.
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
A privacy- and security-focused Segment alternative, built with Golang and React.
Open-source data security platform for developers to monitor and detect PII, anonymize production data, and sync it across environments.
Build data pipelines, the easy way 🛠️
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server, and S3 (Parquet, CSV, JSON, and Excel).
Spreadsheet with AI, Code, Connections
Maestro: Netflix’s Workflow Orchestrator
Data transformation framework for AI. Ultra-performant, with incremental processing.
A curated list with resources about node-based UIs
A system for agentic LLM-powered data processing and ETL
Python scripts for ETL (extract, transform and load) jobs for Ethereum blocks, transactions, ERC20 / ERC721 tokens, transfers, receipts, logs, contracts, internal transactions. Data is available in Google BigQuery https://goo.gl/oY5BCQ