lakeFS - Data version control for your data lake | Git for data
Updated Dec 24, 2024 - Go
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.
A few projects related to Data Engineering, including Data Modeling, infrastructure setup on the cloud, Data Warehousing, and Data Lake development.
An end-to-end GoodReads data pipeline for building a Data Lake, Data Warehouse, and Analytics Platform.
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Personal Data Engineering Projects
Data API Framework for AI Agents and Data Apps
Generic Data Ingestion & Dispersal Library for Hadoop
Enterprise-grade, production-hardened, serverless data lake on AWS
Real Time Big Data / IoT Machine Learning (Model Training and Inference) with HiveMQ (MQTT), TensorFlow IO and Apache Kafka - no additional data store like S3, HDFS or Spark required
Lakekeeper: A Rust native Iceberg REST Catalog
Use SQL to build ELT pipelines on a data lakehouse.
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
BtrBlocks: Efficient Columnar Compression for Data Lakes (SIGMOD 2023 Paper)
🤖 The semantic engine for LLMs, bringing semantic context to AI agents. 🔥