#

data-lakehouse

Here are 17 public repositories matching this topic...

eavilaes / qbeast-spark

Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!

scala big-data spark sampling datasource hacktoberfest spark-sql data-lakehouse

Updated May 1, 2023
Scala

sudohainguyen / mini-lakehouse

Data lakehouse at home with docker compose

hive iceberg trino data-lakehouse

Updated May 20, 2023
Jupyter Notebook

prneidhardt / AWS-Data-Lakehouse

STEDI project

aws apache-spark data-manipulation data-lakehouse data-definition-language

Updated Jul 9, 2023
Python

dominikhei / Local-Data-LakeHouse

Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testing.

data-lake minio trino hive-metastore apache-iceberg lakehouse data-lakehouse

Updated Sep 2, 2023
Dockerfile

gupta-aayushkr / F1-Racing

The project aims to process Formula 1 racing data, create an automated data pipeline, and make the data available for presentation and analysis purposes.

sql azure databricks pyspark-notebook data-factory data-lakehouse

Updated Jan 10, 2024
Python

mahmoudparsian / data-warehousing

This repository is a place for the Data Warehousing course at the Information Systems & Analytics department, Santa Clara University.

data-mining database etl data-visualization data-lake business-intelligence data-analytics elt data-modeling data-warehousing star-schema dimensional-modeling data-lakehouse

Updated Apr 2, 2024
HTML

Data-Kube / tst-datalakehouse-hudi

#Test - Create a Data Lakehouse in Kubernetes

kubernetes minio flink strimzi hudi data-lakehouse

Updated Oct 24, 2024

pracdata / awesome-open-source-data-engineering

A curated list of open source tools used in analytical stacks and data engineering ecosystem

Updated May 7, 2024

aabouzaid / modern-data-platform-poc

My M.Sc. dissertation: Modern Data Platform using DataOps, Kubernetes, and Cloud-Native ecosystem to build a resilient Big Data platform based on Data Lakehouse architecture which is the base for Machine Learning (MLOps) and Artificial Intelligence (AIOps).

kubernetes big-data data-engineering dataops data-platform cloud-native msc msc-project edinburgh-napier cloud-agnostic data-lakehouse

Updated May 12, 2024
Jupyter Notebook

cdelmonte-zg / delta-table-example

big-data data-lake change-data-capture delta-lake data-lakehouse delta-table change-data-feed

Updated May 22, 2024
Jupyter Notebook

ananyacanakapalli / University-Data-Design

This project is aimed at overhauling a university's data infrastructure to improve efficiency, security, and scalability, resulting in the successful creation of a unified data management solution.

aws athena kinesis data-visualization data-engineering data-analysis redshift data-security sagemaker aws-glue data-architecture data-lakehouse

Updated Jul 15, 2024

THeades / serverless-data-lakehouse

This is an example project how to build a serverless data lakehouse on AWS using Terraform, Apache Iceberg and Spark.

aws apache-spark terraform data-engineering apache-iceberg data-lakehouse

Updated Jul 26, 2024

k0rsakov / all_about_DuckDB

Всё что нужно знать про DuckDB

tutorial docker-compose tutorials data-engineering data-lake tutorial-code habr duckdb data-lakehouse data-engineering-project

Updated Aug 15, 2024
Jupyter Notebook

k0rsakov / infrastructure_for_data_engineer_S3

Инфраструктура для data engineer S3

tutorial docker-compose tutorials data-engineering data-lake tutorial-code habr data-lakehouse data-engineering-project

Updated Aug 15, 2024
Python

dataasee

ulbmuenster / dataasee

DatAasee - A Metadata-Lake for Libraries

metadata data-engineering data-lake data-catalog data-discovery metadata-management research-library university-library metadata-catalog academic-library metadata-mapping data-lakehouse metalake metadata-lake

Updated Sep 23, 2024
Makefile

qbeast-spark

Qbeast-io / qbeast-spark

Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!

scala big-data spark sampling datasource spark-sql data-lakehouse

Updated Oct 24, 2024
Scala

huwngnosleep / complete_lakehouse_techstack

This project implements an end-to-end techstack for a data platform, for local development.

kafka spark hadoop etl bigdata data-warehouse data-platform lambda-architecture data-lakehouse

Updated Oct 25, 2024
Python

Improve this page

Add a description, image, and links to the data-lakehouse topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the data-lakehouse topic, visit your repo's landing page and select "manage topics."