big-data-processing

Here are 74 public repositories matching this topic...

drshahizan / BDM

Course covers big data fundamentals, processes, technologies, platform ecosystem, and management for practical application development.

big-data big-data-analytics big-data-processing big-data-architecture

Updated Apr 7, 2024
Jupyter Notebook

souvik-databricks / dlt-with-debug

Star

A lightweight helper utility which allows developers to do interactive pipeline development by having a unified source code for both DLT run and Non-DLT interactive notebook run.

big-data spark etl python3 databricks dlt etl-pipeline big-data-processing delta-live-tables

Updated Dec 7, 2022
Python

This repository contains an Apache Flink application for real-time sales analytics built using Docker Compose to orchestrate the necessary infrastructure components, including Apache Flink, Elasticsearch, and Postgres

python big-data apache-flink big-data-processing realtime-streaming

Updated Dec 4, 2023
Java

felipefrizzo / terraform-aws-kinesis-firehose

Star

This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.

big-data analytics terraform kinesis-firehose cloudwatch-logs parquet terraform-provider etl-job terraform-aws big-data-processing

Updated Aug 4, 2021
HCL

eskimo-sh / eskimo

Star

Eskimo is a state of the art Big Data Infrastructure and Management Web Console to build, manage and operate Big Data 2.0 Analytics clusters on Kubernetes. This is the git repository of Eskimo Community Edition.

Updated Sep 14, 2023
Java

StarPlatinumStudio / Flink-SQL-Practice

Star

Flink SQL 实战 -中文博客专栏

sql stream-processing apache-flink big-data-processing

Updated Jun 17, 2022
Java

giucris / yasp

Star

Yet Another SPark Framework

framework scala big-data spark etl sparksql elt etl-framework etl-pipeline big-data-processing

Updated Feb 5, 2023
Scala

pyajs / veronica

Star

big data processing and machine learning platform，just like useing sql

sql python3 pyspark machine-learning-platform big-data-processing xql

Updated Oct 15, 2024
Python

hope-data-science / R4BD

Star

R for Big Data (Chinese Version)

r big-data big-data-processing big-data-analytics-techniques

Updated Nov 7, 2024
R

anjijava16 / GCP_Data_Enginner_Utils

Star

GCP_Data_Enginner

python bigquery scala notebook gcp pubsub pyspark dataflow shell-script dataproc-cluster dataproc gcp-storage big-data-processing

Updated Sep 4, 2021
Shell

impresso / impresso-text-acquisition

Star

🛠️ Python library to import OCR data in various formats into the canonical JSON format defined by the Impresso project.

big-data-processing historical-newspapers impresso-project

Updated Oct 1, 2024
Jupyter Notebook

bdnf / BigData-Engineering-Projects

Star

Data modeling with Cassandra, building Data Warehouse using Redshift and creation of Data Lake using Spark and Airflow

airflow spark cassandra data-warehouse data-lake redshift big-data-analytics big-data-processing

Updated Feb 28, 2020
Jupyter Notebook

tabletop-labs / tabletop

Star

A curated selection of tools, libraries and services that help tame your dataflow to productively build ambitious, data driven & reactive applications on a streaming lakehouse

real-time microservices kafka big-data stream-processing big-data-analytics timetravel big-data-processing modern-data-stack elasticscaling semi-structured-cloud-warehouse

Updated May 30, 2023
Go

theGuyWithBlackTie / electricChargingStations

Star

big-data electric-vehicles spark-ml charging-stations big-data-processing

Updated Dec 13, 2021
Jupyter Notebook

VladOnMyOwn / ctr-poisson-bootstrap

Star

Here I demonstrate the performance difference between the Poisson and the classic bootstrap by estimating the confidence interval for the difference of CTRs of the two user groups

python bootstrap statistics big-data ab-testing statistical-tests ab-tests ab-test click-through-rate big-data-processing poisson-bootstrap

Updated Oct 22, 2022
Jupyter Notebook

vvittis / FlinkSampling

Star

Reservoir Sampling for Group-By Queries in Flink Platform. Answering effectively Single Aggregate.

java topic stratum apache-flink sampling reservoir-sampling streaming-data big-data-analytics group-by big-data-processing streaming-tuples

Updated Aug 12, 2023
Java

kochlisGit / Big-Data-Algorithms

Star

Implementation of algorithms for big data using python, numpy, pandas.

python bloom-filter lsh streams frequent-itemset-mining pcy frequent-itemsets stream-mining shingling big-data-processing lsh-algorithm min-hasing similar-items a-priori multistage-pcy multihash-pcy

Updated Apr 27, 2020
Python

software-competence-center-hagenberg / AVUBDI

Star

Github Repository for a versatile usable Big Data infrastructure (AVUBDI)

docker kafka spark docker-compose docker-swarm template-project big-data-platform big-data-processing

Updated Feb 23, 2021
Shell

chandnii7 / Big-Data-Processing-Pipeline

Star

A pipeline that consumes twitter data to extract meaningful insights about a variety of topics using the following technologies: twitter API, Kafka, MongoDB, and Tableau.

kafka big-data mongodb twitter-api data-visualization zookeeper data-analytics kafka-consumer kafka-producer tableau nosql-database kafka-streaming big-data-processing data-processing-pipelines

Updated Aug 2, 2021
Python

mtumilowicz / big-data-scala-spark-batch-workshop

Star

Introduction to Spark Batch processing.

big-data workshop spark workshop-materials batch-processing spark-sql big-data-processing

Updated May 27, 2024
Scala

Improve this page

Add a description, image, and links to the big-data-processing topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the big-data-processing topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

big-data-processing

Here are 74 public repositories matching this topic...

drshahizan / BDM

souvik-databricks / dlt-with-debug

airscholar / FlinkCommerce

felipefrizzo / terraform-aws-kinesis-firehose

eskimo-sh / eskimo

StarPlatinumStudio / Flink-SQL-Practice

giucris / yasp

pyajs / veronica

hope-data-science / R4BD

anjijava16 / GCP_Data_Enginner_Utils

impresso / impresso-text-acquisition

bdnf / BigData-Engineering-Projects

tabletop-labs / tabletop

theGuyWithBlackTie / electricChargingStations

VladOnMyOwn / ctr-poisson-bootstrap

vvittis / FlinkSampling

kochlisGit / Big-Data-Algorithms

software-competence-center-hagenberg / AVUBDI

chandnii7 / Big-Data-Processing-Pipeline

mtumilowicz / big-data-scala-spark-batch-workshop

Improve this page

Add this topic to your repo