Infodium (info + podium) is a lambda architecture demo for data processing and analysis.
The objective of this project is to demonstrate an analytical platform model that is capable of processing data in real time and batch and visualizing the result using different types of technologies..
-
It is a process developed in Python that reads a soccer game event dataset and sends the events to Kafka every 0.5 seconds. In addition, it reads some datasets (data of the matches, types of events, etc.), and saves it in MySql.
-
MySql stores the data received from the Data Generator to consume it later in the Spark and Flink process.
-
It receives the events in a topic to consume from the Flink process and stores them in S3 using Kafka connect.
-
S3 stores the data received from Kafka connect in a bucket to consume it later in batch processes (Spark).
-
The Flink process consumes Kafka topic and MySql tables (queries every 30 min) to analyze events in real time and inserts into Elasticsearch indexes.
-
Spark reads data stored in S3 and MySql to analyze in batch mode and stores in Elasticsearch indexes.
-
Elasticsearch stores the data analyzed in real time and batch mode.
-
Visualizes Elasticsearch data in a Dashboard.
Clone repository
git clone https://github.com/mtanvirulislam/infodium.git
Move to project directory
cd infodium
Set execute permission to start and stop script
chmod +x start
chmod +x stop
Start infodium
./start
#or
bash start
Stop infodium
./stop
#or
bash stop