This project is part of the Streaming Data Pipeline and Data Visualization project for ICCS361 at Mahidol University International College. It demonstrates a real-time data streaming pipeline built on trading data from the Binance WebSocket API for the BTC/USDT and ETH/USDT trading pairs. The pipeline ingests, processes, and visualizes the data using a stack of modern tools, including Kafka, Logstash, Elasticsearch, and Kibana, all containerized with Docker.
The project’s architecture consists of the following steps:
- Data Source: Trading data is fetched from the Binance WebSocket API and includes details such as event time, symbol, trade ID, price, quantity, and more.
- Buffering: Kafka serves as a buffering layer, providing scalability and fault tolerance for the data stream.
- Data Aggregation & Processing: Logstash processes and transforms the Kafka data to make it suitable for indexing into Elasticsearch (a rough Python equivalent of this stage is sketched after this list).
- Indexing & Visualization: Processed data is stored in Elasticsearch and visualized on a Kibana dashboard to enable real-time insights.
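In the repository, the aggregation and indexing stages are handled by Logstash and its pipeline configuration. Purely as an illustration of what those stages do (this is not the project's Logstash config), the sketch below consumes trade messages from Kafka and indexes them into Elasticsearch in Python. The topic name `crypto-trades`, broker address `localhost:9092`, and index name `crypto-trades` are assumptions for the example, not values taken from the repository.

```python
# Illustrative only: a Python stand-in for the Logstash stage (Kafka -> Elasticsearch).
# Assumptions: topic "crypto-trades", broker at localhost:9092, index "crypto-trades",
# and the elasticsearch-py 8.x client.
import json

from kafka import KafkaConsumer          # pip install kafka-python
from elasticsearch import Elasticsearch  # pip install elasticsearch

consumer = KafkaConsumer(
    "crypto-trades",                      # assumed topic name
    bootstrap_servers="localhost:9092",   # assumed broker address
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
es = Elasticsearch("http://localhost:9200")

for message in consumer:
    trade = message.value
    # Map Binance's short field names to readable keys before indexing.
    doc = {
        "event_time": trade["E"],
        "symbol": trade["s"],
        "trade_id": trade["t"],
        "price": float(trade["p"]),
        "quantity": float(trade["q"]),
        "trade_time": trade["T"],
        "buyer_is_maker": trade["m"],
    }
    es.index(index="crypto-trades", document=doc)
```

In the actual pipeline, this kind of field renaming and type conversion would typically live in the Logstash filter block instead of application code.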
Data from the Binance WebSocket API is structured as follows:
- e: Event type
- E: Event time (in milliseconds since Unix epoch)
- s: Symbol (e.g., BTCUSDT, ETHUSDT)
- t: Trade ID
- p: Price
- q: Quantity
- T: Trade time (in milliseconds since Unix epoch)
- m: Whether the buyer is the market maker
- M: Ignore (always true and can be ignored)
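The repository's producer/kafka_crypto_producer.py is responsible for turning this stream into Kafka messages. As a minimal sketch of the idea, assuming the `websocket-client` and `kafka-python` packages, a broker at `localhost:9092`, and a topic named `crypto-trades` (none of which are confirmed values from the repository), a producer might look like this:

```python
# Minimal sketch of a Binance -> Kafka producer; the repository's
# producer/kafka_crypto_producer.py is the actual implementation.
# Assumptions: broker at localhost:9092 and a Kafka topic named "crypto-trades".
import json

import websocket                  # pip install websocket-client
from kafka import KafkaProducer   # pip install kafka-python

# Binance's public combined stream for the two trade pairs.
STREAM_URL = "wss://stream.binance.com:9443/stream?streams=btcusdt@trade/ethusdt@trade"

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def on_message(ws, message):
    payload = json.loads(message)
    trade = payload.get("data", payload)  # combined streams wrap each trade in "data"
    producer.send("crypto-trades", value=trade)

ws = websocket.WebSocketApp(STREAM_URL, on_message=on_message)
ws.run_forever()
```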
Before starting, make sure you have:
- Docker and Docker Compose installed.
- Python 3.9+ installed.
- Access to a Binance account to retrieve WebSocket API data.
To set up and run the pipeline:
- Clone the Repository:
  `git clone https://github.com/BothBosu/Crypto-Streaming-Pipeline-using-Kafka-Logstash-Elasticsearch.git`
  `cd Crypto-Streaming-Pipeline-using-Kafka-Logstash-Elasticsearch`
- Run Docker Containers:
  `docker-compose up`
- Start the Producers:
  `python producer/kafka_crypto_producer.py`
- Access Kibana Dashboard: Open Kibana at http://localhost:5601
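If the dashboard stays empty, one quick sanity check is to query Elasticsearch directly. The sketch below assumes Elasticsearch on its default port 9200 and an index named `crypto-trades`; substitute whatever index name the Logstash pipeline actually writes to.

```python
# Optional sanity check: confirm that trade documents are reaching Elasticsearch.
# Assumptions: Elasticsearch on localhost:9200 and an index named "crypto-trades".
import requests  # pip install requests

resp = requests.get(
    "http://localhost:9200/crypto-trades/_search",
    params={"size": 3},  # fetch a few documents to confirm data is flowing
)
resp.raise_for_status()
for hit in resp.json()["hits"]["hits"]:
    print(hit["_source"])
```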