Investrix is an end-to-end, real-time data engineering project that simulates the ingestion, processing, storage, and querying of live stock market data using Apache Kafka, Python, and AWS cloud services.
It demonstrates how to build a streaming data pipeline by integrating open-source and cloud-native technologies. The pipeline includes:
- Apache Kafka for real-time message streaming
- Python for producer/consumer logic
- Amazon S3 for scalable object storage
- AWS Glue for schema inference and cataloging
- Amazon Athena for serverless querying
This project is a hands-on way to learn modern data engineering, real-time analytics, and cloud integration.
Data flows through the pipeline in four steps:
- The Kafka producer streams CSV stock data to a Kafka topic.
- The Kafka consumer listens to that topic and writes each message to an Amazon S3 bucket.
- An AWS Glue crawler infers the schema from the S3 data and updates the Glue Data Catalog.
- Amazon Athena runs SQL queries over the structured data in S3 using the Glue Data Catalog.
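
The first two steps can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function names, the `stock-ticks` topic, and the `stock_market_<n>.json` key pattern are all hypothetical. To keep the sketch runnable without a broker or AWS credentials, the Kafka and S3 calls are passed in as callables.

```python
import csv
import io
import json


def stream_csv_rows(csv_text, send, topic="stock-ticks"):
    """Send each CSV row to `topic` as a JSON-encoded message.

    `send` is any callable taking (topic, value_bytes). The topic name
    is a hypothetical placeholder. Returns the number of rows sent.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    count = 0
    for row in reader:
        # Each row becomes one self-describing JSON message.
        send(topic, json.dumps(row).encode("utf-8"))
        count += 1
    return count


def store_message(put_object, bucket, index, payload):
    """Write one consumed message to S3 as its own JSON object.

    `put_object` is any callable accepting Bucket, Key, and Body keyword
    arguments. The key pattern is an assumption made for this sketch.
    Returns the object key that was written.
    """
    key = f"stock_market_{index}.json"
    put_object(Bucket=bucket, Key=key, Body=payload)
    return key
```

With the `kafka-python` and `boto3` packages installed, `send` could be the `send` method of a `KafkaProducer` and `put_object` could be the `put_object` method of `boto3.client("s3")`. The consumer loop would then call `store_message` once per message it reads from the topic.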
| Layer | Tool/Service |
|---|---|
| Programming | Python |
| Stream Processing | Apache Kafka |
| Cloud Compute | Amazon EC2 |
| Cloud Storage | Amazon S3 |
| Metadata Catalog | AWS Glue (Crawler and Data Catalog) |
| Query Engine | Amazon Athena |
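
The catalog and query layers in the table above can also be driven from Python. The sketch below assumes `boto3` clients created with `boto3.client("glue")` and `boto3.client("athena")`; the helper names and all argument values are hypothetical, and the clients are passed in so the functions can be exercised without AWS access.

```python
def refresh_catalog(glue, crawler_name):
    """Start a Glue crawler so the Data Catalog picks up new S3 objects.

    The crawler runs asynchronously; this call returns once it has been
    started, not when it has finished.
    """
    glue.start_crawler(Name=crawler_name)


def run_athena_query(athena, sql, database, output_location):
    """Submit `sql` to Athena and return the query execution ID.

    `database` is the Glue Data Catalog database to query, and
    `output_location` is an S3 URI where Athena writes its results.
    """
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

Athena queries are asynchronous as well, so the returned ID would typically be polled with `get_query_execution` before fetching rows with `get_query_results`.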