supreme-pancake

Repository for the Big Data Management project.

Three components were created in this project: a producer / data collector (Kafka), a distributed database (Cassandra) and a consumer / data processor (Spark).
The collection of data from a network of sensors was simulated; the data then had to be processed and stored in a distributed, efficient way. The data collected (or generated) by Kafka were processed by Spark and saved to Cassandra for long-term archiving.
The connection between the PCs was made simple and scalable using ZeroTier.
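The data flow above (simulated sensors, then processing, then archiving) can be illustrated with a plain-Python sketch. It deliberately does not use the Kafka, Spark or Cassandra client libraries: the hypothetical `simulate_readings` function stands in for the Kafka producer and the hypothetical `process` function stands in for the Spark job. The message schema (`sensor_id`, `timestamp`, `value`) and the value range are assumptions for illustration, not the project's actual format.

```python
import json
import random
from collections import defaultdict
from statistics import mean

# Hypothetical message schema (sensor_id, timestamp, value); NOT taken from the project.


def simulate_readings(n_sensors, n_readings, seed=0):
    """Producer stand-in: generate fake sensor readings as JSON strings."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    messages = []
    for t in range(n_readings):
        for sensor in range(n_sensors):
            reading = {
                "sensor_id": f"sensor-{sensor}",
                "timestamp": t,
                "value": round(rng.uniform(15.0, 30.0), 2),  # assumed value range
            }
            messages.append(json.dumps(reading))
    return messages


def process(messages):
    """Consumer stand-in: parse messages and compute per-sensor averages."""
    values = defaultdict(list)
    for raw in messages:
        record = json.loads(raw)
        values[record["sensor_id"]].append(record["value"])
    # One (sensor_id, count, average) row per sensor, shaped like rows to archive.
    return [(sensor_id, len(v), round(mean(v), 2)) for sensor_id, v in sorted(values.items())]


if __name__ == "__main__":
    msgs = simulate_readings(n_sensors=3, n_readings=5)
    for row in process(msgs):
        print(row)
```

In the real setup, the JSON strings would presumably be published to a Kafka topic, consumed by Spark, and the resulting rows written to a Cassandra table; this sketch only mirrors the shape of that flow.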

  • Leave a star ⭐ if you like this project 🙂 Thank you.

What's inside

  • Kafka module
  • Cassandra DB module
  • Spark module
  • Data cleaning scripts
  • Distributed job start and stop scripts
  • Project runme script
  • Project document with details