This project reads from a Kafka stream of events published to the `events` topic. The messages are expected to be valid JSON and to contain the fields `uid` (the customer id as a String) and `ts` (a UNIX timestamp) at the top level of their JSON tree.
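An event on the `events` topic might therefore look like this (the values are purely illustrative):

```json
{
  "uid": "customer-42",
  "ts": 1700000000
}
```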
The script in this project reads the JSON messages from the `events` topic and counts the number of unique customer ids (the `uid` field) per minute, with the minute taken from the event time (the `ts` field). The resulting counts are printed to the console and also published to a new topic called `user_count`.
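For illustration, here is a minimal Scala sketch of the per-minute bucketing idea. It is not the project's actual `CountEventsBasic` code, and it assumes `ts` is a UNIX timestamp in seconds:

```scala
// Sketch only: group events by the minute derived from their event time
// and count the distinct uids per minute bucket.
final case class Event(uid: String, ts: Long)

def countUniqueUsersPerMinute(events: Seq[Event]): Map[Long, Int] =
  events
    .groupBy(e => e.ts / 60) // minute bucket (ts assumed to be in seconds)
    .map { case (minute, es) => minute -> es.map(_.uid).toSet.size }

// Example:
// countUniqueUsersPerMinute(Seq(Event("a", 60), Event("b", 65), Event("a", 110)))
// => Map(1 -> 2)
```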
To run this project locally you need:

- Docker
- a Scala SDK
To set up the local Kafka environment:

- open a shell in the project root and navigate to the folder `src/main/docker`
- run `docker-compose up -d` to download (on the first run) and start the Kafka environment
- once it is up, log into the `broker` container with `docker exec -it broker bash`
- inside the `broker` container, create the topics required for this job by running the following commands (you can verify the result as shown below):

  ```
  kafka-topics \
    --bootstrap-server localhost:9092 \
    --topic events \
    --create

  kafka-topics \
    --bootstrap-server localhost:9092 \
    --topic user_count \
    --create
  ```
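As an optional check, you can list the topics on the broker to verify that both `events` and `user_count` exist:

```
kafka-topics \
  --bootstrap-server localhost:9092 \
  --list
```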
To run the job:

- open your IDE and import the project
- configure the Scala SDK, if needed
- run the script in the `CountEventsBasic` object; the code will keep running and write the counted results to the `user_count` topic and to the console (to see counts appear, you can push a test event as shown below)
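For example, you can produce a single test event to the `events` topic from inside the `broker` container with the console producer (the payload here is only an example of the expected format):

```
echo '{"uid": "customer-42", "ts": 1700000000}' | kafka-console-producer \
  --bootstrap-server localhost:9092 \
  --topic events
```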
To check the results:

- use the shell that is still logged in to the `broker` container, or log in again with `docker exec -it broker bash`
- start a consumer on the `user_count` topic with:

  ```
  kafka-console-consumer \
    --bootstrap-server localhost:9092 \
    --topic user_count \
    --from-beginning \
    --formatter kafka.tools.DefaultMessageFormatter \
    --property print.key=true \
    --property print.value=true
  ```
This project provides a very basic counting algorithm that only works on a single node. With a larger amount of data being pushed to the incoming topic, it would probably be better to use a more sophisticated solution that supports parallel counting with multiple consumers in a cluster, polling from a larger Kafka cluster with multiple brokers.