In this lab, you will gain hands-on experience with Apache Kafka, a distributed streaming platform that plays a key role in processing large-scale real-time data. You will establish a connection to a Kafka broker, produce and consume messages, and explore Kafka command-line tools. This lab will prepare you for your group project, where you'll work with Kafka streams.
To receive credit for this lab, show your work to the TA during recitation.
- Establish a secure SSH tunnel to the Kafka server. Explain Kafka topics and offsets to the TA: how do they ensure message continuity if a consumer is disconnected?
- Modify starter code to implement producer and consumer modes for a Kafka topic.
- Use Kafka's CLI tools to manage and monitor Kafka topics and messages.
- Clone the starter code from this Git repository.
- The repository includes a Python notebook demonstrating the Kafka producer and consumer model.
- Install the Kafka Python package by running:
python -m pip install kafka-python
- Use SSH to create a tunnel to the Kafka server:
ssh -L <local_port>:localhost:<remote_port> <user>@<remote_server> -NTf
- Test the Kafka server connection to ensure it's operational.
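A quick way to test the connection is from Python itself. The snippet below is a minimal sketch that assumes the tunnel forwards the broker to localhost:9092; adjust the port to whatever <local_port> you chose.

```python
from kafka import KafkaConsumer

# Assumes the SSH tunnel exposes the remote broker on localhost:9092;
# change the port if you used a different <local_port>.
consumer = KafkaConsumer(bootstrap_servers="localhost:9092")

# topics() requests cluster metadata over the tunnel. If the tunnel is not up,
# the constructor or this call raises an error (e.g. NoBrokersAvailable).
print(consumer.topics())
consumer.close()
```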
Refer to the TODO sections in the script. Edit the bootstrap servers and add 2-3 cities of your choice, then run the code to write to the Kafka stream.
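If it helps to see the overall shape of the producer code, here is a minimal sketch. The topic name, the city list, and the JSON message format are placeholders of my own; follow whatever the TODO comments in the starter notebook actually ask for.

```python
import json
from kafka import KafkaProducer

TOPIC = "recitation-demo"                       # hypothetical topic name
cities = ["Pittsburgh", "New York", "Chicago"]  # pick your own 2-3 cities

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",          # the tunneled broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),  # dict -> JSON bytes
)

for city in cities:
    # send() is asynchronous; each record is appended to the topic.
    producer.send(TOPIC, value={"city": city})

producer.flush()   # block until buffered records are actually delivered
producer.close()
```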
Modify the TODO section by filling in the appropriate parameters/arguments in the starter code, then verify the output in Kafka_log.csv.
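A minimal consumer sketch follows, again with assumed names (topic, group id, CSV columns) that you should replace with whatever the starter code expects:

```python
import csv
from kafka import KafkaConsumer

TOPIC = "recitation-demo"   # hypothetical; use the topic you produced to

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",     # read from the beginning if no committed offset exists
    enable_auto_commit=True,          # commit offsets periodically for this group
    group_id="recitation-group",      # hypothetical consumer group name
    value_deserializer=lambda m: m.decode("utf-8"),
    consumer_timeout_ms=10000,        # stop iterating after 10 s without new messages
)

# Append each consumed record to a CSV log (column layout is illustrative).
with open("Kafka_log.csv", "a", newline="") as f:
    writer = csv.writer(f)
    for message in consumer:
        writer.writerow([message.topic, message.offset, message.value])

consumer.close()
```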
Ref: KafkaProducer Documentation
KafkaConsumer Documentation
kcat is a command-line client for Kafka, previously known as kafkacat.
Install it with your package manager, for example:
- macOS:
brew install kcat
- Ubuntu:
apt-get install kcat
- Note for Windows users: Setting up kcat on Windows is complex, so please pair up with someone using macOS or Ubuntu during recitation for this deliverable. The purpose is to get comfortable with the CLI, which will help in the group project when using Kafka on Linux-based virtual machines.
Using the kcat documentation, write a command that connects to the local Kafka broker, specifies a topic, and consumes messages from the earliest offset.
Ref: kcat usage
kcat GitHub
For your group project you will be reading movie data from the Kafka stream. List all topics first, then read some of the movielog streams to get an idea of what the data looks like:
kcat -b localhost:9092 -L
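Once you know a topic name from the -L output, you can consume from it. The command below is only a sketch: the topic name movielog1 is a guess at how the movielog topics might be named, so substitute one that actually appears in your listing.

kcat -b localhost:9092 -t movielog1 -C -o beginning -c 10

Here -C selects consumer mode, -o beginning starts from the earliest offset, and -c 10 exits after ten messages so the output does not scroll forever.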