In this lab, you will gain hands-on experience with Apache Kafka, a distributed streaming platform that plays a key role in processing large-scale real-time data. You will establish a connection to a Kafka broker, produce and consume messages, and explore Kafka command-line tools. This lab will prepare you for your group project, where you'll work with Kafka streams.
To receive credit for this lab, show your work to the TA during recitation.
- Establish a secure SSH tunnel to the Kafka server. Explain Kafka topics and offsets to the TA: how do they ensure message continuity if a consumer is disconnected?
- Modify starter code to implement producer and consumer modes for a Kafka topic.
- Use Kafka's CLI tools to manage and monitor Kafka topics and messages.
- Clone the starter code from this Git repository.
- The repository includes a Python notebook demonstrating the Kafka producer and consumer model.
- Install the Kafka Python package by running:
python -m pip install kafka-python
- Use SSH to create a tunnel to the Kafka server:
ssh -L <local_port>:localhost:<remote_port> <user>@<remote_server> -NTf
- Test the Kafka server connection to ensure it's operational.
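A quick way to test the connection is from Python itself. The snippet below is a minimal sketch that assumes the tunnel forwards the broker to localhost:9092; adjust the port to whatever <local_port> you chose.

```python
from kafka import KafkaConsumer

# Assumes the SSH tunnel exposes the remote broker on localhost:9092;
# change the port if you used a different <local_port>.
consumer = KafkaConsumer(bootstrap_servers="localhost:9092")

# topics() requests cluster metadata over the tunnel. If the tunnel is not up,
# the constructor or this call raises an error (e.g. NoBrokersAvailable).
print(consumer.topics())
consumer.close()
```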
Refer to the TODO sections in the script. Edit the bootstrap servers and add 2-3 cities of your choice, then run the code to write to the Kafka stream.
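If it helps to see the overall shape of the producer code, here is a minimal sketch. The topic name, the city list, and the JSON message format are placeholders of my own; follow whatever the TODO comments in the starter notebook actually ask for.

```python
import json
from kafka import KafkaProducer

TOPIC = "recitation-demo"                       # hypothetical topic name
cities = ["Pittsburgh", "New York", "Chicago"]  # pick your own 2-3 cities

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",          # the tunneled broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),  # dict -> JSON bytes
)

for city in cities:
    # send() is asynchronous; each record is appended to the topic.
    producer.send(TOPIC, value={"city": city})

producer.flush()   # block until buffered records are actually delivered
producer.close()
```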
Modify the TODO section by filling in the appropriate parameters/arguments in the starter code, then verify the output in Kafka_log.csv.
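A minimal consumer sketch follows, again with assumed names (topic, group id, CSV columns) that you should replace with whatever the starter code expects:

```python
import csv
from kafka import KafkaConsumer

TOPIC = "recitation-demo"   # hypothetical; use the topic you produced to

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",     # read from the beginning if no committed offset exists
    enable_auto_commit=True,          # commit offsets periodically for this group
    group_id="recitation-group",      # hypothetical consumer group name
    value_deserializer=lambda m: m.decode("utf-8"),
    consumer_timeout_ms=10000,        # stop iterating after 10 s without new messages
)

# Append each consumed record to a CSV log (column layout is illustrative).
with open("Kafka_log.csv", "a", newline="") as f:
    writer = csv.writer(f)
    for message in consumer:
        writer.writerow([message.topic, message.offset, message.value])

consumer.close()
```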
Ref: KafkaProducer Documentation
KafkaConsumer Documentation
kcat is a command-line client for Kafka, previously known as kafkacat.
Install it with your package manager, for example:
- macOS:
brew install kcat
- Ubuntu:
apt-get install kcat
- Note for Windows users: Setting up kcat on Windows is complex, so please pair up with someone using macOS or Ubuntu during recitation for this deliverable. The purpose is to get comfortable with the CLI, which will help in the group project when using Kafka on Linux-based virtual machines.
Using the kcat documentation, write a command that connects to the local Kafka broker, specifies a topic, and consumes messages from the earliest offset.
Ref: kcat usage
kcat GitHub
For your group project you will be reading movie data from the Kafka stream. List all topics first, then read some of the movielog streams to get an idea of what the data looks like:
kcat -b localhost:9092 -L
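Once you know a topic name from the -L output, you can consume from it. The command below is only a sketch: the topic name movielog1 is a guess at how the movielog topics might be named, so substitute one that actually appears in your listing.

kcat -b localhost:9092 -t movielog1 -C -o beginning -c 10

Here -C selects consumer mode, -o beginning starts from the earliest offset, and -c 10 exits after ten messages so the output does not scroll forever.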