TheDataArtisanDev/kafka-streams-processor

Kafka Streams Processor

Apache Kafka | Java | Spring Boot | Docker

Java-based Kafka Streams application for real-time data processing with deduplication, validation, and state store management.

πŸ“ Project Structure

Kafka-Streams-Processor/
├── 📂 src/main/java/
│   ├── 📂 consumers/                    # Kafka consumer implementations
│   └── 📂 org/example/kafka/
│       ├── 📂 config/                   # Kafka configuration
│       ├── 📂 processor/                # Stream processors
│       ├── 📂 streaming/                # Main streaming applications
│       ├── 📂 topology/                 # Stream topology definitions
│       └── 📂 validator/                # Message validation
├── 📂 src/main/resources/               # Configuration files
├── 📄 docker-compose.yml               # Kafka cluster setup
├── 📄 pom.xml                          # Maven dependencies
└── 📄 runbook.txt                      # Execution instructions

🚀 Quick Start

1. Start Kafka Cluster

docker-compose up -d

2. Build Application

mvn clean compile

3. Run Stream Processor

mvn exec:java -Dexec.mainClass="org.example.kafka.streaming.KafkaStreamProcessor"

4. Send Test Messages

# Connect to Kafka container
docker exec -it kafka-streams-processor-kafka-1 bash

# Start producer
kafka-console-producer --bootstrap-server localhost:19092 --topic input-topic-account-create \
  --property "parse.key=true" --property "key.separator=:"
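
With `parse.key=true` and `key.separator=:`, each line typed into the console producer is split into key and value at the first occurrence of the separator, so the colons inside a JSON value are left alone. The following is a minimal sketch of that splitting rule, not the producer's actual code; `KeyedLineParser` is a hypothetical name, and using the account ID as the message key is an assumption.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

// Hypothetical helper: not part of this repository or of Kafka itself.
class KeyedLineParser {
    // Splits "key<sep>value" at the FIRST separator so later separators stay in the value.
    static Map.Entry<String, String> parse(String line, String separator) {
        int idx = line.indexOf(separator);
        if (idx < 0) {
            throw new IllegalArgumentException("No key separator '" + separator + "' in line: " + line);
        }
        String key = line.substring(0, idx);
        String value = line.substring(idx + separator.length());
        return new SimpleImmutableEntry<>(key, value);
    }

    public static void main(String[] args) {
        // Assumed input line: account ID as key, JSON event as value.
        Map.Entry<String, String> kv = parse("acc-001:{\"event_id\":\"unique-event-id-001\"}", ":");
        System.out.println("key=" + kv.getKey() + " value=" + kv.getValue());
    }
}
```

In practice this means a line such as `acc-001:{"event_id":"unique-event-id-001"}` can be pasted into the producer prompt as-is.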

🔧 Components

Stream Processors

| Class | Purpose |
| --- | --- |
| KafkaStreamProcessor | Main streaming application entry point |
| KafkaJoinStreamingProcessor | Stream joining operations |
| StreamProcessor | Core stream processing logic |
| UniqueRecordProcessor | Deduplication processing |

Configuration

| File | Purpose |
| --- | --- |
| KafkaConfig.java | Kafka streams configuration |
| application.yaml | Application properties |
| config.yaml | Custom configuration |

Topics

  • input-topic-account-create - Account creation events
  • input-topic-account-update - Account update events
  • Output topics configured in topology

πŸ› οΈ Features

  • Deduplication: Unique record processing with state stores
  • Validation: Schema-based message validation
  • Join Operations: Stream-to-stream and stream-to-table joins
  • State Management: RocksDB-backed state stores
  • Error Handling: Dead letter queue patterns
  • Monitoring: Application metrics and logging

📋 Prerequisites

  • Java 11+
  • Maven 3.6+
  • Docker and Docker Compose
  • Apache Kafka 2.7+

🔧 Configuration

Environment Variables

KAFKA_BOOTSTRAP_SERVERS=localhost:19092
KAFKA_APPLICATION_ID=KafkaStreamProcessor
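
As a rough illustration of how these two variables could feed the Kafka Streams configuration, the sketch below maps them onto the standard `bootstrap.servers` and `application.id` property keys, falling back to the values shown above. `StreamsEnvConfig` is a hypothetical helper; the repository's `KafkaConfig.java` may be organised quite differently.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: illustrates the env-to-property mapping only.
class StreamsEnvConfig {
    static Properties fromEnv(Map<String, String> env) {
        Properties props = new Properties();
        // "bootstrap.servers" and "application.id" are the standard Kafka Streams keys.
        props.setProperty("bootstrap.servers",
                env.getOrDefault("KAFKA_BOOTSTRAP_SERVERS", "localhost:19092"));
        props.setProperty("application.id",
                env.getOrDefault("KAFKA_APPLICATION_ID", "KafkaStreamProcessor"));
        return props;
    }

    public static void main(String[] args) {
        // Reads the real process environment; unset variables fall back to the defaults.
        System.out.println(fromEnv(System.getenv()));
    }
}
```

Kafka Streams also uses `application.id` as the consumer group ID, which is why the monitoring commands in this README query the group `KafkaStreamProcessor`.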

State Stores

  • state-store-account-create - Account creation deduplication
  • state-store-account-update - Account update deduplication
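
The deduplication idea behind these stores can be sketched without a running cluster: remember each identifier the first time it is seen and reject it afterwards. In the sketch below a `HashMap` stands in for the state store, and both the class name `DedupSketch` and the choice of `event_id` as the deduplication key are assumptions, not something taken from `UniqueRecordProcessor`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a HashMap plays the role of the RocksDB-backed state store.
class DedupSketch {
    // event id -> timestamp (ms) at which it was first seen
    private final Map<String, Long> seen = new HashMap<>();

    // Returns true if the event is new and should be forwarded, false if it is a duplicate.
    boolean firstSeen(String eventId, long timestampMs) {
        return seen.putIfAbsent(eventId, timestampMs) == null;
    }

    public static void main(String[] args) {
        DedupSketch dedup = new DedupSketch();
        System.out.println(dedup.firstSeen("unique-event-id-001", 1L)); // prints "true"
        System.out.println(dedup.firstSeen("unique-event-id-001", 2L)); // prints "false"
    }
}
```

A real Kafka Streams key-value store offers the same `putIfAbsent` style of lookup, with the added benefit that its contents survive restarts.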

Schema Files

📊 Sample Data

Account creation event:

{
  "event_id": "unique-event-id-001",
  "timestamp": "2024-07-25T12:00:00Z",
  "account_created": {
    "account": {
      "id": "acc-001",
      "name": "Account Name",
      "status": "active",
      "permitted_denominations": ["GBP", "USD"]
    }
  }
}
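
To make the shape of this event concrete, here is a minimal validation sketch over an already-parsed event, represented as nested maps. The hand-written checks stand in for the repository's schema files, and which fields are mandatory (`event_id`, `timestamp`, `account_created.account.id`) is an assumption; `AccountCreateValidator` is a hypothetical name.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical validator: hand-written checks, not the project's schema-based validation.
class AccountCreateValidator {
    // Returns a list of problems; an empty list means the event passed every check.
    static List<String> validate(Map<String, Object> event) {
        List<String> errors = new ArrayList<>();

        Object eventId = event.get("event_id");
        if (!(eventId instanceof String) || ((String) eventId).trim().isEmpty()) {
            errors.add("event_id must be a non-empty string");
        }

        try {
            Instant.parse(String.valueOf(event.get("timestamp")));
        } catch (DateTimeParseException e) {
            errors.add("timestamp must be ISO-8601, e.g. 2024-07-25T12:00:00Z");
        }

        // Walk account_created -> account -> id, tolerating missing levels.
        Object created = event.get("account_created");
        Object account = created instanceof Map ? ((Map<?, ?>) created).get("account") : null;
        Object accountId = account instanceof Map ? ((Map<?, ?>) account).get("id") : null;
        if (!(accountId instanceof String)) {
            errors.add("account_created.account.id is required");
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, Object> event = Map.of(
                "event_id", "unique-event-id-001",
                "timestamp", "2024-07-25T12:00:00Z",
                "account_created", Map.of("account", Map.of("id", "acc-001", "status", "active")));
        System.out.println(validate(event)); // prints "[]"
    }
}
```

Returning a list of problems rather than a boolean keeps the reason for a rejection available, for example to attach to a dead-letter record.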

πŸ” Monitoring

Application Logs

tail -f logs/application.log

Kafka Topics

# List topics
kafka-topics --bootstrap-server localhost:19092 --list

# Monitor consumer group
kafka-consumer-groups --bootstrap-server localhost:19092 --describe --group KafkaStreamProcessor

State Store Inspection

# Check RocksDB state stores
ls -la KafkaStreamProcessor/

πŸ—οΈ Architecture

Input Topics → Stream Processor → Validation → Deduplication → Output Topics
                      ↓
                 State Stores (RocksDB)
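
The order of the stages in this diagram can be modeled in memory as a rough sketch: validate first, then deduplicate, then forward. Lists stand in for the output and dead-letter topics and a set for the state store; the class `PipelineSketch`, the trivial validator in `main`, and the choice to drop duplicates silently are all assumptions rather than the repository's actual topology.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical in-memory model: lists stand in for topics, a set for the state store.
class PipelineSketch {
    final List<String> output = new ArrayList<>();
    final List<String> deadLetters = new ArrayList<>();
    private final Set<String> seenKeys = new HashSet<>();
    private final Predicate<String> validator;

    PipelineSketch(Predicate<String> validator) {
        this.validator = validator;
    }

    void process(String key, String value) {
        if (!validator.test(value)) {
            deadLetters.add(value); // invalid: route to the dead letter queue
            return;
        }
        if (!seenKeys.add(key)) {
            return;                 // duplicate key: drop (an assumption)
        }
        output.add(value);          // valid and new: forward downstream
    }

    public static void main(String[] args) {
        PipelineSketch p = new PipelineSketch(v -> v.startsWith("{"));
        p.process("e1", "{\"event_id\":\"e1\"}");
        p.process("e1", "{\"event_id\":\"e1\"}"); // duplicate
        p.process("e2", "not json");               // invalid
        // prints "1 forwarded, 1 dead-lettered"
        System.out.println(p.output.size() + " forwarded, " + p.deadLetters.size() + " dead-lettered");
    }
}
```

Validating before deduplicating means malformed records never occupy space in the state store, which matches the left-to-right order of the diagram.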

🧪 Testing

Manual Testing

See runbook.txt for detailed test scenarios and sample messages.

Integration Tests

mvn test

🛑 Cleanup

# Stop application
Ctrl+C

# Stop Kafka cluster
docker-compose down

# Stop Kafka cluster and also remove its volumes (deletes stored topic data)
docker-compose down -v

Real-time data processing with Apache Kafka Streams

About

This repository contains code that reads streaming Kafka data and performs various streaming operations.
