Fix NOT_LEADER_FOR_PARTITION error with Confluent Cloud by kamir · Pull Request #8 · scalytics/kshark-core

kamir · 2026-02-11T12:18:55Z

This commit addresses the "Not Leader For Partition" error that occurs when using kshark with Confluent Cloud clusters.

Root Cause:

- The Writer's Transport was not properly using the configured Dialer
- Transport created new connections without SASL/TLS from the Dialer
- Bootstrap servers were used directly instead of allowing metadata discovery of actual partition leader brokers

Changes:

1. Configure Transport to use Dialer's DialContext method
   - Ensures all connections use proper SASL/TLS authentication
   - Enables correct broker discovery for managed Kafka services
   - Fixes metadata refresh for Confluent Cloud's dynamic brokers
2. Add retry logic for metadata-related errors
   - Retry up to 3 times with linearly increasing backoff (500ms, 1s, 1.5s)
   - Specifically handle NOT_LEADER_FOR_PARTITION and LeaderNotAvailable
   - Non-retryable errors fail immediately without wasted retries
   - Detailed logging for debugging retry attempts

References:

- segmentio/kafka-go#1078
- segmentio/kafka-go#712

https://claude.ai/code/session_01VSnKepTAD53ZUfXYDDvXwa


kamir merged commit 478b61b into main Feb 11, 2026
1 check passed


kamir merged 1 commit into main from claude/fix-kshark-confluent-error-XOQbE
