Conversation
@AHeise AHeise commented Sep 30, 2025

KafkaEnumerator's state contains only the TopicPartitions but not the offsets, so it does not capture the full split state, contrary to the design intent.

There are a couple of issues with that approach. It implicitly assumes that splits are fully assigned to readers before the first checkpoint. Otherwise, the enumerator will invoke the offset initializer again on recovery from such a checkpoint, leading to inconsistencies (with LATEST, some partitions may be initialized during the first attempt and others during the second attempt).

Through the addSplitBack callback, these scenarios can also occur later in BATCH mode, which actually leads to duplicate rows (for EARLIEST or SPECIFIC-OFFSETS) or data loss (for LATEST). Finally, it is not possible to safely use KafkaSource as part of a HybridSource because the offset initializer cannot even be recreated on recovery.

All cases are solved by also retaining the offset in the enumerator state. To that end, this commit merges the async discovery phases to immediately initialize the splits from the partitions. Any subsequent checkpoint will then contain the proper start offset.
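The change described above can be sketched in plain Java. The classes below are simplified stand-ins for the real flink-connector-kafka types (`TopicPartition`, `KafkaPartitionSplit`), not the actual implementation; the key idea is that the enumerator resolves the start offset once, at discovery time, so checkpoints always contain fully initialized splits:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

// Simplified stand-ins for TopicPartition and KafkaPartitionSplit.
public class EnumeratorStateSketch {
    record TopicPartition(String topic, int partition) {}
    record PartitionSplit(TopicPartition tp, long startingOffset) {}

    // Before the fix: state held bare partitions and the offset initializer
    // was re-invoked on recovery, so LATEST could resolve differently per attempt.
    // After the fix: offsets are resolved once at discovery and checkpointed.
    static List<PartitionSplit> initializeSplits(
            List<TopicPartition> discovered,
            ToLongFunction<TopicPartition> offsetInitializer) {
        List<PartitionSplit> splits = new ArrayList<>();
        for (TopicPartition tp : discovered) {
            // Resolve the start offset immediately so any later checkpoint
            // contains the final value, independent of assignment timing.
            splits.add(new PartitionSplit(tp, offsetInitializer.applyAsLong(tp)));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<TopicPartition> discovered =
                List.of(new TopicPartition("orders", 0), new TopicPartition("orders", 1));
        // Hypothetical "latest" offsets captured once at discovery time.
        List<PartitionSplit> splits =
                initializeSplits(discovered, tp -> 100L + tp.partition());
        System.out.println(splits.get(0).startingOffset()); // prints 100
        System.out.println(splits.get(1).startingOffset()); // prints 101
    }
}
```

Because the offsets are part of the checkpointed splits, a recovered enumerator never needs to consult the offset initializer again for already-discovered partitions.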

topicPartitions.add(
        new KafkaPartitionSplit(
                new TopicPartition(TOPIC_PREFIX + readerId, partition),
                STARTING_OFFSET));


Thanks for the PR. This is a very good improvement for the connector.
I noticed that the current test creates splits using the constant KafkaPartitionSplit.EARLIEST_OFFSET. Would it make sense to add a test case that uses a real-world offset (e.g., 123)?

-public void testAddSplitsBack() throws Throwable {
+@ParameterizedTest
+@EnumSource(StandardOffsetsInitializer.class)
+public void testAddSplitsBack(StandardOffsetsInitializer offsetsInitializer) throws Throwable {
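The @EnumSource parameterization above runs the test once per enum constant. Outside of JUnit 5, the same idea can be sketched in plain Java; the enum name StandardOffsetsInitializer and its constants below are illustrative assumptions, not the real test code:

```java
import java.util.function.Consumer;

// Plain-Java sketch of enum-driven parameterization, mirroring what
// @ParameterizedTest + @EnumSource does inside JUnit 5.
public class EnumParamSketch {
    // Hypothetical initializer kinds; the real enum lives in the connector tests.
    enum StandardOffsetsInitializer { EARLIEST, LATEST, SPECIFIC_OFFSETS }

    static int runForAll(Consumer<StandardOffsetsInitializer> testBody) {
        int runs = 0;
        for (StandardOffsetsInitializer init : StandardOffsetsInitializer.values()) {
            testBody.accept(init); // each constant gets its own invocation
            runs++;
        }
        return runs;
    }

    public static void main(String[] args) {
        int runs = runForAll(init -> System.out.println("testAddSplitsBack(" + init + ")"));
        System.out.println(runs); // prints 3
    }
}
```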


Is my understanding correct that the test verifies that the offset is correctly recalculated on recovery, but does not verify that the original offset (before the failure) was preserved and restored?

@fapaul fapaul self-requested a review October 2, 2025 06:41

@fapaul fapaul left a comment


Looks mostly good, left some inline comments.

new SplitAndAssignmentStatus(
        new KafkaPartitionSplit(
                new TopicPartition(topic, partition),
                DEFAULT_STARTING_OFFSET),


Isn't this a behavioral change? Previously the unassigned split would get the starting offset configured by the user on reassignment.

 private boolean noMoreNewPartitionSplits = false;
 // this flag will be marked as true if initial partitions are discovered after enumerator starts
-private boolean initialDiscoveryFinished;
+private volatile boolean initialDiscoveryFinished;


Can you explain why volatile is needed here? Afaik we don't access it from a different thread than before.
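For context on the question above: volatile only buys anything if the flag is written and read from different threads. A minimal sketch of the visibility guarantee it provides (the class and method names below are illustrative, not the connector's code):

```java
// Minimal sketch: a volatile flag published by one thread and observed by another.
// Without volatile (or another happens-before edge), the reading thread is not
// guaranteed to ever see the write.
public class VolatileFlagSketch {
    private volatile boolean initialDiscoveryFinished;

    void finishDiscovery() {
        initialDiscoveryFinished = true; // write is published to all threads
    }

    boolean isFinished() {
        return initialDiscoveryFinished; // read sees the latest published value
    }

    public static void main(String[] args) throws InterruptedException {
        VolatileFlagSketch sketch = new VolatileFlagSketch();
        Thread worker = new Thread(sketch::finishDiscovery);
        worker.start();
        worker.join(); // join() itself also establishes happens-before
        System.out.println(sketch.isFinished()); // prints true
    }
}
```

If the flag really is only ever touched from the enumerator's single coordinator thread, the volatile modifier adds no safety and the reviewer's question stands.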

 * @param fetchedPartitions partitions fetched by the worker thread
 * @param t Exception in worker thread
 */
private void checkPartitionChanges(Set<TopicPartition> fetchedPartitions, Throwable t) {


Nit: it would have been good to have a small refactoring commit that simplifies the method calls; that makes the review easier :)

assertThat(expectedAssignmentsForReader)
        .contains(split.getTopicPartition());
assertThat(actualAssignments).containsOnlyKeys(expectedAssignments.keySet());
SoftAssertions.assertSoftly(


I am not a big fan of soft assertions, but this pattern with the lambda is "okay". It just sets a dangerous precedent when folks start using the variant that requires explicitly calling assertAll(): forgetting that call silently swallows all assertion failures.
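The hazard can be sketched without AssertJ: a soft-assertion collector only surfaces failures when the final reporting step runs, so forgetting it makes failed checks pass silently. The collector below is a minimal stand-in illustrating the pattern, not the AssertJ SoftAssertions API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal soft-assertion collector illustrating why the explicit
// assertAll() style is risky: failures are only surfaced at the end.
public class SoftAssertSketch {
    private final List<String> failures = new ArrayList<>();

    void check(boolean condition, String message) {
        if (!condition) {
            failures.add(message); // recorded, but NOT thrown yet
        }
    }

    void assertAll() {
        if (!failures.isEmpty()) {
            throw new AssertionError(failures.size() + " failure(s): " + failures);
        }
    }

    // The lambda style is safer: assertAll() cannot be forgotten because
    // the helper always invokes it after running the checks.
    static void assertSoftly(Consumer<SoftAssertSketch> checks) {
        SoftAssertSketch softly = new SoftAssertSketch();
        checks.accept(softly);
        softly.assertAll();
    }

    public static void main(String[] args) {
        // Explicit style: forgetting assertAll() lets a failing check pass silently.
        SoftAssertSketch softly = new SoftAssertSketch();
        softly.check(1 + 1 == 3, "math is broken");
        // (assertAll() forgotten -> no error raised)

        // Lambda style: the failure is always reported.
        try {
            assertSoftly(s -> s.check(1 + 1 == 3, "math is broken"));
        } catch (AssertionError e) {
            System.out.println("caught: " + e.getMessage()); // reports the recorded failure
        }
    }
}
```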
