Conversation

@jack2012aa
Contributor

@jack2012aa jack2012aa commented Nov 25, 2025

Description

This PR fixes the flakiness in testRackAwareRangeAssignor. Currently,
the test fails intermittently during the verification after
alterPartitionReassignments, and also when groupProtocol = consumer.

Using CLASSIC Group Protocol

The test relies on auto-commit, which is not deterministic during the
rebalance triggered by alterPartitionReassignments. Consequently, the
consumer may fail to commit offsets before the rebalance completes. When
the partition is re-assigned, the consumer fetches previously consumed
messages (duplicate consumption). Since verifyAssignments polls for a
fixed number of records, these duplicate messages cause the assertion to
fail (as it expects records from specific new partitions, not old ones).

A Consumer#commitSync is added after each verification to ensure the
next verification won't consume old data.
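
To illustrate the mechanism described above, here is a minimal in-memory sketch. It is a toy model and not the Kafka client API: `Partition`, `Consumer`, and `consumeAcrossRebalance` are hypothetical names introduced only for this example, and the "rebalance" is simulated by creating a fresh consumer that resumes from the committed offset.

```scala
// Toy in-memory model, NOT the Kafka client API: all names here are hypothetical.
object DuplicateConsumptionSketch {
  // One partition's log plus the group's last committed offset.
  final class Partition(val log: Vector[String]) {
    var committed: Int = 0
  }

  // A consumer resumes from the committed offset when it is assigned the partition.
  final class Consumer(p: Partition) {
    private var position: Int = p.committed

    // Return up to `max` records starting at the current position.
    def poll(max: Int): Vector[String] = {
      val batch = p.log.slice(position, position + max)
      position += batch.size
      batch
    }

    // Synchronously record the current position as the committed offset.
    def commitSync(): Unit = p.committed = position
  }

  // Consume two records, optionally commit, then simulate a reassignment
  // by creating a fresh consumer and polling again.
  def consumeAcrossRebalance(commit: Boolean): Vector[String] = {
    val partition = new Partition(Vector("r0", "r1", "r2", "r3"))
    val before = new Consumer(partition)
    before.poll(2)
    if (commit) before.commitSync()
    val after = new Consumer(partition) // position resets to the committed offset
    after.poll(2)
  }

  def main(args: Array[String]): Unit = {
    println(consumeAcrossRebalance(commit = false)) // old records are re-read
    println(consumeAcrossRebalance(commit = true))  // only new records are read
  }
}
```

In this model, skipping the commit makes the second poll return the records that were already consumed, which is the kind of duplicate that breaks a verification polling for a fixed number of records.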

Using CONSUMER Group Protocol

PARTITION_ASSIGNMENT_STRATEGY_CONFIG is invalid when using the
CONSUMER group protocol, so we can't use the RangeAssignor here.

Now the test is restricted to run only with groupProtocol = classic.
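
As a rough sketch of the combination the test now exercises, the consumer properties would look something like the following. The object and method names are hypothetical and this is not the test's actual setup code; the keys are written as plain strings (`group.protocol`, `partition.assignment.strategy`) so the snippet has no Kafka dependency.

```scala
import java.util.Properties

// Hypothetical helper, shown only to illustrate the configuration.
object ClassicRangeAssignorConfig {
  // Consumer properties for the classic protocol with the client-side RangeAssignor.
  def consumerProps(): Properties = {
    val props = new Properties()
    // The client-side assignor setting below only applies to the classic protocol.
    props.setProperty("group.protocol", "classic")
    props.setProperty("partition.assignment.strategy",
      "org.apache.kafka.clients.consumer.RangeAssignor")
    props
  }
}
```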

Result

Ran successfully 500 times locally:
Screenshot 2025-11-24 at 6 56 00 PM

@github-actions github-actions bot added the triage (PRs from the community), core (Kafka Broker), tests (Test fixes (including flaky tests)), and small (Small PRs) labels Nov 25, 2025
Contributor

@kirktrue kirktrue left a comment

Thanks for the PR @jack2012aa.

I'm a little concerned that switching to test only the CLASSIC group protocol could hide the bug mentioned in the PR description, namely that use of the CONSUMER group protocol could result in "duplicate messages" in this case.

Thanks.

@kirktrue
Contributor

cc @lianetm

@jack2012aa
Contributor Author

> Thanks for the PR @jack2012aa.
>
> I'm a little concerned that switching to test only the CLASSIC group protocol could hide the bug mentioned in the PR description, namely that use of the CONSUMER group protocol could result in "duplicate messages" in this case.
>
> Thanks.

Thanks for the review @kirktrue! Apologies for my phrasing. In the description I meant that using the CLASSIC group protocol could lead to duplicate messages. PARTITION_ASSIGNMENT_STRATEGY_CONFIG is invalid when using the CONSUMER group protocol, so I removed that protocol from the test; see KAFKA-17338.

I will rephrase the description to make it clearer.

@github-actions github-actions bot removed the triage (PRs from the community) label Nov 25, 2025
Member

@lianetm lianetm left a comment

Thanks for looking into this one! It's an old flaky one. There is another Jira, btw: https://issues.apache.org/jira/browse/KAFKA-15020 (hopefully the same issue).

@Disabled
@ParameterizedTest(name = TestInfoUtils.TestWithParameterizedGroupProtocolNames)
@MethodSource(Array("getTestGroupProtocolParametersAll"))
@ValueSource(strings = Array("classic"))
Member

makes sense to me, this test is for the client-side RangeAssignor so it does not apply to the consumer protocol

val records = future.get(30, TimeUnit.SECONDS)
assertEquals(assignments(i), records.map(r => new TopicPartition(r.topic, r.partition)).toSet)
}
consumers.foreach{ _.commitSync() }
Member

nit: should we disable auto-commits now that we're using manual?
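
For reference, disabling auto-commit alongside the manual commitSync calls could look roughly like this. It is a sketch only: the object and method names are hypothetical and the key is given as the plain string `enable.auto.commit` rather than through the test harness's own configuration helpers.

```scala
import java.util.Properties

// Hypothetical helper, shown only to illustrate the suggestion.
object ManualCommitConfig {
  // Consumer properties with auto-commit turned off, so that offsets
  // advance only through explicit commitSync calls.
  def manualCommitProps(): Properties = {
    val props = new Properties()
    props.setProperty("enable.auto.commit", "false")
    props
  }
}
```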

Labels

ci-approved core Kafka Broker small Small PRs tests Test fixes (including flaky tests)
