-
Notifications
You must be signed in to change notification settings - Fork 14.8k
KAFKA-15700: FetchFromFollowerIntegrationTest is flaky #20982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
kirktrue
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @jack2012aa.
I'm a little concerned that switching to test only the CLASSIC group protocol could hide the bug mentioned in the PR description, namely that use of the CONSUMER group protocol could result in "duplicate messages" in this case.
Thanks.
|
cc @lianetm |
Thanks for the review @kirktrue! Apologize for my phrasing. In the description I mean that using the I will rephrase the description to make it clearer. |
lianetm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this one! old flaky one. There is another jira btw https://issues.apache.org/jira/browse/KAFKA-15020 (probably same issue hopefully)
| @Disabled | ||
| @ParameterizedTest(name = TestInfoUtils.TestWithParameterizedGroupProtocolNames) | ||
| @MethodSource(Array("getTestGroupProtocolParametersAll")) | ||
| @ValueSource(strings = Array("classic")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense to me, this test is for the client-side RangeAssignor so it does not apply to the consumer protocol
| val records = future.get(30, TimeUnit.SECONDS) | ||
| assertEquals(assignments(i), records.map(r => new TopicPartition(r.topic, r.partition)).toSet) | ||
| } | ||
| consumers.foreach{ _.commitSync() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should we disable auto-commits now that we're using manual?
Description
This PR fixes the flakiness in
testRackAwareRangeAssignor. Currently,it fails during the verification after
alterPartitionReassignments,and also during
groupProtocol = consumer.Using CLASSIC Group Protocol
The test relies on auto-commit, which is not deterministic during the
rebalance triggered by
alterPartitionReassignments. Consequently, theconsumer may fail to commit offsets before the rebalance completes. When
the partition is re-assigned, the consumer fetches previously consumed
messages (duplicate consumption). Since
verifyAssignmentspolls for afixed number of records, these duplicate messages cause the assertion to
fail (as it expects records from specific new partitions, not old ones).
A
Consumer#commitSyncis added after each verification to ensure thenext verification won't consume old data.
Using CONSUMER Group Protocol
PARTITION_ASSIGNMENT_STRATEGY_CONFIGis invalid when using theCONSUMER group protocol, so we can't use the
RangeAssignorhere.Now the test is restricted to run only with
groupProtocol = classic.Result
Successfully run 500 times in local:
