make committed offset accurate when partition assigned to avoid offset reset #893
Conversation
|
Hey @sangreal - i will dig a bit more into it - but it doesn't look right to me - whenever partition is dirty (any offset was processed) - we do need to commit - even if highest succeeded was not advanced. |
|
If you could build a test / reproducible example of this - it would help as well... |
|
@rkolesnev It is really great that you could take your personal time to review my PR 👍 Again, I am all ears for your suggestions. |
|
Hmm, I still don't fully understand where the issue is, though. |
|
I have updated the fix after more thinking. The idea is to make sure getOffsetToCommit is invoked only once, to avoid a dirty read and committing the wrong offset. |
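To illustrate the "invoke only once" idea, here is a minimal sketch. It is not the Parallel Consumer's actual code: CommitSnapshotSketch, CommitRequest and both build methods are hypothetical names, and the supplier stands in for getOffsetToCommit. The assumption is that partition state can change between two reads, so anything derived from the offset should come from a single snapshot.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch; none of these types are the library's actual classes.
final class CommitSnapshotSketch {

    /** What would be handed to the broker: an offset plus a value derived from it. */
    static final class CommitRequest {
        final long offset;
        final long metadataBaseOffset;

        CommitRequest(long offset, long metadataBaseOffset) {
            this.offset = offset;
            this.metadataBaseOffset = metadataBaseOffset;
        }
    }

    /** Two reads: the state may advance in between, so the two fields can disagree. */
    static CommitRequest buildWithTwoReads(LongSupplier getOffsetToCommit) {
        long offset = getOffsetToCommit.getAsLong();
        long base = getOffsetToCommit.getAsLong(); // second read may see newer state
        return new CommitRequest(offset, base);
    }

    /** One read: every derived value comes from the same snapshot. */
    static CommitRequest buildWithOneRead(LongSupplier getOffsetToCommit) {
        long snapshot = getOffsetToCommit.getAsLong();
        return new CommitRequest(snapshot, snapshot);
    }
}
```

With a supplier whose value moves between calls, the two-read variant yields a request whose fields disagree, while the one-read variant stays consistent.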
|
@johnbyrnejb please help review as well, thx a lot |
|
Parallel Consumer Offset reset Issue flow.pdf |
|
Let me explain more for you guys to review.
@rkolesnev @johnbyrnejb please help review when you have time, we are waiting for the fix since this offset reset issue happens once every several days. Thanks a lot. |
|
@sangreal - i had spent more time looking into this - and will try to get it done this week. The Parallel Consumer has a bug somewhere in marking state dirty and advancing offset to commit by 1 - so after multiple rebalances it ends up committing not offset 10 - but offset 11 - which brings subscription out of valid range and causes auto offset reset to happen... I am in the process of mapping all possible state transitions for PartitionState to work out if there are any other race conditions / state mismatches. |
|
@rkolesnev thank you so much for your efforts! |
Description...
make offsetHighestSucceeded accurate when partition assigned
We have encountered an offset reset issue during frequent partition rebalancing.
The root cause is the following sequence:
(1) offsetHighestSucceeded is assigned the offset in OffsetAndMetadata, which is the offset still to be processed
(2) incompletes is non-empty
(3) one WorkContainer is processed successfully, so dirty becomes true (this offset < offsetHighestSucceeded)
(4) the committer chooses (offsetHighestSucceeded + 1) to commit because incompletes is non-empty (the offset is removed from incompletes)
(5) rebalancing happens, and the new consumer's attempt to pull records throws out of range and begins an offset reset
issue #894
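The sequence above can be sketched with a toy model. This is only an illustration under stated assumptions, not the library's PartitionState: the class PartitionStateSketch and its methods addIncomplete and onSuccess are hypothetical, and the commit rule (commit offsetHighestSucceeded + 1 once no incompletes remain, otherwise the lowest incomplete) is an assumed simplification. The concrete offsets 9, 10 and 11 are borrowed from the discussion purely as an example.

```java
import java.util.TreeSet;

// Toy model of per-partition state; hypothetical, not the library's actual class.
final class PartitionStateSketch {
    long offsetHighestSucceeded;
    final TreeSet<Long> incompletes = new TreeSet<>();
    boolean dirty = false;

    /** Seed the state on partition assignment with an initial "highest succeeded" value. */
    PartitionStateSketch(long initialHighestSucceeded) {
        this.offsetHighestSucceeded = initialHighestSucceeded;
    }

    void addIncomplete(long offset) {
        incompletes.add(offset);
    }

    /** A WorkContainer for this offset finished successfully. */
    void onSuccess(long offset) {
        incompletes.remove(offset);
        offsetHighestSucceeded = Math.max(offsetHighestSucceeded, offset);
        dirty = true; // something was processed, so a commit is due
    }

    /** Assumed rule: lowest incomplete if any remain, else one past the highest success. */
    long getOffsetToCommit() {
        return incompletes.isEmpty() ? offsetHighestSucceeded + 1 : incompletes.first();
    }
}
```

Assume the committed offset is 10, meaning offset 10 is the next record to process. If the state is seeded with 10 as offsetHighestSucceeded and a WorkContainer for offset 9 then succeeds, this model commits 11, one past the valid position. Seeding with 9 (committed offset minus one) makes the same scenario commit 10.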
Checklist