
[fix][broker] Fix seeking by timestamp resetting the cursor position to earliest #23919

Open: wants to merge 3 commits into base: master

dao-jun (Member) commented Feb 2, 2025

Fixes #23910

Motivation

Since AppendBrokerTimestampMetadataInterceptor is not enabled by default, entry timestamps are not strictly increasing.

Because message timestamps are generated by the clients, messages from different producers may not be in global timestamp order (due to network delay, backpressure, thread scheduling, etc.).

Within a single ledger, the timestamps may be arranged like this:
[2, 1, 3, 5, 4, 6, 7, 9, 8, ...]
Overall they trend upward, but locally they may not be increasing.
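To make the ordering problem concrete, here is a minimal Java sketch that evaluates the seek condition `timestamp <= target` over the example sequence above. The class `TimestampPredicateDemo` and its methods are hypothetical, not Pulsar code. It assumes the search bisects over entry positions, which is only safe if the condition is monotonic: once it turns false, it must stay false.

```java
import java.util.Arrays;

/** Hypothetical helper (not Pulsar code): checks whether "timestamp <= target" is monotonic. */
final class TimestampPredicateDemo {

    /** Evaluates the seek condition (timestamp <= target) for every entry, in storage order. */
    static boolean[] evaluate(long[] timestamps, long target) {
        boolean[] matches = new boolean[timestamps.length];
        for (int i = 0; i < timestamps.length; i++) {
            matches[i] = timestamps[i] <= target;
        }
        return matches;
    }

    /**
     * A bisection-style search is only correct if the condition looks like
     * "true, true, ..., false, false": once it becomes false it must stay false.
     */
    static boolean isMonotonic(boolean[] matches) {
        boolean seenFalse = false;
        for (boolean m : matches) {
            if (!m) {
                seenFalse = true;
            } else if (seenFalse) {
                return false; // a match after a non-match breaks the bisection invariant
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] ledger = {2, 1, 3, 5, 4, 6, 7, 9, 8}; // locally out-of-order publish timestamps
        boolean[] matches = evaluate(ledger, 4);
        System.out.println(Arrays.toString(matches));
        System.out.println("monotonic: " + isMonotonic(matches));
    }
}
```

For target 4 the condition flips back to true at the fifth entry (timestamp 4 after 5), so a search that assumes monotonicity can miss matching entries; for target 6 the same sequence happens to be monotonic.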

  1. If PersistentMessageFinder.findMessages completes with a null Position, the subscription resets the cursor to the earliest position; see: https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentSubscription.java#L804-L824
  2. If the first entry read from BookKeeper does not meet the condition, the search returns null; see: https://github.com/apache/pulsar/blob/master/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/OpFindNewest.java#L87-L91
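The two steps above can be modelled in a few lines. This is a hypothetical, heavily simplified sketch: `Position`, `Entry`, `findNewest`, `resetTarget` and the `EARLIEST` sentinel are illustrative stand-ins, not the real `PersistentMessageFinder` or `OpFindNewest` code, and the real search is not a linear scan.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical, heavily simplified model of the null-position path (not Pulsar code). */
final class NullPositionDemo {

    /** Stand-in for a managed-ledger position; not the real Pulsar class. */
    record Position(long ledgerId, long entryId) {}

    /** Stand-in for a stored entry with its client-generated publish timestamp. */
    record Entry(Position position, long timestamp) {}

    /** Hypothetical sentinel for "reset to the earliest position". */
    static final Position EARLIEST = new Position(-1, -1);

    /**
     * Step 2 (simplified): if the first entry examined fails the condition, give up and
     * return null. Otherwise return the newest matching entry, found here by a linear scan.
     */
    static Position findNewest(List<Entry> entries, Predicate<Entry> condition) {
        if (entries.isEmpty() || !condition.test(entries.get(0))) {
            return null;
        }
        Position newest = null;
        for (Entry e : entries) {
            if (condition.test(e)) {
                newest = e.position();
            }
        }
        return newest;
    }

    /** Step 1 (simplified): a null result is treated as "reset the cursor to earliest". */
    static Position resetTarget(Position found) {
        return found == null ? EARLIEST : found;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
                new Entry(new Position(0, 0), 5),
                new Entry(new Position(0, 1), 1),
                new Entry(new Position(0, 2), 2));
        Position found = findNewest(entries, e -> e.timestamp() <= 3);
        System.out.println(resetTarget(found)); // Position[ledgerId=-1, entryId=-1]
    }
}
```

With timestamps [5, 1, 2] and target 3, the first entry fails the check, so `findNewest` returns null and `resetTarget` maps it to `EARLIEST`, even though the entries with timestamps 1 and 2 match.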

In 823a55d#diff-1d1f02c0ae1aed67e77512aebe4d7233705490b14960ee428611462e446861e5R133-R142, we optimized for the case where the target entry may be in the last (currently open) ledger. The idea is intuitive, but the actual situation is more complicated:

Suppose we want to find the position of the message whose timestamp is 101, the second-to-last ledger's close timestamp is 100, and the entry timestamps in the last (open) ledger are arranged as [102, 103, 101, 104, ...].
The first entry's timestamp (102) is greater than 101, so it does not meet the condition at https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentMessageFinder.java#L74-L75. The search therefore returns null, and the cursor position is finally reset to earliest.
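The scenario can be reproduced with a small model. In this sketch, `LastLedgerSeekDemo`, `Ledger`, `findNewest` and `optimizedSearch` are hypothetical; the closed ledger's entries ({98, 99, 100}) are made up for illustration, and the sketch assumes, as the optimization does, that they are no newer than the ledger's close timestamp. Whether the real check compares the close timestamp with `<` or `<=` is also an assumption.

```java
import java.util.List;

/** Hypothetical model of the last-ledger optimization (not Pulsar code). */
final class LastLedgerSeekDemo {

    /** Simplified ledger: id, entry timestamps in storage order, close timestamp (null if open). */
    record Ledger(long id, long[] timestamps, Long closeTimestamp) {}

    /**
     * Returns "ledgerId:entryId" of the newest entry with timestamp <= target,
     * or null if the very first entry examined fails the condition.
     */
    static String findNewest(List<Ledger> ledgers, long target) {
        String newest = null;
        boolean first = true;
        for (Ledger ledger : ledgers) {
            long[] ts = ledger.timestamps();
            for (int i = 0; i < ts.length; i++) {
                boolean matches = ts[i] <= target;
                if (first && !matches) {
                    return null; // mirrors "first entry does not meet the condition -> null"
                }
                first = false;
                if (matches) {
                    newest = ledger.id() + ":" + i;
                }
            }
        }
        return newest;
    }

    /**
     * Assumed shape of the optimization: if the second-to-last ledger closed before the
     * target, search only the last ledger. Requires at least two ledgers.
     */
    static String optimizedSearch(List<Ledger> ledgers, long target) {
        Ledger secondToLast = ledgers.get(ledgers.size() - 2);
        if (secondToLast.closeTimestamp() != null && secondToLast.closeTimestamp() < target) {
            return findNewest(List.of(ledgers.get(ledgers.size() - 1)), target);
        }
        return findNewest(ledgers, target);
    }

    public static void main(String[] args) {
        List<Ledger> ledgers = List.of(
                new Ledger(1, new long[] {98, 99, 100}, 100L),          // closed at 100
                new Ledger(2, new long[] {102, 103, 101, 104}, null)); // last ledger, still open
        System.out.println("optimized: " + optimizedSearch(ledgers, 101)); // null -> reset to earliest
        System.out.println("full scan: " + findNewest(ledgers, 101));      // 2:2
    }
}
```

Here `optimizedSearch` returns null for target 101, which the subscription would turn into a reset to earliest, while scanning both ledgers finds entry 2:2, the message with timestamp 101.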

Modifications

Verifying this change

  • Make sure that the change passes the CI checks.



Documentation

  • doc-not-needed


@dao-jun added the type/bug and area/broker labels Feb 2, 2025
@dao-jun self-assigned this Feb 2, 2025
@dao-jun requested a review from @lhotari February 2, 2025 15:19
github-actions bot added the doc-not-needed label Feb 2, 2025
lhotari (Member) left a comment

Thanks for investigating the issue and providing a fix, @dao-jun. I wonder if there would be a way to add a failing test case which would then prevent future regressions?

lhotari (Member) commented Feb 2, 2025

/pulsarbot rerun-failure-checks

lhotari (Member) commented Feb 2, 2025

Checkstyle errors:

[INFO] There are 2 errors reported by Checkstyle 10.14.2 with /home/runner/work/pulsar/pulsar/buildtools/src/main/resources/pulsar/checkstyle.xml ruleset.
Error:  src/main/java/org/apache/pulsar/broker/service/persistent/PersistentMessageFinder.java:[25,1] (imports) ImportOrder: Extra separation in import group before 'com.google.common.annotations.VisibleForTesting'
Error:  src/main/java/org/apache/pulsar/broker/service/persistent/PersistentMessageFinder.java:[25,1] (imports) ImportOrder: Import com.google.common.annotations.VisibleForTesting appears after other imports that it should precede

codecov-commenter commented

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.22%. Comparing base (bbc6224) to head (10ae74e).
Report is 878 commits behind head on master.

Additional details and impacted files


@@             Coverage Diff              @@
##             master   #23919      +/-   ##
============================================
+ Coverage     73.57%   74.22%   +0.65%     
+ Complexity    32624    32240     -384     
============================================
  Files          1877     1853      -24     
  Lines        139502   143725    +4223     
  Branches      15299    16334    +1035     
============================================
+ Hits         102638   106686    +4048     
+ Misses        28908    28639     -269     
- Partials       7956     8400     +444     
Flag        Coverage Δ
inttests    26.77% <ø> (+2.18%) ⬆️
systests    23.22% <ø> (-1.10%) ⬇️
unittests   73.74% <ø> (+0.89%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines                                  Coverage Δ
...er/service/persistent/PersistentMessageFinder.java    75.71% <ø> (+9.80%) ⬆️

... and 1037 files with indirect coverage changes

lhotari (Member) commented Feb 4, 2025

@dao-jun Do you have thoughts about adding a failing test case that would prevent future regressions? It seems that we don't currently have proper tests that would test the seeking behavior at the Pulsar client level.

dao-jun (Member, Author) commented Feb 4, 2025

> @dao-jun Do you have thoughts about adding a failing test case that would prevent future regressions? It seems that we don't currently have proper tests that would test the seeking behavior at the Pulsar client level.

I modified some tests in PersistentMessageFinderTest; they should cover this case. For client-level seeking tests, I may need to add some.

Labels: area/broker, doc-not-needed, type/bug

Successfully merging this pull request may close these issues:

[Bug] Seek on a timestamp causes the subscription to be resetted to the earliest position.
3 participants