[Bug] Consumer might not be able to consume messages after failover to a new cluster #25291

@BewareMyPower

Search before reporting

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

User environment

master branch

Issue Description

With AutoClusterFailover, a consumer is able to switch from cluster A to cluster B on the same topic.

pulsarClient.updateServiceUrl(target);
pulsarClient.reloadLookUp();

The client will update the service URL and recreate the lookup service. However, the consumer's internal state is not cleared, including:

  • The ACK grouping tracker, which contains some acknowledged message IDs
  • lastDequedMessageId, which represents the message ID of the last message received

These message IDs all come from Cluster A.

The next time the consumer establishes a connection to Cluster B, it will:

  • Filter out duplicate messages via AcknowledgmentsGroupingTracker#isDuplicate. If the previous acknowledgment was cumulative, it compares the ID of each newly received message from Cluster B with the previously acknowledged message ID from Cluster A.
  • Set startMessageId to the last message ID from Cluster A. This message ID is then carried in the Subscribe request to the topic on Cluster B.
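
To illustrate the first point, here is a minimal, self-contained sketch of how a stale cumulative acknowledgment could cause messages from Cluster B to be dropped. It does not use the real Pulsar client classes: SketchMessageId, SketchAckTracker, and the reset() method are hypothetical names introduced only for this example. It assumes that message IDs are ordered by (ledgerId, entryId) and that the two clusters assign ledger IDs independently.

```java
// Hypothetical stand-in for a message ID: ordered by ledgerId, then entryId.
final class SketchMessageId implements Comparable<SketchMessageId> {
    final long ledgerId;
    final long entryId;

    SketchMessageId(long ledgerId, long entryId) {
        this.ledgerId = ledgerId;
        this.entryId = entryId;
    }

    @Override
    public int compareTo(SketchMessageId other) {
        int c = Long.compare(ledgerId, other.ledgerId);
        return c != 0 ? c : Long.compare(entryId, other.entryId);
    }
}

// Hypothetical tracker that models only the cumulative-ack duplicate check.
final class SketchAckTracker {
    private SketchMessageId lastCumulativeAck; // null: nothing acknowledged yet

    void acknowledgeCumulative(SketchMessageId id) {
        if (lastCumulativeAck == null || id.compareTo(lastCumulativeAck) > 0) {
            lastCumulativeAck = id;
        }
    }

    // A message counts as a duplicate if it is not newer than the last cumulative ack.
    boolean isDuplicate(SketchMessageId id) {
        return lastCumulativeAck != null && id.compareTo(lastCumulativeAck) <= 0;
    }

    // Assumption for illustration only: clear the state when the cluster changes.
    void reset() {
        lastCumulativeAck = null;
    }
}

final class FailoverDuplicateSketch {
    public static void main(String[] args) {
        SketchAckTracker tracker = new SketchAckTracker();

        // On Cluster A the consumer cumulatively acknowledged up to (1000, 50).
        tracker.acknowledgeCumulative(new SketchMessageId(1000, 50));

        // After failover, Cluster B delivers a message whose ledger ID happens to be smaller.
        SketchMessageId firstFromB = new SketchMessageId(7, 0);

        // The stale ack from Cluster A makes the new message look like a duplicate.
        System.out.println(tracker.isDuplicate(firstFromB)); // prints "true"

        // With the tracker state cleared, the same message is accepted.
        tracker.reset();
        System.out.println(tracker.isDuplicate(firstFromB)); // prints "false"
    }
}
```

In this model, every message from Cluster B is filtered out until its ID exceeds the stale ID from Cluster A, which may never happen if Cluster B's ledger IDs stay smaller.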

Error messages

n/a

Reproducing the issue

I haven't tried to reproduce it with the Java client yet, but it can be reproduced with the C++ client, for example: apache/pulsar-client-cpp@c6de067#diff-36936d31d0cbc6547ff0eea6b2bc79bbffdc4de6d63223dd0b4e527e369059d1

Additional information

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Labels

type/bug
