[improve][client] Add independent multiTopicsSinglePartitionReceiverQueueSize config for single partition consumer in multi-topics consumer to reduce memory consumption #24867
…eueSize config for single consumer in MultiTopicsConsumerImpl to reduce memory consumption
If this PR is OK, I'll add unit tests to cover the code change.
There is no independent receiverQueueSize config for the single partition consumers in MultiTopicsConsumerImpl and PatternMultiTopicsConsumerImpl. Although the maxTotalReceiverQueueSizeAcrossPartitions config can limit the in-memory messages of a single topic with multiple partitions, it can't limit the in-memory messages across multiple topics.
For example, suppose we subscribe to a regex pattern that matches 1000 non-partitioned topics. Before this PR, each non-partitioned topic consumer's receiverQueueSize is 1000 (ConsumerImpl uses the same receiverQueueSize value as PatternMultiTopicsConsumerImpl), so the maximum number of messages in memory is 1000 + 1000 * 1000 = 1,001,000. Ignoring the insignificant 1000, if each message is 8 KB, then we need 1,000,000 * 8 KB = 7,812.5 MB of memory to boot our application in a catch-up read situation, which is unnecessary.
This is a valid problem. However, since this requires public API changes, it's better to start a discussion on the dev mailing list to ask for feedback from other Pulsar community members. Changing the public API will eventually require a PIP. Before creating a PIP, it's better to ask for feedback on ways to address this problem and on what approach others would recommend. Adding more options to the API isn't great from a usability perspective, since each option adds overhead in understanding the usage. It would be better if the multi-topics consumer could automatically tune itself to handle the situation. That might have been the purpose of PIP-74 receiver queue autoscaling (PR 14494), so it's worth checking that out too.
Are you already using the PIP-74 memory limiter (enabled by default) and receiver queue autoscaling?
I have not tried autoscaling yet. I noticed that the memory limiter can autoscale receiverQueueSize once the corresponding option is turned on.
Every single partition consumer starts with a currentReceiverQueueSize flow action, meaning that the consumer will receive currentReceiverQueueSize messages.
pulsar/pulsar-client/src/main/java/org/apache/pulsar/client/impl/MultiTopicsConsumerImpl.java Lines 236 to 249 in e656065
pulsar/pulsar-client/src/main/java/org/apache/pulsar/client/impl/MultiTopicsConsumerImpl.java Lines 1216 to 1231 in 678db6b
If pulsar/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerBase.java Lines 220 to 226 in 678db6b
pulsar/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerImpl.java Lines 502 to 509 in 678db6b
Every single partition consumer's pulsar/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerBase.java Lines 250 to 259 in 678db6b
This PR can tune a single partition consumer's receiverQueueSize and batch receive maxNumMessages using the multiTopicsSinglePartitionReceiverQueueSize config, which may help to solve this problem. The above is pure code analysis; I have not tested it myself yet. Moreover, I think the multi-topics consumer and its inner single partition consumers are like a parent container and its child containers, and their receiverQueueSize values should be tuned independently in some way.
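The parent/child relationship described in this comment could be sketched as follows. This is only an illustration of the intended semantics, not the code in this PR: the class ChildQueueSizeSketch and the method effectiveChildQueueSize are hypothetical names, and the rule "inherit the parent's value unless the switch is on" is an assumption based on the PR description.

```java
// Hypothetical sketch: how a child (single partition) consumer's queue size
// could be chosen. None of these names come from the Pulsar code base.
final class ChildQueueSizeSketch {

    /**
     * Returns the receiver queue size a child consumer would use.
     *
     * @param independentSizeEnabled stands in for multiTopicsSinglePartitionReceiverQueueSizeEnable
     * @param parentQueueSize        the multi-topics consumer's receiverQueueSize
     * @param childQueueSize         stands in for multiTopicsSinglePartitionReceiverQueueSize
     */
    static int effectiveChildQueueSize(boolean independentSizeEnabled,
                                       int parentQueueSize,
                                       int childQueueSize) {
        // Assumed backward-compatible behaviour: with the switch off,
        // the child keeps using the parent's receiverQueueSize.
        if (!independentSizeEnabled) {
            return parentQueueSize;
        }
        return childQueueSize;
    }

    public static void main(String[] args) {
        System.out.println(effectiveChildQueueSize(false, 1000, 10)); // 1000
        System.out.println(effectiveChildQueueSize(true, 1000, 10));  // 10
    }
}
```

With the switch off the behaviour matches what the PR describes as the state before the change, which is why a default of false would be backward compatible.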
Motivation
There is no independent receiverQueueSize config for the single partition consumers in MultiTopicsConsumerImpl and PatternMultiTopicsConsumerImpl. Although the maxTotalReceiverQueueSizeAcrossPartitions config can limit the in-memory messages of a single topic with multiple partitions, it can't limit the in-memory messages across multiple topics.
For example, suppose we subscribe to a regex pattern that matches 1000 non-partitioned topics. Before this PR, each non-partitioned topic consumer's receiverQueueSize is 1000 (ConsumerImpl uses the same receiverQueueSize value as PatternMultiTopicsConsumerImpl), so the maximum number of messages in memory is 1000 + 1000 * 1000 = 1,001,000. Ignoring the insignificant 1000, if each message is 8 KB, then we need 1,000,000 * 8 KB = 7,812.5 MB of memory to boot our application in a catch-up read situation, which is unnecessary.
pulsar/pulsar-client/src/main/java/org/apache/pulsar/client/impl/MultiTopicsConsumerImpl.java
Lines 236 to 249 in e656065
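The arithmetic above can be reproduced with a small, dependency-free sketch. The class and method names are hypothetical and only mirror the numbers in this description; the per-topic queue size of 10 in the last line is an arbitrary illustrative value, not a recommendation from this PR.

```java
// Back-of-the-envelope estimate of client-side buffering for a multi-topics consumer.
// Hypothetical helper; it does not use or model the Pulsar client API.
final class ReceiverQueueMemoryEstimate {

    /** Worst case: the parent queue is full and every child consumer's queue is full. */
    static long maxBufferedMessages(int topics, int parentQueueSize, int childQueueSize) {
        return (long) parentQueueSize + (long) topics * childQueueSize;
    }

    /** Converts a message count to MB, taking 1 MB = 1024 KB. */
    static double toMegabytes(long messages, int messageSizeKb) {
        return messages * (double) messageSizeKb / 1024.0;
    }

    public static void main(String[] args) {
        // 1000 topics, parent and child queues both 1000 (the case described above).
        System.out.println(maxBufferedMessages(1000, 1000, 1000)); // 1001000

        // Ignoring the parent queue: 1,000,000 messages of 8 KB each.
        System.out.println(toMegabytes(1_000_000L, 8)); // 7812.5

        // Same subscription with an illustrative per-topic queue size of 10.
        long smaller = maxBufferedMessages(1000, 1000, 10);
        System.out.println(smaller);                 // 11000
        System.out.println(toMegabytes(smaller, 8)); // 85.9375
    }
}
```

The estimate is an upper bound on queued messages only; it ignores per-message object overhead and any limit imposed by the client memory limiter.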
For backward compatibility, add a multiTopicsSinglePartitionReceiverQueueSizeEnable switch, whose default value is false.
Modifications
Add an independent multiTopicsSinglePartitionReceiverQueueSize config for the single consumer in MultiTopicsConsumerImpl to reduce memory consumption.
Verifying this change
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
- doc
- doc-required
- doc-not-needed
- doc-complete
Matching PR in forked repository
PR in forked repository: oneby-wang#7