Conversation

@oneby-wang
Contributor

Motivation

There is no independent receiverQueueSize config for the single-partition consumers inside MultiTopicsConsumerImpl and PatternMultiTopicsConsumerImpl. Although the maxTotalReceiverQueueSizeAcrossPartitions config can limit the in-memory messages of a single topic with multiple partitions, it can't limit the in-memory messages across multiple topics.

For example, suppose we subscribe to a regex pattern that matches 1000 non-partitioned topics. Before this PR, each non-partitioned topic consumer's receiverQueueSize is 1000 (each ConsumerImpl uses the same receiverQueueSize value as the PatternMultiTopicsConsumerImpl), so the maximum number of messages in memory is 1000 + 1000 * 1000 = 1,001,000. Ignoring the insignificant 1000 for the parent queue, if each message is 8 KB, we need 1,000,000 * 8 KB = 7,812.5 MB of memory just to boot the application in a catch-up read situation, which is unnecessary. Each newly added consumer is granted its full receiver queue size as permits up front in startReceivingMessages:

private void startReceivingMessages(List<ConsumerImpl<T>> newConsumers) {
    if (log.isDebugEnabled()) {
        log.debug("[{}] startReceivingMessages for {} new consumers in topics consumer, state: {}",
                topic, newConsumers.size(), getState());
    }
    if (getState() == State.Ready) {
        newConsumers.forEach(consumer -> {
            consumer.increaseAvailablePermits(consumer.getConnectionHandler().cnx(),
                    consumer.getCurrentReceiverQueueSize());
            internalPinnedExecutor.execute(() -> receiveMessageFromConsumer(consumer, true));
        });
    }
}
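
As a back-of-the-envelope check of the estimate above, a minimal sketch using the example numbers (the 8 KB average message size is the assumption from the example, not a measured value):

public class ReceiverQueueMemoryEstimate {
    public static void main(String[] args) {
        int topics = 1000;                 // non-partitioned topics matched by the pattern
        int receiverQueueSize = 1000;      // used by the parent and by every ConsumerImpl
        long messageSizeBytes = 8 * 1024;  // assumed average message size (8 KB)

        // Parent queue plus one full queue per topic consumer.
        long maxMessages = receiverQueueSize + (long) topics * receiverQueueSize;
        double maxMemoryMb = maxMessages * messageSizeBytes / (1024.0 * 1024.0);

        System.out.printf("max in-memory messages: %d (~%.1f MB)%n", maxMessages, maxMemoryMb);
        // Prints: max in-memory messages: 1001000 (~7820.3 MB)
    }
}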

For backward compatibility, a multiTopicsSinglePartitionReceiverQueueSizeEnable switch is added, with the default value set to false.

Modifications

Add an independent multiTopicsSinglePartitionReceiverQueueSize config for the single-partition consumers in MultiTopicsConsumerImpl to reduce memory consumption.
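
For illustration only, a minimal sketch of how the new settings might be applied from client code via the generic ConsumerBuilder.loadConf mechanism; whether these keys are actually picked up by loadConf depends on how the PR wires them into ConsumerConfigurationData, so treat the wiring (and the value 10) as assumptions:

import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;

public class PatternConsumerSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")
                .build();

        // Hypothetical wiring of the new settings through loadConf.
        Map<String, Object> extraConf = new HashMap<>();
        extraConf.put("multiTopicsSinglePartitionReceiverQueueSizeEnable", true); // default is false
        extraConf.put("multiTopicsSinglePartitionReceiverQueueSize", 10);         // hypothetical value

        Consumer<byte[]> consumer = client.newConsumer()
                .topicsPattern(Pattern.compile("persistent://public/default/topic-.*"))
                .subscriptionName("my-sub")
                .receiverQueueSize(1000) // queue size of the multi-topics (parent) consumer
                .loadConf(extraConf)
                .subscribe();

        // ... consume messages ...

        consumer.close();
        client.close();
    }
}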

Verifying this change

  • Make sure that the change passes the CI checks.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: oneby-wang#7

@github-actions bot added the doc-not-needed (Your PR changes do not impact docs) label on Oct 17, 2025
@oneby-wang
Contributor Author

If this PR is OK, I'll add unit tests to cover the code change.

@lhotari
Member

> There is no independent receiverQueueSize config for the single-partition consumers inside MultiTopicsConsumerImpl and PatternMultiTopicsConsumerImpl. Although the maxTotalReceiverQueueSizeAcrossPartitions config can limit the in-memory messages of a single topic with multiple partitions, it can't limit the in-memory messages across multiple topics.

> For example, suppose we subscribe to a regex pattern that matches 1000 non-partitioned topics. Before this PR, each non-partitioned topic consumer's receiverQueueSize is 1000, so the maximum number of messages in memory is 1000 + 1000 * 1000 = 1,001,000. Ignoring the insignificant 1000, if each message is 8 KB, we need 1,000,000 * 8 KB = 7,812.5 MB of memory just to boot the application in a catch-up read situation, which is unnecessary.

This is a valid problem. However, since it requires public API changes, it's better to start a discussion on the dev mailing list to ask for feedback from other Pulsar community members. Changing the public API will eventually require a PIP. Before creating a PIP, it's better to ask for feedback on ways to address this problem and what approach others would recommend. Adding more options to the API isn't great from a usability perspective since it adds more overhead in understanding the usage. It would be better if the multi-topics consumer could automatically tune itself to handle the situation. That might have been the purpose of PIP-74 receiver queue autoscaling (PR 14494), so it's worth checking that out too.

Are you already using PIP-74 memory limiter (enabled by default) and receiver queue autoscaling?

@oneby-wang
Copy link
Contributor Author

oneby-wang commented Oct 18, 2025

I haven't tried autoscaling yet. I noticed that we can auto-scale receiverQueueSize using the memory limiter by turning on the autoScaledReceiverQueueSizeEnabled switch, but after some code analysis, I think autoscaling probably won't work in the multi-topics situation.

Every single-partition consumer starts by granting currentReceiverQueueSize permits (a flow request to the broker), meaning that the consumer can receive up to currentReceiverQueueSize messages:

private void startReceivingMessages(List<ConsumerImpl<T>> newConsumers) {
    if (log.isDebugEnabled()) {
        log.debug("[{}] startReceivingMessages for {} new consumers in topics consumer, state: {}",
                topic, newConsumers.size(), getState());
    }
    if (getState() == State.Ready) {
        newConsumers.forEach(consumer -> {
            consumer.increaseAvailablePermits(consumer.getConnectionHandler().cnx(),
                    consumer.getCurrentReceiverQueueSize());
            internalPinnedExecutor.execute(() -> receiveMessageFromConsumer(consumer, true));
        });
    }
}

MultiTopicsConsumerImpl creates each single-partition consumer with its own receiverQueueSize and sets the batch receive maxNumMessages to (receiverQueueSize / 2):

private ConsumerImpl<T> createInternalConsumer(ConsumerConfigurationData<T> configurationData, String partitionName,
                                               int partitionIndex, CompletableFuture<Consumer<T>> subFuture,
                                               boolean createIfDoesNotExist, Schema<T> schema) {
    BatchReceivePolicy internalBatchReceivePolicy = BatchReceivePolicy.builder()
            .maxNumMessages(Math.max(configurationData.getReceiverQueueSize() / 2, 1))
            .maxNumBytes(-1)
            .timeout(1, TimeUnit.MILLISECONDS)
            .build();
    configurationData.setBatchReceivePolicy(internalBatchReceivePolicy);
    configurationData = configurationData.clone();
    return ConsumerImpl.newConsumerImpl(client, partitionName,
            configurationData, client.externalExecutorProvider(),
            partitionIndex, true, listener != null, subFuture,
            startMessageId, schema, this.internalConsumerInterceptors,
            createIfDoesNotExist, startMessageRollbackDurationInSec);
}

If the autoScaledReceiverQueueSizeEnabled switch is on, currentReceiverQueueSize is initialized to minReceiverQueueSize():

public void initReceiverQueueSize() {
    if (conf.isAutoScaledReceiverQueueSizeEnabled()) {
        CURRENT_RECEIVER_QUEUE_SIZE_UPDATER.set(this, minReceiverQueueSize());
    } else {
        CURRENT_RECEIVER_QUEUE_SIZE_UPDATER.set(this, maxReceiverQueueSize);
    }
}

According to the above analysis, minReceiverQueueSize() is 2 * batchReceivePolicy.getMaxNumMessages() - 2 = receiverQueueSize - 2 ≈ receiverQueueSize. So the application boots with about (n * receiverQueueSize) messages, where n is the number of topics:

public int minReceiverQueueSize() {
    int size = Math.min(INITIAL_RECEIVER_QUEUE_SIZE, maxReceiverQueueSize);
    if (batchReceivePolicy.getMaxNumMessages() > 0) {
        // consumerImpl may store (half-1) permits locally.
        size = Math.max(size, 2 * batchReceivePolicy.getMaxNumMessages() - 2);
    }
    return size;
}

Every single-partition consumer's minReceiverQueueSize() is about receiverQueueSize because the batch receive maxNumMessages is (receiverQueueSize / 2), so the current queue size can't be reduced below 2 * batchReceivePolicy.getMaxNumMessages() - 2 ≈ receiverQueueSize:

protected void reduceCurrentReceiverQueueSize() {
    if (!conf.isAutoScaledReceiverQueueSizeEnabled()) {
        return;
    }
    int oldSize = getCurrentReceiverQueueSize();
    int newSize = Math.max(minReceiverQueueSize(), oldSize / 2);
    if (oldSize > newSize) {
        setCurrentReceiverQueueSize(newSize);
    }
}
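
Putting the example numbers through this chain (a sketch of the reasoning above, not code from the PR; note the conclusion does not depend on the value of INITIAL_RECEIVER_QUEUE_SIZE because of the Math.max):

public class AutoScaleLowerBoundSketch {
    public static void main(String[] args) {
        int receiverQueueSize = 1000;                            // inherited by every ConsumerImpl
        int maxNumMessages = Math.max(receiverQueueSize / 2, 1); // 500, set by createInternalConsumer

        // minReceiverQueueSize(): whatever min(INITIAL_RECEIVER_QUEUE_SIZE, maxReceiverQueueSize)
        // returns, the Math.max pushes the result to at least 2 * maxNumMessages - 2.
        int minSize = 2 * maxNumMessages - 2;                    // 998, roughly receiverQueueSize

        // initReceiverQueueSize() with autoscaling on: start at minReceiverQueueSize().
        int currentSize = minSize;                               // 998

        // reduceCurrentReceiverQueueSize(): newSize = Math.max(minReceiverQueueSize(), oldSize / 2),
        // so the queue can never shrink below minSize -- autoscaling cannot reduce it further.
        int newSize = Math.max(minSize, currentSize / 2);        // 998

        System.out.printf("maxNumMessages=%d, minSize=%d, afterReduce=%d%n",
                maxNumMessages, minSize, newSize);
    }
}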

This PR can tune each single-partition consumer's receiverQueueSize and batch receive maxNumMessages via the multiTopicsSinglePartitionReceiverQueueSize config, which may help to solve this problem.
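
For illustration, applying a hypothetical value of 10 for the new config (an arbitrary choice, not a default from this PR) to the 1000-topic example from the motivation:

public class TunedQueueMemoryEstimate {
    public static void main(String[] args) {
        int topics = 1000;
        int parentReceiverQueueSize = 1000;        // multi-topics (parent) consumer queue
        int singlePartitionReceiverQueueSize = 10; // hypothetical multiTopicsSinglePartitionReceiverQueueSize
        long messageSizeBytes = 8 * 1024;          // assumed average message size (8 KB)

        long maxMessages = parentReceiverQueueSize + (long) topics * singlePartitionReceiverQueueSize;
        double maxMemoryMb = maxMessages * messageSizeBytes / (1024.0 * 1024.0);

        System.out.printf("max in-memory messages: %d (~%.1f MB)%n", maxMessages, maxMemoryMb);
        // Prints: max in-memory messages: 11000 (~85.9 MB), versus ~7820 MB before
    }
}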

The above is pure code analysis; I haven't tested it myself yet.

Moreover, I think the multi-topics consumer and its inner single-partition consumers are like a parent container and child containers, and their receiverQueueSize values should be tunable independently in some way.
