QQ: checkpointing frequency improvements #11964

kjnilsson · 2024-08-09T08:37:58Z

The current approach takes too many checkpoints, which hurts performance, especially with large backlogs.

This PR takes an approach more similar to what was done for release cursors in 3.13.x.

Also add a `force_checkpoint` aux command that the purge operation emits. It can also be used to try to force a checkpoint.

The checkpointing config can be changed by setting the `quorum_queue_checkpoint_config` persistent term:

persistent_term:set(quorum_queue_checkpoint_config, {MinIntervalMs, MinIndexes, MaxIndexes}).

The current values are `{1000, 4096, 666667}`, which means a checkpoint is taken at most once every 1s, and only once at least 4096 indexes have been applied. The minimum number of indexes between checkpoints grows in line with the message backlog, up to at most 666667.
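As a minimal sketch of the setting described above, the following could be run in an Erlang shell on the broker node. The variable names simply mirror the tuple positions given in this PR, and the values are the defaults quoted here; whether already-running queues pick up a changed value immediately is not stated in this PR, so treat that as an open question.

```erlang
%% Sketch only: values are the defaults quoted in this PR description.
MinIntervalMs = 1000,    %% take a checkpoint at most once per this many ms
MinIndexes    = 4096,    %% lower bound on applied indexes between checkpoints
MaxIndexes    = 666667,  %% upper bound the interval grows to with the backlog
ok = persistent_term:set(quorum_queue_checkpoint_config,
                         {MinIntervalMs, MinIndexes, MaxIndexes}),
%% Read the term back to confirm what is stored.
{1000, 4096, 666667} = persistent_term:get(quorum_queue_checkpoint_config).
```

Note that `persistent_term:set/2` replaces the whole tuple, so all three values have to be supplied even when only one of them changes.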

michaelklishin · 2024-08-14T18:56:39Z

The forced push was a rebase.

michaelklishin · 2024-08-15T02:49:41Z

My PerfTest tests do not observe any anomalies. With an 88M message backlog (22M per queue), there are 22-23 checkpoints per queue. When the queues are drained, the checkpoints go away at a rate of roughly one per 1M messages consumed.

I assume that a checkpoint taken every ≈ 1M messages is a reasonable rate.

With a 50M message backlog across 4 queues, the node takes 18s to start on a mostly idle 10 core machine with a reasonably fast 3 year old SSD.

With a workload that simulates peak throughput (4 queues, 3 publishers and 3 consumers), and with the queue directories monitored using `watch -n 1` (once a second), a checkpoint appears and disappears roughly every 1M messages published (≈ 5s).

kjnilsson · 2024-08-15T08:13:10Z

> I assume that a checkpoint taken every ≈ 1M messages is a reasonable rate.

Depending on how many messages there are in the backlog, the number of indexes between checkpoints grows from 4096 to ~1M (max), so yes, that tallies. Cheers.


kjnilsson requested a review from mkuratczyk August 9, 2024 08:38

kjnilsson added this to the 4.0.0 milestone Aug 9, 2024

michaelklishin changed the title ~~Qq: adjust checkpointing algo to something more like~~ QQ: adjust checkpointing algo to something more like it was in 3.13.x Aug 9, 2024

kjnilsson force-pushed the qq-checkpointing-tweaks branch from 3dc0791 to b17f444 Compare August 9, 2024 14:40

kjnilsson changed the title ~~QQ: adjust checkpointing algo to something more like it was in 3.13.x~~ QQ: checkpointing frequency improvements Aug 9, 2024

michaelklishin added the backport-v4.0.x label Aug 13, 2024

kjnilsson force-pushed the qq-checkpointing-tweaks branch 2 times, most recently from d513239 to 776d8cb Compare August 14, 2024 16:07

mergify bot added the bazel label Aug 14, 2024

michaelklishin force-pushed the qq-checkpointing-tweaks branch from 776d8cb to e22d3c8 Compare August 14, 2024 18:56

kjnilsson force-pushed the qq-checkpointing-tweaks branch from e22d3c8 to f9aa5ac Compare August 15, 2024 08:13

Qq: adjust checkpointing algo to something more like

0f1f27c

it was in 3.13.x. Also add a force_checkpoint aux command that the purge operation emits - this can also be used to try to force a checkpoint

kjnilsson force-pushed the qq-checkpointing-tweaks branch from f9aa5ac to 0f1f27c Compare August 15, 2024 10:55

kjnilsson marked this pull request as ready for review August 15, 2024 10:59

kjnilsson requested review from the-mikedavis and removed request for mkuratczyk August 15, 2024 10:59

Remove max_in_memory_length/bytes from QQ config type

9ca77f8

Also remove a resolved TODO about conversion for the `last_checkpoint` field.

the-mikedavis approved these changes Aug 15, 2024


michaelklishin merged commit 178f9a9 into main Aug 16, 2024
238 checks passed

michaelklishin deleted the qq-checkpointing-tweaks branch August 16, 2024 00:49

mergify bot mentioned this pull request Aug 16, 2024

QQ: checkpointing frequency improvements (backport #11964) #12023

Merged

michaelklishin added a commit that referenced this pull request Aug 16, 2024

Merge pull request #12023 from rabbitmq/mergify/bp/v4.0.x/pr-11964

c6aaa50

QQ: checkpointing frequency improvements (backport #11964)
