Allow dynamically setting QoS when global QoS is set to false #11955

thedrow · 2024-08-08T10:57:16Z

Is your feature request related to a problem? Please describe.

Recently Celery added support for quorum queues, and we discovered that the way it works breaks Celery in unexpected ways.
Celery relies on dynamically setting the QoS for multiple features such as autoscaling, ETA/Countdown, retries, and more.

When we receive a task with an ETA, we increase the prefetch count so that another task can be executed while we wait for the ETA task to reach its scheduled execution time.
After the ETA is reached, we execute the task and decrease the prefetch count back to normal.
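
The bookkeeping described above can be modeled with a small counter. This is only a toy sketch, not Celery's actual implementation: the class `PrefetchWindow` and its method names are made up for illustration, and a real worker would also have to send each new value to the broker.

```python
class PrefetchWindow:
    """Toy model of a worker's prefetch count with ETA adjustments (hypothetical)."""

    def __init__(self, base: int) -> None:
        self.base = base      # prefetch count when no ETA tasks are held
        self.eta_held = 0     # ETA tasks received but not yet due

    @property
    def prefetch_count(self) -> int:
        # Every held ETA task widens the window by one slot.
        return self.base + self.eta_held

    def on_eta_received(self) -> None:
        self.eta_held += 1

    def on_eta_executed(self) -> None:
        if self.eta_held == 0:
            raise RuntimeError("no ETA task is being held")
        self.eta_held -= 1


window = PrefetchWindow(base=4)
window.on_eta_received()
window.on_eta_received()
print(window.prefetch_count)  # 6
window.on_eta_executed()
print(window.prefetch_count)  # 5
```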

When global QoS is set to false, a task with an ETA blocks Celery from fetching another task. If multiple ETA tasks are received once the worker has reached its maximum concurrency, they can essentially block the worker from executing any other task.
ETAs can be arbitrarily long so the worker may be blocked for an arbitrary period of time.
If we have a concurrency of 4 and we receive 4 tasks with an ETA that is an hour in the future, we block the worker entirely for an hour from executing any other task. This is unacceptable.
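
To make the arithmetic in this example concrete, here is a minimal sketch of how many messages a broker could still deliver to a consumer under a fixed prefetch limit. It assumes the prefetch limit equals the concurrency and that held ETA tasks stay unacknowledged until they run; the function name and parameters are hypothetical.

```python
def deliverable(prefetch_count: int, unacked: int, ready: int) -> int:
    """Messages the broker may still push to one consumer.

    `unacked` includes ETA tasks that are held but not yet acknowledged.
    """
    free_slots = max(prefetch_count - unacked, 0)
    return min(free_slots, ready)


# Concurrency 4, four ETA tasks held unacknowledged, ten tasks waiting in the queue.
print(deliverable(prefetch_count=4, unacked=4, ready=10))  # 0

# Same situation if the prefetch count could be raised by one per held ETA task.
print(deliverable(prefetch_count=8, unacked=4, ready=10))  # 4
```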

Retries also rely on the ETA feature to retry tasks after a certain period of time.

Autoscaling relies on increasing and decreasing the QoS when a new process is instantiated and destroyed.
When the global QoS is set to false, Celery will instantiate a new process when the workload of the queue increases but won't pull new tasks for those processes, making the entire feature broken as well.
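
A rough sketch of why autoscaling depends on this, assuming the prefetch count is derived as pool size times a multiplier (Celery exposes the `worker_prefetch_multiplier` setting for this). The function below is illustrative and not taken from Celery's code.

```python
def target_prefetch(processes: int, multiplier: int = 4) -> int:
    """Prefetch count a worker would want for a given pool size (assumed formula)."""
    return processes * multiplier


applied = target_prefetch(2)  # value in effect when the consumer started
wanted = target_prefetch(5)   # value needed after the autoscaler grows the pool
print(applied, wanted, wanted - applied)  # 8 20 12
```

If the prefetch count cannot be changed after the consumer starts, the broker keeps delivering against the first value and the extra processes have nothing to pick up.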

Describe the solution you'd like

I'd like RabbitMQ to support dynamically increasing and decreasing the QoS when global QoS is set to false so that Celery won't be broken when used with quorum queues.

Describe alternatives you've considered

We're trying to re-architect the ETA feature but so far without any luck. Autoscaling has no other solution.

Additional context

No response

ansd · 2024-08-08T12:56:08Z

Maintainer

Related: #10174 (comment)

0 replies

michaelklishin · 2024-08-08T14:51:49Z

Maintainer

Related, a comment I and at least one other core team member still stand by: #10174 (comment)

Dynamically adjusting global QoS is a bad design. There will likely be little interest in changing how it works, because any change will lead to unexpected behavior changes for someone, and the whole feature, when actively changed over the course of a connection's lifetime, is too prone to causing confusion one way or another.

2 replies

thedrow Aug 9, 2024
Author

But that means that Celery has to give up autoscaling entirely when RabbitMQ deprecates global QoS.
We need an alternative.

michaelklishin Aug 9, 2024
Maintainer

@thedrow the suggested alternative is to use AMQP 1.0 once RabbitMQ 4.0 ships. In AMQP 1.0, link credits allow you to do exactly what you want. Starting with RabbitMQ 4.0, AMQP 1.0 is a core protocol (built-in, no plugin needed, although one is still technically provided to simplify upgrades).

There are no plans to remove global QoS or modify AMQP 0-9-1 further unless absolutely necessary. We may consider tweaking QoS again but there's certainly some opposition on the team.
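
For readers unfamiliar with link credits, the idea can be sketched with a toy model: the receiver grants credit, the sender delivers one message per credit, and the receiver can top up at any time. This is a simplification under stated assumptions. Real AMQP 1.0 flow frames carry an absolute `link-credit` value together with a `delivery-count`, and the class below is hypothetical rather than any client library's API.

```python
from collections import deque


class CreditLink:
    """Toy model of credit-based flow control on one receiving link (hypothetical)."""

    def __init__(self, messages) -> None:
        self.queue = deque(messages)  # messages waiting on the sender side
        self.credit = 0               # deliveries the receiver currently allows

    def grant(self, n: int) -> list:
        """Receiver adds n credits; sender delivers while credit remains."""
        self.credit += n
        delivered = []
        while self.credit > 0 and self.queue:
            delivered.append(self.queue.popleft())
            self.credit -= 1
        return delivered


link = CreditLink(["eta-1", "eta-2", "task-3", "task-4"])
print(link.grant(2))  # ['eta-1', 'eta-2']
# Both deliveries are ETA tasks that must wait, so the receiver tops up by one.
print(link.grant(1))  # ['task-3']
```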

michaelklishin · 2024-08-08T14:53:54Z

Maintainer

@thedrow I'm afraid what you are describing is a Celery-specific design decision, and (from a RabbitMQ developer perspective at least) a somewhat questionable one, but now you want the small RabbitMQ core team to spend time fixing that design decision. Honestly, this does not sound like a reasonable justification to me.

There are so many other areas of improvement in 4.x that are necessary that changing how QoS works (and it works that way for a reason) is just not a priority.

0 replies

ansd · 2024-08-16T14:59:43Z

Maintainer

@thedrow

Why can't Celery simply add more consumers when the client wants to check out more messages?
Can the client use basic.get to fetch more messages (and coordinate locally that max N tasks run in parallel)?
If a task is to be processed in 1 hour from now, can the client enqueue or dead letter the message somewhere and pick it up later again?
Are there any missing features in RabbitMQ that prevent Celery from using AMQP 1.0 where this feature is implemented?
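
As a sketch of the third question (parking a not-yet-due task and picking it up later), one common pattern is a holding queue with a message TTL that dead-letters expired messages back to the work exchange. The helper below only builds the queue arguments. The function name and the exchange and routing key values are hypothetical, and declaring the queue with a client library is left out.

```python
def delay_queue_arguments(delay_ms: int, target_exchange: str, target_routing_key: str) -> dict:
    """Arguments for a holding queue whose messages expire into the work exchange."""
    if delay_ms < 0:
        raise ValueError("delay_ms must be non-negative")
    return {
        "x-message-ttl": delay_ms,                        # how long a message waits
        "x-dead-letter-exchange": target_exchange,        # where expired messages go
        "x-dead-letter-routing-key": target_routing_key,  # routing key used on expiry
    }


# One-hour delay, as in the ETA example from the original post.
args = delay_queue_arguments(60 * 60 * 1000, "tasks", "celery")
print(args["x-message-ttl"])  # 3600000
```

With a queue-level TTL every message in the holding queue waits the same amount of time, so arbitrary ETAs would need more than one such queue or a different mechanism.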

1 reply

michaelklishin Aug 16, 2024
Maintainer

I'd never recommend the use of basic.get beyond basic integration tests. It always ends with polling with all of its downsides and almost none of the upsides of messaging :(
