Replies: 4 comments 3 replies
-
Related: #10174 (comment)
-
Related, a comment that I and at least one other core team member still stand by: #10174 (comment). Dynamically adjusting global QoS is a bad design. There will likely be little interest in changing how it works, because any change will lead to unexpected behavior changes for someone, and the whole feature, when actively changed over the course of a connection's lifetime, is too prone to causing confusion one way or another.
-
@thedrow I'm afraid what you are describing is a Celery-specific design decision, and (from a RabbitMQ developer perspective at least) a somewhat questionable one, yet now you want the small RabbitMQ core team to spend time fixing that design decision. Honestly, this does not sound like a reasonable justification to me. There are so many other necessary areas of improvement in 4.x that changing how QoS works (and it works that way for a reason) is just not a priority.
-
Is your feature request related to a problem? Please describe.
Celery recently added support for quorum queues, and we discovered that the way they work breaks Celery in unexpected ways.
Celery relies on dynamically setting the QoS for multiple features such as autoscaling, ETA/Countdown, retries, and more.
When we receive a task with an ETA, we increase the prefetch count so that another task can be executed while we wait for the ETA task to reach its scheduled execution time.
After the ETA is reached, we execute the task and decrease the prefetch count back to normal.
When global QoS is set to false, a task with an ETA blocks Celery from fetching another task. If enough ETA tasks are received to reach the maximum number of tasks allowed to execute concurrently, the worker is effectively blocked from executing any other task.
ETAs can be arbitrarily long so the worker may be blocked for an arbitrary period of time.
If we have a concurrency of 4 and we receive 4 tasks with an ETA that is an hour in the future, we block the worker entirely for an hour from executing any other task. This is unacceptable.
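To make the scenario above concrete, here is a minimal sketch in plain Python. It is a toy model, not Celery or RabbitMQ code: `Task`, `runnable_now`, and `dynamic_prefetch` are hypothetical names, and the only rule modelled is the assumption that the broker keeps delivering while the number of unacknowledged messages is below the prefetch count.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    eta_seconds: Optional[int] = None  # None means "run as soon as received"


def runnable_now(queue: List[Task], concurrency: int, dynamic_prefetch: bool) -> List[str]:
    """Return the names of delivered tasks that can start immediately.

    Toy model (assumption): the broker delivers messages in order while the
    number of unacknowledged messages is below the prefetch count.
    """
    prefetch_count = concurrency
    unacked: List[Task] = []
    for task in queue:
        if len(unacked) >= prefetch_count:
            break  # prefetch window is full, nothing more is delivered
        unacked.append(task)
        if task.eta_seconds is not None and dynamic_prefetch:
            # Make room for one more message while the ETA task waits.
            prefetch_count += 1
    return [t.name for t in unacked if t.eta_seconds is None]


# Four tasks with an ETA one hour away, followed by two tasks that could run now.
queue = [Task(f"eta-{i}", eta_seconds=3600) for i in range(4)]
queue += [Task("now-1"), Task("now-2")]

fixed = runnable_now(queue, concurrency=4, dynamic_prefetch=False)
dynamic = runnable_now(queue, concurrency=4, dynamic_prefetch=True)
print(fixed, dynamic)
```

With a fixed prefetch count, the four ETA tasks fill the window and neither of the immediately runnable tasks is delivered. When the count grows by one per ETA task, both are.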
Retries also rely on the ETA feature to retry tasks after a certain period of time.
Autoscaling relies on increasing and decreasing the QoS when a new process is instantiated and destroyed.
When global QoS is set to false, Celery will instantiate a new process when the workload of the queue increases but won't pull new tasks for that process, which breaks the feature entirely as well.
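The autoscaling problem can be sketched the same way. This is again a toy model in plain Python, not Celery code: `idle_processes` is a hypothetical helper, and it assumes each process runs one task at a time and that at most `prefetch_count` tasks can be unacknowledged at once.

```python
def idle_processes(pool_size: int, prefetch_count: int, backlog: int) -> int:
    """Number of processes left with nothing to run.

    Assumption: one task per process, and the consumer can hold at most
    prefetch_count unacknowledged tasks at any time.
    """
    in_flight = min(prefetch_count, backlog)
    return max(pool_size - in_flight, 0)


# The pool was autoscaled from 2 to 6 processes while 10 tasks wait in the queue.
stuck = idle_processes(pool_size=6, prefetch_count=2, backlog=10)     # prefetch not raised
adjusted = idle_processes(pool_size=6, prefetch_count=6, backlog=10)  # prefetch follows pool size
print(stuck, adjusted)
```

If the prefetch count cannot follow the pool size, the newly started processes stay idle even though the queue has a backlog.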
Describe the solution you'd like
I'd like RabbitMQ to support dynamically increasing and decreasing the QoS when global QoS is set to false so that Celery won't be broken when used with quorum queues.
Describe alternatives you've considered
We're trying to re-architect the ETA feature, but so far without any luck. We have no alternative solution for autoscaling.
Additional context
No response