Update to Kombu 5.5.0 stopped processing SQS messages #2258
Comments
I'm seeing the same thing. It's not completely broken, but processing is extremely slow. |
Can you please share a reproduction script or steps to reproduce? I'd like to solve this ASAP. Thank you. |
Our workers process tasks from four standard SQS queues. Task execution time is fast (~50ms) and we use |
Ok, found it. Throughput: 0.99 tasks per second with
|
the performance regressions are hard to address on short notice, to be honest |
I think we have to stick to the special implementation of pycurl until we have our own homegrown alternative that is as fast or faster. some previous attempts also faced serious performance regressions |
If we can't figure it out within the next 1 or 2 weeks, we should revert this #2261 |
Unfortunately we don't have the luxury of time my friend. I will take care of everything and release a fixed version on Monday. I just want to give it a few days for @spawn-guy to check this out, but per my script above we can confirm the issue is only due to the dep changes. |
I released it before the weekend knowing it might have issues. I am prepared to respond quickly. I take responsibility, please don't stress yourself over the weekend 🙏 |
Any recommendation? |
there is no need to feel guilty. celery code base is hard. |
I have to test alternatives, right now I think we should revert back to pycurl as it was working fine as far as I know. |
the main difference between pycurl and urllib3 was is using the available @mdrachuk what is your use-case? does this pr-revert fix your problem? @Nusnus have you tried tuning the i am running my kombu fork (not the rc versions) and things seem to be processed fine, but not at the speeds mentioned |
another idea i have is about the SSL-certificates and optional |
here are my "from-home" on Windows tests of the code here #2258 (comment)
max_clients=10
FORKED_BY_MULTIPROCESSING - didn't do anything kombu = "*"
max_clients: int = 100
max_clients: int = 1
max_clients: int = 10
max_clients: int = 1
i dunno. is it Windows? is it
all speeds seem to be the same. with or without urllib3/pycurl on both kombu versions: latest and 5.4.2 help! |
aws access keys are picked up from a default location. no extra env vars are set. |
Maybe we should highlight the suggestions you shared so people can avoid these performance issues |
@auvipy i'd like to hear more use-case specifics from @mdrachuk and @mgorven : environments, os'es, python versions. with-or-without a proxy (as i didn't test that one out) i am running not-high-frequency tasks(1+ seconds) on aws elastic beanstalk py3.11 instances with the urllib3 version of kombu and another reason of slowdown with maybe it's like, i can agree with slowdown, maybe, some configuration issue or aws outage or WAF interference on seeing a new |
as a side-note: i am also thinking about |
we will revisit our recent and old experiments for introducing native async support in v6.0. but for now we have to reach a consensus on this exact issue, given the recent changes |
we can also try httpx and see later |
@spawn-guy We're using Python 3.11.11 on Debian Bookworm/12 aarch64. No proxy for SQS. |
// On Topic We will revert back to pycurl and release v5.6. // Off Topic
I am trying to be as creative as I’ve ever been in my life to solve the challenges of migrating to asyncio. It became a mission for me. It requires solving a completely different core challenge first, which makes the difficulty extremely high and multi-dimensional, but this only makes it more attractive tbh 😉 EDIT: |
i started to have an idea to give an option to pick the client implementation with a config setting. because i am not rolling back :) i don't want to re-compile curl for so, shall we instead of reverting things back - add a client-selection option now? and keep both clients and make pycurl the default with urllib3 as an option and possibly other http client implementations later (aiohttp and httpx)? |
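For illustration only, the client-selection idea above could look roughly like the sketch below. It is not kombu's actual API: `select_http_client`, `KNOWN_CLIENTS`, and the `preferred` argument are hypothetical names, and the only assumption is that pycurl stays the default with urllib3 as the fallback, as proposed in this thread.

```python
import importlib.util
from typing import Callable, Iterable, Optional

# Hypothetical preference order: pycurl first (the proposed default),
# urllib3 second. These names are illustrative, not a kombu setting.
KNOWN_CLIENTS = ("pycurl", "urllib3")


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


def select_http_client(
    preferred: Optional[str] = None,
    candidates: Iterable[str] = KNOWN_CLIENTS,
    is_available: Callable[[str], bool] = module_available,
) -> str:
    """Pick an HTTP client name.

    If `preferred` is given, it must be a known client and be installed.
    Otherwise the first installed candidate wins, in preference order.
    """
    candidates = tuple(candidates)
    if preferred is not None:
        if preferred not in candidates:
            raise ValueError(f"unknown HTTP client: {preferred!r}")
        if not is_available(preferred):
            raise ImportError(f"HTTP client {preferred!r} is not installed")
        return preferred
    for name in candidates:
        if is_available(name):
            return name
    raise ImportError("no supported HTTP client is installed")
```

With no explicit choice, an environment where pycurl cannot be installed would silently get urllib3, while an explicit choice fails loudly if the requested client is missing.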
Very interesting. I like your attitude :) EDIT: |
That's too late for celery v5.5. It is clear that the new change has introduced a performance regression with the default mechanisms, so it's best to revert now and reconsider for 5.7 |
I might also need to apply whatever solution you’d bring to pytest-celery, as pycurl was also removed from there in this effort. |
That’s my reasoning too. Enabling both by choice with pycurl as default can be an acceptable middle ground. WDYT? |
Yes |
sure. and will still try to reproduce the slowdown today/tomorrow. no luck on windows so far. |
Would it be recommended to use 5.4.2 on Python 3.13 to avoid this issue, given that 5.5 is the first version of Kombu to officially support Python 3.13? If not, would it be possible to get a 5.4.3 release with support for Python 3.13 until 5.5+ has stabilized? |
it is not so clear to me anymore. first: @Nusnus how many times have you tried your test? second: "our" urllib3 client seems to be only used to FETCH and SEND messages. on i've deployed things to amazon linux 2023 and am now running more iterations of the same test from my home windows (cloud-home delay).
testing the speed of urllib3.
the numbers fluctuate too much from and 500 tasks seem to work faster than 50
the numbers fluctuate too much from |
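Since single runs fluctuate this much, one way to compare clients is to repeat the same batch several times and look at the spread rather than one number. The sketch below is a generic helper, not the reproduction script from this thread: `run_batch` is a hypothetical callable that sends and waits for `n_tasks` tasks, and `fake_batch` is only a stand-in workload.

```python
import statistics
import time
from typing import Callable, Dict, List, Union


def measure_throughput(run_batch: Callable[[int], None], n_tasks: int) -> float:
    """Run one batch of `n_tasks` and return tasks per second."""
    start = time.perf_counter()
    run_batch(n_tasks)
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return n_tasks / elapsed


def repeat_benchmark(
    run_batch: Callable[[int], None], n_tasks: int, repeats: int = 5
) -> Dict[str, Union[float, List[float]]]:
    """Repeat the batch and summarize throughput across runs."""
    samples = [measure_throughput(run_batch, n_tasks) for _ in range(repeats)]
    return {
        "samples": samples,
        "mean": statistics.mean(samples),
        # stdev needs at least two samples.
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }


def fake_batch(n_tasks: int) -> None:
    """Stand-in workload; replace with code that enqueues and awaits tasks."""
    time.sleep(0.001 * n_tasks)


if __name__ == "__main__":
    for size in (50, 500):
        print(size, repeat_benchmark(fake_batch, size, repeats=3))
```

If the standard deviation is of the same order as the difference between two clients, the runs probably cannot distinguish them.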
As I have pycurl deployment problems again - I am thinking about the fixing strategy. I will roll back the deletion of pycurl and dependencies. The best way would be to introduce a choice via Celery configuration, like the pool choice. But I don't have enough time to do this. But I need some advice on the package dependency: we now use the sqs extra, which required pycurl. And I have pycurl problems on instance deployment.
The CI will use pycurl by default. What do you say? |
initial code here #2269 when i've picked the 3rd route to requirements management: |
so i've deployed the "fallback urllib3" version to my @mgorven will you be so kind to test the branch with |
i have been thinking and commenting with @jmsmkn and.. the thing is.. i can't reliably state that pycurl-urllib3 is the problem here. 'cause my change was not the only one since 5.4.2. on the other hand i have also enabled ssl connections for sqs which were disabled instead of "fixed" previously. it looks like i still need to test pycurl on aws anyway :( and also whether ssl works correctly |
The branch doesn't work at all: https://gist.github.com/mgorven/f1689323acb1a4e981644dfc9afe87ab This is using rev 77ca118 and pycurl 7.45.6. |
@mgorven thanks for the feedback, i'll look into it today. even though this is the code from |
Hello.
Tbh I'm confused about the debug steps for this, because we didn't have any alerts or error logs. Only the actual usage showed that the tasks from the SQS weren't processing anymore.
Downgrading to 5.4.2 solved the issue for now.
Leaving this here for anybody having similar issues to see and maybe comment on details they can find.