Better LLM retry behavior #6557
Conversation
RateLimitError,
ServiceUnavailableError,
503 is a transitory error, we could probably keep it?
Hmm. It's transitory but also unexpected...
I'm open to it but I lean towards telling the user their LLM is flaking out rather than OpenHands looking like it's slow
I kinda agree with you actually. We've always had a problem understanding our retry settings, because it's a bit weird to figure out a sensible default for "unexpected stuff happened".
And now we do allow the user to continue normally after reporting the error.
eval is the exception, I'd love to hear from Xingyao on that.
There are some issues on litellm about this: the exceptions as defined mix permanent and transitory errors from the provider. We have some weird code due to that. I would agree that cleaning them up and starting again is reasonable. 😅
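For illustration, here is a minimal sketch of how one could split transitory from permanent provider errors before deciding to retry. The exception classes come from litellm, but the particular classification below is an assumption made for the example, not the project's final list:

```python
# Illustrative only: a rough split of litellm exceptions into "retry" vs "fail fast".
# Which exceptions belong in each bucket is an assumption for this sketch.
from litellm.exceptions import (
    APIConnectionError,
    AuthenticationError,
    BadRequestError,
    RateLimitError,
)

TRANSIENT = (
    RateLimitError,       # 429: worth waiting out
    APIConnectionError,   # network blips
)
PERMANENT = (
    AuthenticationError,  # 401: retrying will never help
    BadRequestError,      # 400: the request itself is wrong
)


def should_retry(exc: Exception) -> bool:
    """Retry only errors the provider is likely to recover from."""
    return isinstance(exc, TRANSIENT) and not isinstance(exc, PERMANENT)
```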
Small related detail, there's a try/except due to retries in
Please see also a small follow-up here:
Thanks @enyst! Any lingering issues here?
I think it would be great if @xingyaoww can take a look, because it's possible that the removed exceptions are relevant in evals.
Up to you.
Let's do this now to unblock stuff -- I'll probably make some of these handling specific for evaluation when I run into them :)
Fixes All-Hands-AI#6942
Removed in All-Hands-AI#6557
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
This part wasn't correct, unfortunately:
Tenacity counts attempts, not retries, so you actually get 3 retries after the first attempt fails. That said, it's not 5 + 10 + 20 = 35s either, because Tenacity uses binary exponential backoff. The actual total wait time is 18s: 5 + 5 + 8. If you add the waits introduced by LiteLLM's own attempts, it becomes roughly 24s, less than a minute, and that's a problem, because it's not enough time for per-minute rate-limiting blocks to reset.
This could explain some (at least 3) weird open rate-limiting OH issues where the agent keeps stopping: there's not enough time for the per-minute limit to reset. I suggest you change the values again so the waits span over 60s. Ideally, to cover all cases, one of the waits should be over 60s, because some providers also count failed attempts.
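For reference, a small sketch of how Tenacity counts attempts and produces these waits. The values (multiplier=2, min=5, max=30, 4 attempts) are illustrative choices that reproduce the 5 + 5 + 8 schedule above, not necessarily the PR's final settings:

```python
# Illustrative values only. stop_after_attempt(4) means 1 initial call + 3 retries,
# i.e. 3 waits between attempts.
from tenacity import retry, stop_after_attempt, wait_exponential


@retry(
    stop=stop_after_attempt(4),
    wait=wait_exponential(multiplier=2, min=5, max=30),
    reraise=True,  # surface the original exception once retries are exhausted
)
def call_llm():
    ...  # placeholder for the real completion call


# wait_exponential yields multiplier * 2**(attempt - 1), clamped to [min, max]:
#   after attempt 1: max(5, 2 * 1) = 5s
#   after attempt 2: max(5, 2 * 2) = 5s
#   after attempt 3: max(5, 2 * 4) = 8s
# Total sleep: 5 + 5 + 8 = 18s, which never spans a full per-minute rate-limit window.
```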
End-user friendly description of the problem this fixes or functionality that this introduces
no changelog
Give a summary of what the PR does, explaining any non-trivial design decisions
The LLM is retrying a lot of unrecoverable exceptions, which makes it look like the app is just stuck.
The current configuration also waits a total of 11 minutes (!) for a good response, not including the request time, which can add ~5-8 minutes to that total. So the app looks VERY stuck.
We could potentially move this into a config if these errors are common enough that eval needs them. CC @xingyaoww
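If that route is taken, one possible shape for it is sketched below. The field name retry_service_unavailable is invented for illustration, as are the default values; the idea is just that eval runs could opt back into retrying errors that interactive use fails fast on:

```python
# Hypothetical sketch: a config-driven retryable-exception set, so evaluation runs
# can opt back into retrying ServiceUnavailableError. Field names are invented.
from dataclasses import dataclass

from litellm.exceptions import APIConnectionError, RateLimitError, ServiceUnavailableError


@dataclass
class RetryConfig:
    num_retries: int = 4
    retry_min_wait: int = 5
    retry_max_wait: int = 30
    retry_service_unavailable: bool = False  # invented flag, off for interactive use

    def retryable_exceptions(self) -> tuple[type[Exception], ...]:
        base: tuple[type[Exception], ...] = (RateLimitError, APIConnectionError)
        if self.retry_service_unavailable:
            base += (ServiceUnavailableError,)
        return base
```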
Link of any specific issues this addresses
To run this PR locally, use the following command: