-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement retries for spurious connection issues #758
Conversation
Reviewing now! Thanks for this @bennybp! |
retry looks kinda dead; looks like backoff may be a better bet for avoiding bitrot? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, and suggest making the parameters for retries configurable at runtime. I also think that retry
as a library looks a bit dead, so may be a better bet to use something else, like backoff
. Can probably still use backoff
's decorator directly with configurable paramters as e.g.:
backoff.on_predicate(backoff.expo, ...)(func_to_retry)(*args, **kwargs)
If adding another dependency seems overkill for this, it's not too hard to implement your own solution with the behavior you want. As an example, we do this in a simple way in alchemiscale here.
qcportal/qcportal/client_base.py
Outdated
tries=5, | ||
delay=0.5, | ||
max_delay=5, | ||
backoff=2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of hardcoding these, perhaps use attributes on PortalClientBase
configurable on init
? Probably really only need to expose tries
, max_delay
, and backoff
, but could expose all of these if you want to allow for customizability at runtime.
Also, probably need to set jitter
as well to avoid many workers retrying all in-sync during a network hiccup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fairly opposed to adding these to the constructor, since a vast, vast majority of people would use the defaults, and this would almost double the number of arguments available. I can do something similar to the encoding
though and just allow those attributes to be set for a client object.
Let me think about just hardcoding a solution. We are only using it in one place, so it it's not be too bad to implement it myself (using the aforementioned class attributes) |
Ok implemented by hand now. It's pretty simple :) There is only one other place I would consider adding this kind of logic (when dealing with certain database operations). But it will end up looking a lot different, so I'm not worried about duplicating code. |
Giving this another review today! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great @bennybp! This should help users quite a bit!
Description
Requests that result in certain errors due to spurious connection/networking issues should now be automatically retried.
The types of exceptions handled should cover most cases, but I am open to expanding the list if people find ones that I missed.
The request will be tried again after waiting 0.5, 1.0, 2.0, and 4.0 seconds (since the previous request)
Seem reasonable @dotsdl? Closes #741
Changelog description
Requests now will be automatically retried in case of connection or networking issues
Status