Fetcher retries #1153
metadata_fetcher/fetchers/Fetcher.py
Outdated
http = requests.Session()
retry_strategy = Retry(
    total=3,
    status_forcelist=[413, 429, 500, 502, 503, 504],
Why did you remove the back off factor - that seems useful and relevant?
I also wonder if this makes sense as a global configuration, like in rikolti/utils.py?
Well, I didn't actually remove the backoff factor as there wasn't one configured at all here previously. I figured that we could go with the default backoff factor of 0 and tweak if necessary. But sure, it probably makes sense to set it to 2.
Yeah, we could make this a global configuration. This is actually a copy of configure_http_session() from the content harvester code (which doesn't have a backoff factor set).
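For reference, a minimal sketch of what a shared session helper could look like if the two settings discussed above were combined. The function name is borrowed from the content harvester helper mentioned in the comment, but the body below is an assumption and not the code in this repository; it simply pairs the status_forcelist from the diff with backoff_factor=2 and mounts the same adapter on both schemes.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def configure_http_session() -> requests.Session:
    """Return a requests session that retries transient failures.

    Hypothetical sketch: the real helper in the codebase may differ.
    """
    session = requests.Session()
    retry_strategy = Retry(
        total=3,                # give up after three retries
        backoff_factor=2,       # non-zero, so urllib3 sleeps between retries
        status_forcelist=[413, 429, 500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    # Mount on both schemes so http:// and https:// URLs behave the same.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

With a helper along these lines, every caller gets the same retry behavior, and a later tweak to the backoff factor happens in one place.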
session = requests.Session()
retries = Retry(total=3, backoff_factor=2)
session.mount("https://", HTTPAdapter(max_retries=retries))
response = session.get(url=url)
If you added the backoff_factor to the global configuration for self.http_session, then you should be able to do:
- response = session.get(url=url)
+ response = self.http_session.get(url=url)
And get rid of lines 93-95 here.
Sure, self.http_session adds some status codes to the forcelist and attaches the retry strategy to both http and https, but that seems fine?
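To make the http versus https point concrete, here is a small sketch that inspects which retry policy a session would apply to a given URL. The ad hoc session mirrors the one in the diff; the helper name retry_total and the example.org URLs are illustrative assumptions. No request is sent.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Ad hoc session as in the diff: retries are mounted for https:// only.
adhoc = requests.Session()
adhoc.mount("https://", HTTPAdapter(max_retries=Retry(total=3, backoff_factor=2)))


def retry_total(session: requests.Session, url: str):
    """Return the retry budget of the adapter the session picks for url."""
    return session.get_adapter(url).max_retries.total


https_total = retry_total(adhoc, "https://example.org/")
# The http:// prefix still uses requests' default adapter, which does not retry.
http_total = retry_total(adhoc, "http://example.org/")
```

A session that mounts the adapter on both prefixes would report the same budget for both URLs, which is the extra coverage the shared session provides.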
Yeah, I thought about that too. I figured I'd introduce the least amount of change possible, but you're right, it would probably be fine.
Mmmm, that makes sense - what you did change (Fetcher.fetch_page()) is the function that is called like 95% of the time though, so it seemed confusing to me to change it most of the way but not all of the way. Could result in weird errors on strange edge cases where we've had to implement additional requests outside of the standard requests managed by the Fetcher base class.
Looks like [...]
Oh, yeah, and a global search for Retry across the codebase is actually pretty demonstrative - the Retry configuration is the same across several mappers and a fetcher ([...]).
Since it does look like we use the same configuration everywhere (with the exception of this one discrepancy on backoff_factor and status_forcelist), I think adding it to a [...]
@amywieliczka Sure yeah, I think it makes sense to do this. I can do the work to put this in place. I don't think it'll take too long--I was just trying to do this quickly so that I could focus on the Nuxeo API issue.
Figured I could do this quickly while @barbarahui's head was in Nuxeo API issues. I did leave us using the default requests session for all requests to our own Registry API, as well as requests to the OpenSearch API in the record_indexer.
@amywieliczka this looks great, thank you so much!!!!
make_http_request() functionality into ucd_json_fetcher -- this is the only place this was being used