Implement optional persistent registry lookup cache and better 429 handling #13
Conversation
This adds a `--cache` flag to builds which will read from a cache file and write back out to it a set of images that are ~safe to cache (and because it's just a file, we can clear cache entries trivially by deleting the relevant entries from the file).

This also adds better 429 behavior that's twofold:

1. cap our maximum registry requests at ~500/min with an initial burst of 100 unless we hit a 429 (which should, in theory, keep us closer to staying under the Hub abuse rate limits)
2. actually retry requests when they give us a 429 response (capping both new requests and retries at the same ~500/min, which is ~8/sec)

As a final bonus, I added/generated an appropriate cache file for our local test suite, which brings the total number of actual Hub/registry requests our test suite makes down to one single image lookup.
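To make the rate-limiting scheme above concrete, here's a minimal Go sketch, assuming a token-bucket limiter from `golang.org/x/time/rate`; `registryLimiter` and `doRegistryRequest` are illustrative names, not this PR's actual identifiers:

```go
package registry

import (
	"context"
	"net/http"
	"time"

	"golang.org/x/time/rate"
)

// ~500 requests/min (~8.33/sec) with an initial burst of 100.
var registryLimiter = rate.NewLimiter(rate.Limit(500.0/60.0), 100)

// doRegistryRequest waits for a limiter token before every attempt and
// retries on 429, so retries draw from the same ~500/min budget as new
// requests instead of piling on top of it. (Assumes body-less requests
// like GET/HEAD, which are safe to re-send as-is.)
func doRegistryRequest(ctx context.Context, req *http.Request) (*http.Response, error) {
	for {
		if err := registryLimiter.Wait(ctx); err != nil {
			return nil, err
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			return nil, err
		}
		if resp.StatusCode != http.StatusTooManyRequests {
			return resp, nil
		}
		resp.Body.Close()
		// brief pause before looping; Wait above re-applies the rate cap
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case <-time.After(time.Second):
		}
	}
}
```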
```diff
 )
-var concurrency = 50
+var concurrency = 1000
```
How does this interact with the rate limiter?
This was implemented initially as a poor man's rate limiter, so now that we have a "real" rate limiter in place, we could technically remove this entirely and let the concurrency be unbounded, but I figured making it high (instead of unlimited) was probably better/safer.

In practice the rate limiter is likely to block many of these goroutines anyhow (and having them wait to make a request vs. waiting to start seems totally fair, since the overhead is really low and they can potentially do other things in the meantime). Also, with the cache implemented, many of the goroutines won't block at all, so it's just "free" concurrency. 👍
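As a rough sketch of how the two mechanisms compose (illustrative names and placeholder lookup logic, not this PR's code): the semaphore only bounds how many lookups are in flight, the limiter bounds how fast uncached lookups actually hit the registry, and cache hits skip the limiter entirely.

```go
package main

import (
	"context"
	"sync"

	"golang.org/x/time/rate"
)

var (
	limiter = rate.NewLimiter(rate.Limit(500.0/60.0), 100)
	cached  = map[string]string{} // image -> digest, loaded from the --cache file
)

// lookup is a placeholder: cache hits return immediately; only misses
// wait on the limiter and touch the registry.
func lookup(ctx context.Context, img string) string {
	if digest, ok := cached[img]; ok {
		return digest
	}
	_ = limiter.Wait(ctx)
	return "sha256:..." // stand-in for the real registry lookup
}

func main() {
	images := []string{"debian:bookworm", "alpine:3.20"}
	sem := make(chan struct{}, 1000) // the high concurrency cap
	var wg sync.WaitGroup
	for _, img := range images {
		img := img
		wg.Add(1)
		sem <- struct{}{} // acquire a slot: bounds in-flight goroutines
		go func() {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			_ = lookup(context.Background(), img)
		}()
	}
	wg.Wait()
}
```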
```go
saveCacheMutex.Lock()
defer saveCacheMutex.Unlock()

if saveCache == nil || cacheFile == "" {
```
Could this be moved before locking the mutex?
The `cacheFile` test could, but the test for `saveCache` cannot, because we cannot safely compare that value without a mutex (even a simple comparison like `nil`).

(This is also effectively the test for "did the user specify `--cache`", and we're specifying `--cache` in all the places we care about now, so this test is effectively as good as a no-op now.)
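A minimal sketch of the distinction being made here, assuming `saveCache` is written concurrently by lookup goroutines while `cacheFile` is set exactly once at startup (`flushCache` and the variable types are hypothetical):

```go
package cache

import "sync"

var (
	cacheFile      string            // set once from --cache before any goroutines start
	saveCache      map[string]string // written concurrently by lookup goroutines
	saveCacheMutex sync.Mutex        // guards saveCache
)

func flushCache() {
	// Safe before the lock: cacheFile never changes after startup.
	if cacheFile == "" {
		return
	}
	saveCacheMutex.Lock()
	defer saveCacheMutex.Unlock()
	// NOT safe before the lock: saveCache is written concurrently, so even
	// a simple nil comparison outside the mutex would be a data race.
	if saveCache == nil {
		return
	}
	// ... serialize saveCache out to cacheFile ...
}
```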
```go
rArches := cache.Arches

if !wasCached {
	fmt.Fprintf(os.Stderr, "NOTE: lookup %s -> %s\n", img, r.Desc.Digest)
```
Is this a debugging statement or intended to remain?
It's a bit of both -- I added it for debugging, but realized that knowing when lookups actually happen was useful enough to leave it in. 😅