
Implement optional persistent registry lookup cache and better 429 handling #13

Merged: 1 commit merged into docker-library:main from the 429 branch on Jan 12, 2024

Conversation

@tianon (Member) commented on Jan 11, 2024

This adds a --cache flag to builds which will read from a cache file and write back out to it a set of images that are ~safe to cache (and because it's just a file, we can clear cache entries trivially by deleting the relevant entries from the file).
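
For illustration only, here is a minimal sketch of what a file-backed lookup cache along these lines could look like; the entry fields, JSON layout, and function names are assumptions, not the PR's actual code:

```go
package main

import (
	"encoding/json"
	"os"
)

// cacheEntry holds whatever we consider ~safe to cache for one image
// reference (fields here are hypothetical).
type cacheEntry struct {
	Digest string   `json:"digest"`
	Arches []string `json:"arches"`
}

type lookupCache map[string]cacheEntry

// readCache reads the cache file if it exists; a missing file just means
// an empty cache.
func readCache(path string) (lookupCache, error) {
	c := lookupCache{}
	b, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		return c, nil
	}
	if err != nil {
		return nil, err
	}
	return c, json.Unmarshal(b, &c)
}

// writeCache writes the cache back out; because it is just a file, deleting
// an entry from it is all it takes to invalidate that lookup.
func writeCache(path string, c lookupCache) error {
	b, err := json.MarshalIndent(c, "", "\t")
	if err != nil {
		return err
	}
	return os.WriteFile(path, b, 0o644)
}
```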

This also adds better 429 behavior that's twofold:

  1. cap our maximum registry requests at ~500/min with an initial burst of 100 unless we hit a 429 (which should, in theory, keep us closer to staying under the Hub abuse rate limits)

  2. actually retry requests when they give us a 429 response (capping both new requests and retries at the same ~500/min which is ~8/sec)

As a final bonus, I added/generated an appropriate cache file for our local test suite which brings the total number of actual Hub/registry requests our test suite makes down to one single image lookup.
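
As a rough sketch of the two 429 behaviors described above, using golang.org/x/time/rate (the limiter numbers mirror the description, ~500/min with a burst of 100, but the doRequest helper and its retry loop are assumptions, not the PR's actual code):

```go
package main

import (
	"context"
	"net/http"

	"golang.org/x/time/rate"
)

// ~500 requests/min is ~8.33/sec; allow an initial burst of 100.
var registryLimiter = rate.NewLimiter(rate.Limit(500.0/60.0), 100)

// doRequest sends a body-less (e.g. GET) registry request, waiting on the
// limiter first and retrying on 429; retries go back through the same
// limiter, so new requests and retries share the ~500/min budget.
func doRequest(ctx context.Context, req *http.Request) (*http.Response, error) {
	for {
		if err := registryLimiter.Wait(ctx); err != nil {
			return nil, err
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			return nil, err
		}
		if resp.StatusCode == http.StatusTooManyRequests {
			resp.Body.Close()
			continue // retry, still subject to the limiter
		}
		return resp, nil
	}
}
```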

@tianon tianon requested a review from yosifkit as a code owner January 11, 2024 23:12
@yosifkit yosifkit merged commit c16b47a into docker-library:main Jan 12, 2024
1 check passed
@yosifkit yosifkit deleted the 429 branch January 12, 2024 00:03
-var concurrency = 50
+var concurrency = 1000
Collaborator:

How does this interact with the rate limiter?

Member Author:

This was implemented initially as a poor man's rate limiter, so now that we have a "real" rate limiter in place, we could technically remove this and let it be completely unbounded, but I figured making it high (instead of unlimited) was probably better/safer.

In practice the rate limiter is likely to block many of these goroutines anyhow (and having them wait to make a request vs wait to start seems totally fair since the overhead is really low and they can potentially do other things in the meantime). Also, with the cache implemented, many of the goroutines won't block at all, so it's just "free" concurrency. 👍
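
As a rough illustration of that interaction (all names below are hypothetical, not this repository's code): the buffered channel only bounds how many lookup goroutines are in flight, the limiter spaces out the uncached lookups, and cache hits never touch the limiter at all:

```go
package main

import (
	"context"
	"fmt"
	"sync"

	"golang.org/x/time/rate"
)

var (
	concurrency = 1000
	limiter     = rate.NewLimiter(rate.Limit(500.0/60.0), 100)
	// pretend this was loaded from the --cache file
	cached = map[string]string{"example/cached:latest": "sha256:aaaa"}
)

// lookup simulates an uncached registry lookup; it blocks on the limiter
// only once the goroutine is actually ready to make a request.
func lookup(ctx context.Context, img string) string {
	_ = limiter.Wait(ctx)
	return "sha256:bbbb"
}

func main() {
	ctx := context.Background()
	images := []string{"example/cached:latest", "example/uncached:latest"}

	sem := make(chan struct{}, concurrency) // the "poor man's" bound, now just a safety cap
	var wg sync.WaitGroup
	for _, img := range images {
		img := img
		sem <- struct{}{}
		wg.Add(1)
		go func() {
			defer func() { <-sem; wg.Done() }()
			if digest, ok := cached[img]; ok {
				fmt.Println(img, "->", digest, "(cache hit, no limiter)")
				return
			}
			fmt.Println(img, "->", lookup(ctx, img))
		}()
	}
	wg.Wait()
}
```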

saveCacheMutex.Lock()
defer saveCacheMutex.Unlock()

if saveCache == nil || cacheFile == "" {
Collaborator:

Could this be moved before locking the mutex?

Member Author:

The cacheFile test could, but the test for saveCache cannot, because we cannot safely compare that value without the mutex (even for a simple comparison against nil).

(This is also effectively the test for "did the user specify --cache", and we now specify --cache in all the places we care about, so this check is essentially a no-op.)
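
A minimal sketch of that distinction (the names mirror the snippet above, but the types and body are illustrative, not the PR's actual code): cacheFile is only written while flags are parsed, so checking it before taking the lock would be safe, while saveCache is shared mutable state whose nil check has to stay under the mutex:

```go
package main

import "sync"

var (
	cacheFile      string            // set once from --cache at startup, read-only afterwards
	saveCache      map[string]string // written by lookup goroutines; guarded by saveCacheMutex
	saveCacheMutex sync.Mutex
)

func maybeSaveCache() {
	if cacheFile == "" {
		return // safe without the lock: never written after flag parsing
	}

	saveCacheMutex.Lock()
	defer saveCacheMutex.Unlock()

	if saveCache == nil {
		return // even this nil comparison needs the mutex
	}
	// ... serialize saveCache out to cacheFile ...
}
```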

rArches := cache.Arches

if !wasCached {
	fmt.Fprintf(os.Stderr, "NOTE: lookup %s -> %s\n", img, r.Desc.Digest)
Collaborator:

Is this a debugging statement or intended to remain?

Member Author:

It's a bit of both -- I added it for debugging, but realized it was useful enough to know when lookups actually happen that I left it in. 😅
