docker, BlobInfoCache: try to reuse compressed blobs from all known compressed digests #1461

Closed · wants to merge 1 commit
5 changes: 3 additions & 2 deletions docker/cache.go
@@ -7,8 +7,9 @@ import (

// bicTransportScope returns a BICTransportScope appropriate for ref.
func bicTransportScope(ref dockerReference) types.BICTransportScope {
// Blobs can be reused across the whole registry.
return types.BICTransportScope{Opaque: reference.Domain(ref.ref)}
// Blobs can be reused across multiple registries,
// therefore bucket all blobs in the same scope.
return types.BICTransportScope{Opaque: "all"}
@mtrmac (Collaborator) commented on Feb 9, 2022:
This loses most of the nuance/intelligence. We do want CandidateLocations2 to primarily try blobs that are known to exist on that registry, not just recently-used digests; only if we have no candidates at that location should we blindly try the compressed version, and only a few such digests (compare prioritize.replacementAttempts).
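
For illustration, a minimal Go sketch of that ordering; the helper and the replacementUnknownLocationAttempts cap are hypothetical names, modeled loosely on the existing prioritize.replacementAttempts, and the internal import path is assumed:

```go
package prioritize

import "github.com/containers/image/v5/internal/blobinfocache"

// replacementUnknownLocationAttempts caps how many location-unknown digests
// we blindly try (hypothetical, by analogy with replacementAttempts).
const replacementUnknownLocationAttempts = 2

// appendUnknownLocationCandidates keeps known-location candidates first and
// appends at most a small, capped number of location-unknown ones after them.
func appendUnknownLocationCandidates(known, unknown []blobinfocache.BICReplacementCandidate2) []blobinfocache.BICReplacementCandidate2 {
	if len(unknown) > replacementUnknownLocationAttempts {
		unknown = unknown[:replacementUnknownLocationAttempts]
	}
	return append(known, unknown...)
}
```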

@flouthoc (Contributor, Author) commented on Feb 10, 2022:

One issue is that we would perform the lookup twice: one iteration for known blobs via CandidateLocations2, and another if we create a new function that looks up all available blobs. So there is some overlap while searching for blobs; of course, we could minimize that with some checks.

I agree we can split the matching of all blobs away from CandidateLocations2, so the original logic remains intact. Let me try something around this.

@mtrmac (Collaborator) replied:

CandidateLocations2 is an internal interface; we can reshape it any way we like.
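
For context, a rough sketch of that interface as it stood around this time; the shapes are approximated from c/image's internal/blobinfocache and may not match the exact source:

```go
package blobinfocache

import (
	"github.com/containers/image/v5/types"
	digest "github.com/opencontainers/go-digest"
)

// BlobInfoCache2 is the internal cache interface (approximate shape).
// BICReplacementCandidate2 is defined elsewhere in this package.
type BlobInfoCache2 interface {
	types.BlobInfoCache

	// RecordDigestCompressorName records which compressor produced a blob.
	RecordDigestCompressorName(anyDigest digest.Digest, compressorName string)
	// CandidateLocations2 returns prioritized reuse candidates; being
	// internal, its signature and result type can be reshaped freely.
	CandidateLocations2(transport types.ImageTransport, scope types.BICTransportScope,
		digest digest.Digest, canSubstitute bool) []BICReplacementCandidate2
}
```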

@flouthoc (Contributor, Author) replied:

@mtrmac Just revisiting this since a BZ showed interest in this RFE again: how do you like the idea of modifying CandidateLocations2 to return all blobs, but giving the newly added blobs the lowest priority, perhaps via some modifications in DestructivelyPrioritizeReplacementCandidates?

@mtrmac (Collaborator) replied:

  • I agree with the lowest-priority part for “location-unknown” candidates. (We should also limit their number.)
  • Opaque does mean Opaque to the generic code: that field is exclusively up to the transport to define, and the generic cache code has no business checking for specific string values. In the returned BICReplacementCandidate2, introduce a new way to represent “location unknown” values: an extra bool, or turn Location into a pointer (see the sketch after this comment).

#550 (review) is an old description of how I imagine this could work; now I don’t think we need a separate CandidateDigests, just let CandidateLocations2 include the location-unknown results. But that’s just a guess, and the structure might end up different based on the final code.
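
One possible shape for that, as a sketch; the UnknownLocation field is a guessed name, not a committed API:

```go
package blobinfocache

import (
	"github.com/containers/image/v5/types"
	digest "github.com/opencontainers/go-digest"
)

// BICReplacementCandidate2 describes a reuse candidate; the UnknownLocation
// bool is one way to include location-unknown results (illustrative only).
type BICReplacementCandidate2 struct {
	Digest          digest.Digest
	CompressorName  string
	UnknownLocation bool                       // true when the cache has no known location
	Location        types.BICLocationReference // meaningful only if !UnknownLocation
}
```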

@flouthoc (Contributor, Author) replied:

@mtrmac Got it, I have a patch ready. Would you prefer that I open a new PR and close this one, since I think it might attract a different set of review comments altogether? I can also force-push to this PR if that is preferred.

@mtrmac (Collaborator) replied:

I really don’t have any preference WRT continuing in this PR vs. filing another one.

}

// newBICLocationReference returns a BICLocationReference appropriate for ref.
11 changes: 10 additions & 1 deletion docker/docker_image_dest.go
@@ -336,6 +336,7 @@ func (d *dockerImageDestination) TryReusingBlob(ctx context.Context, info types.
bic := blobinfocache.FromBlobInfoCache(cache)
candidates := bic.CandidateLocations2(d.ref.Transport(), bicTransportScope(d.ref), info.Digest, canSubstitute)
for _, candidate := range candidates {
crossRegistry := false
candidateRepo, err := parseBICLocationReference(candidate.Location)
if err != nil {
logrus.Debugf("Error parsing BlobInfoCache location reference: %s", err)
@@ -350,7 +351,8 @@ func (d *dockerImageDestination) TryReusingBlob(ctx context.Context, info types.
// Sanity checks:
if reference.Domain(candidateRepo) != reference.Domain(d.ref.ref) {
logrus.Debugf("... Internal error: domain %s does not match destination %s", reference.Domain(candidateRepo), reference.Domain(d.ref.ref))
continue
// Most likely this blob is on a different registry.
crossRegistry = true
}
if candidateRepo.Name() == d.ref.ref.Name() && candidate.Digest == info.Digest {
logrus.Debug("... Already tried the primary destination")
Expand All @@ -361,6 +363,13 @@ func (d *dockerImageDestination) TryReusingBlob(ctx context.Context, info types.

// Checking candidateRepo, and mounting from it, requires an
// expanded token scope.

if crossRegistry {
// This blob was found in a different registry, but we must verify its presence
// in the registry we are pushing to, so switch candidateRepo back to the destination repo.
candidateRepo, _ = parseBICLocationReference(types.BICLocationReference{Opaque: string(d.ref.ref.Name())})
@mtrmac (Collaborator) commented:

(This could just be candidateRepo = d.ref.ref.Name() — and it should have happened instead of the crossRegistry = true assignment already, so that the “Already tried the primary destination” check applies to the cross-registry situation as well.)

@flouthoc (Contributor, Author) replied:

ackd.
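
Taken together, the loop above might then look roughly like this; a sketch, assuming reference.TrimNamed is the appropriate way to strip any tag or digest from d.ref.ref:

```go
if reference.Domain(candidateRepo) != reference.Domain(d.ref.ref) {
	// The candidate lives on a different registry; we can only try to
	// reuse the blob within the repository we are pushing to.
	candidateRepo = reference.TrimNamed(d.ref.ref)
}
// Because the fallback happens before this check, it now also catches
// cross-registry candidates that resolve to the primary destination.
if candidateRepo.Name() == d.ref.ref.Name() && candidate.Digest == info.Digest {
	logrus.Debug("... Already tried the primary destination")
	continue
}
```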

}

extraScope := &authScope{
remoteName: reference.Path(candidateRepo),
actions: "pull",