docker, BlobInfoCache: try to reuse compressed blobs from all known compressed digests #1461
Conversation
…essed digests

It seems we try to reuse blobs only for the specified registry; however, we can have valid known compressed digests across registries as well. The following PR attempts to use that by doing the following steps.

* Move all known blobs into a common data scope, i.e. `all`, instead of storing them in a separate registry scope.
* Increase the sample set of potential blob reuse to all known compressed digests, also involving the ones which do not belong to the current registry.
* If a blob is found, match it against the registry where we are attempting to push. If the blob is already there, consider it a `CACHE HIT!` and reply skipping the blob, since it is already there.

* Remove all images: `buildah rmi --all` // needed so all new blobs can be tagged again in the common bucket
* Remove any previous `blob-info-cache` by

```console
rm /home/<user>/.local/share/containers/cache/blob-info-cache-v1.boltdb
```

```console
$ skopeo copy docker://registry.fedoraproject.org/fedora-minimal docker://quay.io/fl/test:some-tag
$ buildah pull registry.fedoraproject.org/fedora-minimal
$ buildah tag registry.fedoraproject.org/fedora-minimal quay.io/fl/test
$ buildah push quay.io/fl/test
```

```console
Getting image source signatures
Copying blob a3497ca15bbf skipped: already exists
Copying config f7e02de757 done
Writing manifest to image destination
Storing signatures
```

Signed-off-by: Aditya R <arajan@redhat.com>
@mtrmac @nalind @vrothberg PTAL. This PR is in draft since I did not add any tests; I will add tests after initial review and discussion.
Thanks for the PR.

This is way too big and coarse a hammer, and was rejected previously (#550).

The generic-code BlobInfoCache should natively have the concept of “trying a digest without a known location” (making `BICReplacementCandidate2.Location` optional?), and the `prioritize` logic should understand and explicitly manage that.
```diff
-	return types.BICTransportScope{Opaque: reference.Domain(ref.ref)}
+	// Blobs can be reused across multiple registries,
+	// therefore bucket all blobs in the same scope.
+	return types.BICTransportScope{Opaque: "all"}
```
This loses most of the nuance/intelligence. We do want `CandidateLocations2` to primarily try to use blobs that are known to exist on that registry, not just recently-used digests — only if we have no candidates at that location, blindly try the compressed version (and only a few such digests — compare `prioritize.replacementAttempts`).
One issue is that we will perform the lookup twice: one iteration for known blobs via `CandidateLocations2`, and another if we create a new function to perform a lookup on all available blobs. So there is some overlap while searching for blobs; of course we could minimize that with some checks.

So I agree we can split the matching of `all` away from `CandidateLocations2` so the original logic remains intact. Let me try something around this.
`CandidateLocations2` is an internal interface, we can reshape it any way we like.
@mtrmac just re-visiting this since a BZ showed interest in this RFE again. Do you like the idea of modifying `CandidateLocations2` to return all blobs, but giving these newly added blobs the lowest priority, so maybe some modifications in `DestructivelyPrioritizeReplacementCandidates`?
- I agree with the lowest priority part for “location-unknown” candidates. (We should also limit their number.)
- `Opaque` does mean opaque to the generic code: that field is exclusively up to the transport to define, and the generic cache code has no business checking for specific string values. In the returned `BICReplacementCandidate2`, introduce a new way to represent “location unknown” values (an extra bool, or turn `Location` into a pointer).
- #550 (review) is an old description of how I imagine this could work — now I don’t think we need a separate `CandidateDigests`, just let `CandidateLocations2` include the location-unknown results. But that’s just a guess and the structure might end up different based on the final code.
@mtrmac Got it, I have a patch ready. Do you prefer that I open a new PR (since I think it might get different review comments altogether) and close this one? But I can also force-push on this PR if that is preferred.
I really don’t have any preference WRT continuing in this PR vs. filing another one.
```go
	if crossRegistry {
		// Found the following blob in a different registry, but we need to check blob presence against the registry
		// where we are planning to push; hence switch the candidate repo back to the one where we are planning to push.
		candidateRepo, _ = parseBICLocationReference(types.BICLocationReference{Opaque: string(d.ref.ref.Name())})
```
(This could just be `candidateRepo = d.ref.ref.Name()` — and it should have happened instead of the `crossRegistry = true` assignment already, so that the “Already tried the primary destination” check applies to the cross-registry situation as well.)
ackd.
FWIW we differentiate between “blob info cache” (
…oss registries

It seems we try to reuse blobs only for the specified registry; however, we can have valid known compressed digests across registries as well. The following PR attempts to use that by doing the following steps.

* `CandidateLocations2` now processes all known blobs and appends them to the returned candidates at the lowest priority. As a result, when `TryReusingBlob` tries to process these candidates and the blobs filtered by the `Opaque` value set by the `transport` fail to match, an attempt is made against all known blobs (the ones which do not belong to the current registry).
* Increase the sample set of potential blob reuse to all known compressed digests, also involving the ones which do not belong to the current registry.
* If a blob is found, match it against the registry where we are attempting to push. If the blob is already there, consider it a `CACHE HIT!` and reply skipping the blob, since it is already there.

How to verify this?

* Remove all images: `buildah rmi --all` // needed so all new blobs can be tagged again in the common bucket
* Remove any previous `blob-info-cache` by

```console
rm /home/<user>/.local/share/containers/cache/blob-info-cache-v1.boltdb
```

```console
$ skopeo copy docker://registry.fedoraproject.org/fedora-minimal docker://quay.io/fl/test:some-tag
$ buildah pull registry.fedoraproject.org/fedora-minimal
$ buildah tag registry.fedoraproject.org/fedora-minimal quay.io/fl/test
$ buildah push quay.io/fl/test
```

```console
Getting image source signatures
Copying blob a3497ca15bbf skipped: already exists
Copying config f7e02de757 done
Writing manifest to image destination
Storing signatures
```

Alternative to: containers#1461

Signed-off-by: Aditya R <arajan@redhat.com>
Closing this in favor of #1645
It seems we try to reuse blobs only for the specified registry; however, we can have valid known compressed digests across registries as well. The following PR attempts to use that by doing the following steps.

* Move all known blobs into a common data scope, i.e. `all`, instead of storing them in a separate registry scope.
* Increase the sample set of potential blob reuse to all known compressed digests, also involving the ones which do not belong to the current registry.
* If a blob is found, match it against the registry where we are attempting to push. If the blob is already there, consider it a `CACHE HIT!` and reply skipping the blob, since it is already there.

How to verify this?

Pre-steps

* `buildah rmi --all` // needed so all newly pulled blobs can be tagged again in the common bucket
* Remove any previous `blob-info-cache` by

```console
rm /home/<user>/.local/share/containers/cache/blob-info-cache-v1.boltdb
```

Actual use-case

```console
$ skopeo copy docker://registry.fedoraproject.org/fedora-minimal docker://quay.io/fl/test:some-tag
$ buildah pull registry.fedoraproject.org/fedora-minimal
$ buildah tag registry.fedoraproject.org/fedora-minimal quay.io/fl/test
$ buildah push quay.io/fl/test
```

Expected Output

```console
Getting image source signatures
Copying blob a3497ca15bbf skipped: already exists
Copying config f7e02de757 done
Writing manifest to image destination
Storing signatures
```