
docker, BlobInfoCache: try to reuse compressed blobs from all known compressed digests #1461

Closed
wants to merge 1 commit

Conversation

Contributor

@flouthoc flouthoc commented Feb 9, 2022

It seems we try to reuse blobs only for the specified registry; however,
we can have valid known compressed digests across registries as well.
The following PR attempts to use that by doing the following steps.

  • Move all known blobs into a common data scope, i.e. `all`, instead
    of storing them per registry.

  • Increase the sample set of potential blob reuse to all known
    compressed digests, including the ones which do not belong to the
    current registry.

  • If a blob is found, match it against the registry where we are
    attempting to push. If the blob is already there, consider it a
    CACHE HIT! and reply by skipping the blob, since it is already there.

How to verify this?

Pre-steps

  • Remove all images: `buildah rmi --all` (needed so all newly pulled
    blobs can be tagged again in the common bucket)
  • Remove any previous blob-info-cache:
    rm /home/<user>/.local/share/containers/cache/blob-info-cache-v1.boltdb

Actual use-case

$ skopeo copy docker://registry.fedoraproject.org/fedora-minimal docker://quay.io/fl/test:some-tag
$ buildah pull registry.fedoraproject.org/fedora-minimal
$ buildah tag registry.fedoraproject.org/fedora-minimal quay.io/fl/test
$ buildah push quay.io/fl/test

Expected Output

Getting image source signatures
Copying blob a3497ca15bbf skipped: already exists
Copying config f7e02de757 done
Writing manifest to image destination
Storing signatures

@flouthoc
Contributor Author

flouthoc commented Feb 9, 2022

@mtrmac @nalind @vrothberg PTAL. This PR is in draft since I did not add any tests, but I will add tests after the initial review and discussion.

Collaborator

@mtrmac mtrmac left a comment

Thanks for the PR.

This is way too big and coarse a hammer, and was rejected previously ( #550 ).

The generic-code BlobInfoCache should natively have the concept of “trying a digest without a known location” (making BICReplacementCandidate2.Location optional?), and the prioritize logic should understand and explicitly manage that.

```go
return types.BICTransportScope{Opaque: reference.Domain(ref.ref)}
// Blobs can be reused across multiple registries
// therefore bucket all blobs in same scope
return types.BICTransportScope{Opaque: "all"}
```
Collaborator

@mtrmac mtrmac Feb 9, 2022

This loses most of the nuance/intelligence. We do want CandidateLocations2 to primarily try to use blobs that are known to exist on that registry, not just recently-used digests — only if we have no candidates at that location, blindly try the compressed version (and only a few such digests — compare prioritize.replacementAttempts).

Contributor Author

@flouthoc flouthoc Feb 10, 2022

One issue is that we will perform the lookup twice: one iteration for known blobs via CandidateLocations2, and another if we create a new function to perform a lookup on all available blobs. So there is some overlap while searching for blobs; of course, we could minimize that with some checks.

So I agree we can split the matching of `all` away from CandidateLocations2 so the original logic remains intact. Let me try something around this.

Collaborator

CandidateLocations2 is an internal interface, we can reshape it any way we like.

Contributor Author

@mtrmac just re-visiting this since a BZ showed interest in this RFE again. Do you like the idea of modifying CandidateLocations2 to return all blobs, but giving these newly added blobs the lowest priority? So maybe some modifications in DestructivelyPrioritizeReplacementCandidates.

Collaborator

  • I agree with the lowest priority part for “location-unknown” candidates. (We should also limit their number.)
  • Opaque does mean Opaque to the generic code: that field is exclusively up to the transport to define and the generic cache code has no business checking for specific string values. In the returned BICReplacementCandidate2, introduce a new way to represent “location unknown” values (an extra bool or turn Location into a pointer).

#550 (review) is an old description of how I imagine this could work — now I don’t think we need a separate CandidateDigests; just let CandidateLocations2 include the location-unknown results. But that’s just a guess and the structure might end up different based on the final code.

Contributor Author

@mtrmac Got it, I have a patch ready. Do you prefer that I open a new PR and close this one, since I think it might have different review comments altogether? But I can also force-push on this PR if that is preferred.

Collaborator

I really don’t have any preference WRT continuing in this PR vs. filing another one.

```go
if crossRegistry {
	// found following blob in different registry but we need to check blob presence against the registry
	// where we are planning to push, hence switch back the candidate repo to the one where we are planning to push
	candidateRepo, _ = parseBICLocationReference(types.BICLocationReference{Opaque: string(d.ref.ref.Name())})
```
Collaborator

(This could just be candidateRepo = d.ref.ref.Name() — and it should have happened instead of the crossRegistry = true assignment already, so that the “Already tried the primary destination” check applies to the cross-registry situation as well.)

Contributor Author

ackd.

@mtrmac mtrmac changed the title docker, blobCache: try to reuse compressed blobs from all known compressed digests docker, BlobInfoCache: try to reuse compressed blobs from all known compressed digests Feb 9, 2022
@mtrmac
Collaborator

mtrmac commented Feb 9, 2022

FWIW we differentiate between “blob info cache” (c/image/internal/blobinfocache, metadata-only, always used) and “blob cache” (buildah/pkg/blobcache, includes blob data, strictly opt-in).

flouthoc added a commit to flouthoc/image that referenced this pull request Sep 1, 2022
…oss registries

It seems we try to reuse blobs only for the specified registry; however,
we can have valid known compressed digests across registries as well.
The following PR attempts to use that by doing the following steps.

* `CandidateLocations2` now processes all known blobs and appends them
  to the returned candidates at the lowest priority. As a result, when
  `TryReusingBlob` processes these candidates and the blobs filtered by
  the `Opaque` set by the `transport` fail to match, an attempt is made
  against all known blobs (the ones which do not belong to the current
  registry).

* Increase the sample set of potential blob reuse to all known
  compressed digests, including the ones which do not belong to the
  current registry.

* If a blob is found, match it against the registry where we are
  attempting to push. If the blob is already there, consider it a
  `CACHE HIT!` and reply by skipping the blob, since it is already
  there.

How to verify this?

* Remove all images: `buildah rmi --all` (needed so all new blobs can
  be tagged again in the common bucket)
* Remove any previous `blob-info-cache`:

```console
rm /home/<user>/.local/share/containers/cache/blob-info-cache-v1.boltdb
```

```console
$ skopeo copy docker://registry.fedoraproject.org/fedora-minimal docker://quay.io/fl/test:some-tag
$ buildah pull registry.fedoraproject.org/fedora-minimal
$ buildah tag registry.fedoraproject.org/fedora-minimal quay.io/fl/test
$ buildah push quay.io/fl/test
```

```console
Getting image source signatures
Copying blob a3497ca15bbf skipped: already exists
Copying config f7e02de757 done
Writing manifest to image destination
Storing signatures
```

Alternative to: containers#1461

Signed-off-by: Aditya R <arajan@redhat.com>
@flouthoc
Contributor Author

flouthoc commented Sep 1, 2022

Closing this in favor of #1645

@flouthoc flouthoc closed this Sep 1, 2022