release-5.32 Zstd backports #2507

mtrmac · 2024-08-07T16:27:30Z

This is #2321 + #2503 + #2487, backported to a newly created release-5.32 branch, along with a release bump to 5.32.1.

Alternatively, we could just tag main as 5.32.1 without backporting: the difference is a set of typo fixes and dependency updates. The dependency updates are probably unwanted, I’m not sure. See https://github.com/containers/image/compare/main..mtrmac:5.32-zstd

Cc: @TomSweeneyRedHat

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

We are already calling m.LayerInfos() anyway, so there is ~no extra cost. And using LayerInfos means we don't need to worry about reversing the order of layers, and we will have access to the layer index, allowing us to access the indexTo* fields in the future. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

- Don't claim that we only use compressed digests.
- Explicitly document that we assume TOC digests to be unambiguous
- Actually define the term "DiffID".
- Be more precise in computeID about the criteria being layer identity, not where we pull the layer from.

Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Some errors are severe enough that just logging and continuing is not really worthwhile. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

…tyDataLocked Currently we "only" have indexToTOCDigest and blobDiffIDs, but we will make this more complex. Centralizing the consumption of these fields into trustedLayerIdentityDataLocked ensures that all consumers interpret the data exactly consistently (and it also allows us to use a single "trusted" variable instead of 2/3 individual ones). Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

The new code is not called, so it should not change behavior (apart from extending the BoltDB/SQLite schema). Signed-off-by: Miloslav Trmač <mitr@redhat.com>

…storage by DiffID If we can, prefer identifying layers by DiffID, because multiple TOCs can map to the same DiffID; and because it maximizes reuse with non-TOC layers. For now, the new situation is unreachable. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

We will add one more instance of this, so share the code. Should not change behavior (it does remove one unreachable code path). Signed-off-by: Miloslav Trmač <mitr@redhat.com>

… is known
- Multiple TOC values might correspond to a single DiffID (e.g. if different compression levels are used); try to share them all, identified by DiffID (so that we also reuse with non-TOC pulls).
- LayersByTOCDigest only uses a single TOC digest per layer; BlobInfoCache allows multiple matches, matches layers which have been since deleted, and potentially matches TOC digests which we have created by pushing but haven't pulled yet.
- On reuse, we can now use DiffID-based layer identities even if the reuse was TOC-driven.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
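
The preference described above can be illustrated with a minimal sketch. It is not the code from this PR: `layerIdentity` and `identityKey` are hypothetical names, and the only assumption taken from the commit messages is that a layer is identified by its DiffID when one is known, and by its TOC digest otherwise.

```go
package main

import (
	"errors"
	"fmt"
)

// layerIdentity is a hypothetical stand-in for what is known about a layer.
type layerIdentity struct {
	diffID    string // uncompressed digest; "" if unknown
	tocDigest string // TOC digest; "" if the layer has no TOC
}

// identityKey returns the key under which a layer would be looked up or shared.
// It prefers the DiffID, because several TOC digests can map to one DiffID.
func identityKey(l layerIdentity) (string, error) {
	switch {
	case l.diffID != "":
		return "diffid:" + l.diffID, nil
	case l.tocDigest != "":
		return "toc:" + l.tocDigest, nil
	default:
		return "", errors.New("layer has neither a DiffID nor a TOC digest")
	}
}

func main() {
	// Two TOC digests (e.g. different compression levels) for the same contents.
	a := layerIdentity{diffID: "sha256:aaaa", tocDigest: "sha256:toc1"}
	b := layerIdentity{diffID: "sha256:aaaa", tocDigest: "sha256:toc2"}
	ka, _ := identityKey(a)
	kb, _ := identityKey(b)
	fmt.Println("same identity:", ka == kb)
}
```

With a key like this, two TOC-based pulls of the same contents, and a non-TOC pull of that contents, would all resolve to one stored layer.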

…yers Signed-off-by: Miloslav Trmač <mitr@redhat.com>

…ayers
- Rely on it instead of triggering the "untrusted DiffID" logic
- Also propagate it to storage

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

…er match The rules expect us to set manifest editing updates. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

... and add CandidateWithLocation and CandidateWithUnknownLocation, so that the BIC implementations only need to deal with one value instead of carrying around three; we will want to add one more, and encapsulating them all into a single template will make it transparent to the cache implementations. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

... just because we now can, and to nudge all future caches to be designed around CandidateTemplateWithCompression. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

We will add more logic to the default case, so sharing the CandidateCompressionMatchesReuseConditions call is not going to be as easy. Split the two code paths. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

We will want to record more than a single algorithm name. For now, just introduce the structure and modify users, we'll add the new fields later. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

… blobs ... because we don't trust the TOC data, if any. This allows us to remove the zstd:chunked hack; we, at least, now record those blobs as zstd. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

If we don't know an uncompressed digest, don't try using "". Signed-off-by: Miloslav Trmač <mitr@redhat.com>
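
As an illustration of this kind of bug (not the actual SQLite code), here is a minimal sketch with a hypothetical in-memory `cache` type. The point it shows is that an empty uncompressed digest means "unknown" and must not be used as a lookup key.

```go
package main

import "fmt"

// cache is a hypothetical in-memory stand-in for a blob info cache table
// that maps an uncompressed digest to the compressed digests known for it.
type cache struct {
	byUncompressed map[string][]string
}

// candidates returns the compressed digests that share the given uncompressed digest.
// An empty digest means "unknown", so it is rejected instead of being used as a key.
func (c *cache) candidates(uncompressed string) []string {
	if uncompressed == "" {
		return nil
	}
	return c.byUncompressed[uncompressed]
}

func main() {
	c := &cache{byUncompressed: map[string][]string{
		"sha256:aaaa": {"sha256:gzip1", "sha256:zstd1"},
		// A stray row keyed by "" must never produce matches.
		"": {"sha256:bogus"},
	}}
	fmt.Println("known digest:", len(c.candidates("sha256:aaaa")), "candidates")
	fmt.Println("unknown digest:", len(c.candidates("")), "candidates")
}
```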

Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

The cache implementations are recording both the base and specific compression variant; the CandidateLocations2 implementations all call CandidateTemplateWithCompression to choose the appropriate variants to return based on CandidateLocations2Options. This way, neither the BIC implementations nor the transports are responsible for converting zstd:chunked entries to zstd entries if the user wants the latter. Signed-off-by: Miloslav Trmač <mitr@redhat.com>
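
A rough sketch of the idea of recording both variants and choosing one at lookup time follows. It is a simplified illustration, not the logic of CandidateTemplateWithCompression: `compressorData`, `chooseVariant`, and the rule "with no preference, report the most specific name" are assumptions made for this example.

```go
package main

import "fmt"

// compressorData is a hypothetical record of what a cache knows about a blob.
type compressorData struct {
	baseVariant     string // e.g. "zstd"; "" if unknown
	specificVariant string // e.g. "zstd:chunked"; "" if there is no finer variant
}

// chooseVariant picks the algorithm name to report for a cached blob.
// required is the algorithm the caller insists on, or "" for no preference.
// The boolean result is false if the blob cannot satisfy the request.
func chooseVariant(d compressorData, required string) (string, bool) {
	specific := d.specificVariant
	if specific == "" {
		specific = d.baseVariant
	}
	switch {
	case d.baseVariant == "":
		return "", false // nothing is known about the compression
	case required == "":
		return specific, true // no preference: report the most specific name
	case required == specific:
		return specific, true
	case required == d.baseVariant:
		return d.baseVariant, true // e.g. a zstd:chunked blob offered as plain zstd
	default:
		return "", false
	}
}

func main() {
	d := compressorData{baseVariant: "zstd", specificVariant: "zstd:chunked"}
	for _, req := range []string{"", "zstd", "zstd:chunked", "gzip"} {
		name, ok := chooseVariant(d, req)
		fmt.Printf("required=%q -> %q usable=%v\n", req, name, ok)
	}
}
```

Because the conversion from the specific to the base variant happens in one shared helper, the individual cache backends only need to store both names.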

Introduce distinct uploadedCompressorBaseVariantName and uploadedCompressorSpecificVariantName fields; that way we now never call RecordDigestCompressorData with inconsistent zstd / zstd:chunked in one field, so we can always record data when we see, or create, a zstd:chunked layer, removing the current hack. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

- Add a CompressionAnnotations field
- Allow turning a known-zstd blob into a zstd:chunked one if we know the right annotations

This just adds the fields, nothing sets them yet, should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

- Return the required annotations, if we have them
- If we have a zstd blob and the BIC contains the annotations, we don't check for the blob's presence initially. In that case, don't skip it if we find the BIC annotations.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

... instead of only treating it as zstd. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

mtrmac · 2024-08-07T16:29:02Z

Wrong target, I’ll re-create the PR.

mtrmac added 27 commits August 7, 2024 18:16

Bump to 5.32.1-dev

5af61e0

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Allow returning (and reporting) unexpected errors from computeID

59f1890

Some errors are severe enough that just logging and continuing is not really worthwhile. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Add TOC digest <-> uncompressed digest mapping to BIC

3788220

The new code is not called, so it should not change behavior (apart from extending the BoltDB/SQLite schema). Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Split reusedBlobFromLayerLookup from tryReusingBlobAsPending

2c22da0

We will add one more instance of this, so share the code. Should not change behavior (it does remove one unreachable code path). Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Record the (TOC digest, uncompressed digest) data when we compress la…

0e79045

…yers Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Use the uncompressed digest we got from a BlobInfoCache for chunked l…

874ad0e

…ayers
- Rely on it instead of triggering the "untrusted DiffID" logic
- Also propagate it to storage

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

HACK: Don't compress with zstd:chunked when encrypting

072a576

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Fix data returned when returning uncompressed data on a c/storage lay…

7292f9a

…er match The rules expect us to set manifest editing updates. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Make the fields of CandidateWithTime private

d816552

... just because we now can, and to nudge all future caches to be designed around CandidateTemplateWithCompression. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Reformat CandidateTemplateWithCompression a bit

b36f716

We will add more logic to the default case, so sharing the CandidateCompressionMatchesReuseConditions call is not going to be as easy. Split the two code paths. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Introduce blobinfocache.DigestCompressorData

1ff3519

We will want to record more than a single algorithm name. For now, just introduce the structure and modify users, we'll add the new fields later. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Always record only the base variant information about consumed source…

577b535

… blobs ... because we don't trust the TOC data, if any. This allows us to remove the zstd:chunked hack; we, at least, now record those blobs as zstd. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Unrelated: Fix a bug in SQLite BlobInfoCache

4df6647

If we don't know an uncompressed digest, don't try using "". Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Fix a comment

f4b5e90

Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Detect zstd:chunked format in source blobs

2fff234

... instead of only treating it as zstd. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Release 5.32.1

a8aa8c4

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Bump to 5.32.2-dev

dda9562

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

mtrmac closed this Aug 7, 2024
