[rawhide+podman-next] podman push/pull registry roundtrip changes the image #20611
@giuseppe PTAL
@vrothberg @mtrmac PTAL. An image created via pulling a gzip tar is different than a zstd:chunked one; is this expected?
…sion Upstream issue: containers/podman#20611 Known issue cockpit-project#5515
Let's ignore that particular failure in the test runs for the time being. It just creates a firehose of failures and notifications, which is annoying and hides potential further regressions. Is that zstd change expected to modify images on a push/pull roundtrip? It feels rather unexpected to me; so far the image hash has been a good and reliable indicator of integrity and a precise name. But if that is intended, I can drop that assumption from c-podman's tests. As I can't technically judge this, I'll wait for your investigation. Thanks!
This also started to fail in rawhide proper, so it's not limited to the podman-next repo any more. containers-common 1-98 landed in rawhide.
Upstream issue: containers/podman#20611 Known issue #5515
@martinpitt, I think the cockpit test could fail at any point in the future, independent of zstd or gzip.
Ack, thanks @vrothberg! I sent cockpit-project/cockpit-podman#1477 to drop that assumption, then; I just wanted to get a confirmation that this is expected behaviour.
Until very recently, a cycle of `podman push`/`pull` preserved an image's ID. Starting with containers-common 1-98 this is not the case any more, as that changed the on-disk compression format. Quoting podman devs: "The digests validate a specific image _representation_, not some abstract "sameness" of an image." Instead, remove the original busybox image for the user as well (that already happened for the system user), and more explicitly check deletion of an image with multiple tags. Fixes containers/podman#20611 Obsoletes cockpit-project/bots#5515
Uh, this report looks confusing. Digests are expected to change (not at every push, but they will change on a push with a different compression format). Image IDs are not expected to change (… yet; we might actually do that on a Zstd pull, somewhere around containers/image#1980).

If the situation is that the image ID changes (and a416a98b71e224a31ee99cff8e16063554498227d2b696152a9c3e0aa65e5824 is an image ID, not a manifest digest; digests of that image are 3fbc632167424a6d997e74f52b878d7cc478225cffac6bc977eedfe51c7f4e79 at the top level, and another one for the amd64/linux platform instance), that is indeed surprising and worth investigation. Re-opening to either try to reproduce this myself, or to get a positive confirmation that this is actually about digests.

In the future, in "steps to reproduce", when saying "the downloaded image is different from the original:", please actually include the command writing that output.
@mtrmac Note that until yesterday I didn't even know what a "manifest digest" was. The "image ID" is what's visible with `podman images`. I didn't compare digests (I wouldn't even know how to display them without looking for that in the manpages or googling). It's very likely that the digests didn't change, but due to the gzip → zstd recompression the image ID changed. But short of changing the whole concept (i.e. making the image ID equal to the manifest digest or whatnot) this may be unavoidable. (Again: this is my gut feeling, I have no technical qualification here.)

I did write the command that wrote the output, but the form splits the "what did you do" from "what is supposed to happen". That's maybe a bit confusing in this case. Thanks for investigating!
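(For reference, both values can be displayed like this; a sketch with an example image name, and note that `skopeo inspect --format` needs a reasonably recent skopeo:)

```
# The image ID (what `podman images` shows: the sha256 of the config blob):
podman image inspect --format '{{.Id}}' docker.io/library/busybox

# The manifest digest(s) podman has recorded for the image:
podman images --digests docker.io/library/busybox

# The manifest digest as served by the registry:
skopeo inspect --format '{{.Digest}}' docker://docker.io/library/busybox
```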
Isn't it expected that all the digests change when we change compression? On fedora:rawhide we are defaulting to `zstd:chunked`.
Until something like #1980 chooses different layer IDs for layers pulled the traditional way vs. layers pulled based on the TOC, OCI image IDs = config digest. So, just pushing/pulling with zstd is not currently expected to change the image ID.
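(A quick way to see that equality in practice, as a sketch: assumes skopeo is installed and you have permission to read the storage; the image name is an example:)

```
# Image ID as podman reports it:
podman image inspect --format '{{.Id}}' quay.io/libpod/testimage:20221018

# sha256 of the raw config blob: for OCI images this is the same value:
skopeo inspect --raw --config containers-storage:quay.io/libpod/testimage:20221018 \
  | sha256sum
```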
The original alpine image is in Docker format, while we convert to OCI with zstd:chunked. I get the same result just with …
Oh. Yes, that would explain it.
Can we close the issue, or is there anything more you'd like to check?
Let's not close this please. The recent …
Here's the shortest reproducer I can manage. There may be better ones. Assumes: a checked-out and built podman tree (hence `bin/podman`).

```
# iii=quay.io/libpod/testimage:20221018
# rrr=quay.io/libpod/registry:2.8
# bin/podman pull $rrr $iii
# mkdir /tmp/pmpm
# htpasswd -Bbn uuuu pppp >/tmp/pmpm/htpasswd
# bin/podman run -d -p 127.0.0.1:5000:5000 --name reg -v /tmp/pmpm:/auth:Z -e REGISTRY_AUTH=htpasswd -e REGISTRY_AUTH_HTPASSWD_REALM="Registry Realm" -e REGISTRY_AUTH_HTPASSWD_PATH=/auth/htpasswd $rrr
95dfcf3be45a4441b0f2cef3e6531607a92b1e75ed687717cc62c86f6a782356
# bin/podman login --tls-verify=false localhost:5000
Username: uuuu
Password: pppp
Login Succeeded!
# bin/podman push -q --tls-verify=false $iii localhost:5000/foo:bar
# rm -f /tmp/foo.tar; bin/podman image save -q -o /tmp/foo.tar $iii
```

So far, so good. Now the problem comes when pulling back that pushed image:

```
# bin/podman pull --tls-verify=false localhost:5000/foo:bar
Trying to pull localhost:5000/foo:bar...
Getting image source signatures
Copying blob 4ffca960952d skipped: already exists
Copying blob 16254220078a skipped: already exists
Copying config f5a99120db done
Writing manifest to image destination
f5a99120db6452661930a1db3bf7390eec9b963f5f62c068fa32dc1d550afad3
# rm -f /tmp/foo.tar; bin/podman image save -q -o /tmp/foo.tar $iii
Error: creating an updated image manifest: Error during manifest conversion: "application/vnd.oci.image.layer.v1.tar+zstd": zstd compression is not supported for docker images
```

The damage is in local storage:

```
# grep -Rl zstd /var/lib/containers/storage/ 2>/dev/null
/var/lib/containers/storage/overlay-images/f5a99120db6452661930a1db3bf7390eec9b963f5f62c068fa32dc1d550afad3/manifest
/var/lib/containers/storage/overlay-images/f5a99120db6452661930a1db3bf7390eec9b963f5f62c068fa32dc1d550afad3/=bWFuaWZlc3Qtc2hhMjU2OmFkYWIyMmVkY2Y3NGY3MThiNDg2NzIwZjMxMjc5YzhjZjdmNjAwM2MwNWRiYTc1YmQzYWY3ZDdiMTRiMmFhMjU=
^C
```

This is not a harmless bug. This is causing podman system tests to fail on rawhide, because one test does a push/pull, and all subsequent `podman save` operations on that image fail.
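(For anyone wanting to double-check what the pulled-back image's stored manifest now claims, a sketch, assuming `skopeo` and `jq` are installed and run as root, since the reproducer uses the system storage under /var/lib/containers; the image reference follows the reproducer above:)

```
# Print the media type of the stored manifest and of each layer; after the
# roundtrip these are OCI types with tar+zstd layers:
skopeo inspect --raw containers-storage:localhost:5000/foo:bar \
  | jq '.mediaType, (.layers[] | .mediaType)'
```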
The problem is that `podman save` defaults to the docker-archive format, which cannot represent zstd-compressed layers. In this specific case, changing the test to use the oci-archive format instead would work, as shown below.
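(For illustration, a sketch of that workaround; the `--format` values are per podman-save(1), and the archive path is just an example:)

```
# Fails for the roundtripped image: docker-archive cannot carry zstd layers.
podman save -o /tmp/foo.tar localhost:5000/foo:bar

# Works: the OCI archive format allows tar+zstd layer media types.
podman save --format oci-archive -o /tmp/foo.tar localhost:5000/foo:bar
```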
I think that somebody needs to sit down, change the default to zstd, and make sure that podman/buildah CI passes. Changing rawhide first seems expensive to debug.
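(For context, my understanding is that the rawhide default was flipped through the `compression_format` key that containers-common ships in containers.conf; a local revert would look roughly like this, with the key name per containers.conf(5) and an illustrative drop-in path:)

```
# Restore the old push-compression default for this machine only
# (a drop-in under containers.conf.d overrides the packaged default):
mkdir -p /etc/containers/containers.conf.d
cat > /etc/containers/containers.conf.d/99-gzip.conf <<'EOF'
[engine]
compression_format = "gzip"
EOF
```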
Thanks, @vrothberg. IIUC, a small correction to your suggestion: … to: …

Because as the above clearly shows, there are times when …. Better solution: change …. And, full agreement: testing and fixing and designing and changing should be done BEFORE changing this default on rawhide.
Thanks for the correction, @edsantiago! I think this brings up a good point. Which parts of Podman should "implicitly" (?) change as well when the default compression is set to zstd? I think `podman save` is one of them. So, instead of changing the tests, it may be worth changing the behavior of Podman.
(As a matter of pedantic tidiness, note that this bug seems to have shifted from “pull+push changes image ID” to “other kinds of breakage due to the default compression change”. It might have been cleaner to track the two (or more?) issues separately. If this is going to turn into a tracking bug, shall we retitle it?)
But the conversion failing is just a c/image bug. In this case I think we got lucky on timing: with no coordination at all, I think I fixed this in a recent bug week in containers/image#2151. If so, it should not reproduce with #20621 (which drags in more changes) — or, cleaner, just with a dependency update in 7ca3c3d.
That is probably my fault. I was trying to show that (1) yes, push/pull changes an existing loaded image, and (2) those changes are very bad. Discussion of (2) should probably continue elsewhere; I just want it 100% clear that push/pull is now destructive.
I do agree that “pull+push+pull changes image ID” is pretty likely to break assumptions in quite a few automated systems. We always told users that manifest digests can change on push, and I think it’s very likely we either said or suggested that image IDs don’t change and are a better option. And now image IDs change as well. I don’t know that I would recommend any kind of bottom-up image matching (vs. a top-down “this is the set of applications, execute them”) but there certainly is a class of users that wants to do matching and deduplication and sanity checks, and those users will have hard-coded assumptions about what hash value changes when.
just opened: #20633
the "save" tests failure are probably fixed with containers/image#1980 I'll vendor that in as well for my test PR |
No, this happens during manifest format conversion. At the moment the manifest format conversion is attempted, the layers have already been successfully copied. This should be containers/image#2151.
Yes, confirming the cause: the push with zstd:chunked triggered a conversion to OCI. So, when … We actually got lucky: the code tries to convert to schema1 first, and in my testing that conversion incorrectly succeeds; it only fails because the registry refuses the upload of schema1 by default: containers/image#2181. I think that bug is a blocker for enabling zstd by default.
containers/image#2151 is necessary but not sufficient: containers/image#2182. (The latter bug also affects ….)
So, to summarize: …
I have started containers/image#2189, trying to list the various to-do items at a fairly detailed granularity.
Hi, jumping in this "old" thread, as we have just hit the "podman save" issue internally following the move to zstd compression in our company base image. I see in the release notes for docker 25, released a couple of days ago (https://docs.docker.com/engine/release-notes/25.0/), that this change was merged: moby/moby#44598 (and looking at the GitHub updates, it seems it broke some docker users). So it seems docker now generates OCI-compatible images with "docker save". Should we also update "podman save" to use the "oci-archive" rather than the "docker-archive" format by default?
Concrete example: with docker 25, it now generates this kind of tar by default:
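(Romain's actual listing did not survive here; per the OCI image-layout spec, such an archive typically contains something like the following, with `manifest.json` kept, I believe, for backwards compatibility:)

```
oci-layout
index.json
manifest.json
blobs/sha256/<config, manifest, and layer blobs>
```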
@Romain-Geissler-1A Please file new issues for things that are clearly different from the subject as stated. No-one is going to look for a `podman save` default-format proposal in an issue with this title.
Issue Description
This is about the regression that started in PR #20595 and has since failed the cockpit-podman rawhide test on every PR (example). I don't see a clean pattern to this yet: #20595 fails all tests (yours and ours) on all OSes, and yet that schema bump seems harmless; other PRs pass your tests but still fail ours in the same way, and only on rawhide; and your own tests pass in other PRs (mostly). It is related to the containers-common update from 1-97 to 1-98 (1-99 fails as well), as downgrading that package makes it work again.
But something doesn't add up. Perhaps the podman-next COPR gets builds not only from main but also from some PRs; or the PRs do builds without rebasing, or don't build against the latest podman-next; or containers-common has some indirect effect which I don't understand.
The failing test checks image uploading and downloading to/from a registry. Until yesterday, that ended up as the same image, but now it's a different one.
Steps to reproduce the issue
This is a CLI version of the relevant part of the test:
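(The command listing was lost from this report; a minimal sketch of the kind of roundtrip the test performs, with an illustrative registry address and image, would be:)

```
# Pull an image and note its IMAGE ID:
podman pull docker.io/library/busybox
podman images docker.io/library/busybox

# Push it through a local registry and pull it back:
podman tag docker.io/library/busybox localhost:5000/busybox
podman push --tls-verify=false localhost:5000/busybox
podman rmi localhost:5000/busybox
podman pull --tls-verify=false localhost:5000/busybox

# Compare the IMAGE ID columns; they used to match, and now they differ:
podman images
```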
Describe the results you received
With podman-next:
the downloaded image is different from the original:
Describe the results you expected
With current rawhide:
the downloaded image is identical to the original docker.io one:
podman info output
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
Yes
Additional environment details
Standard Fedora rawhide cloud image
Additional information
Always happens. Running `sudo dnf downgrade containers-common` twice, to downgrade to 4:1-97.fc40, goes back to the previous working state.