Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace grep base images parsing with dockerfile-json #1304

Merged
merged 1 commit into from
Sep 3, 2024

Conversation

mkosiarc
Copy link
Contributor

@mkosiarc mkosiarc commented Aug 15, 2024

This is more reliable and allow us to fix bugs where base images were loaded incorrectly.

For example, previously this part in Dockerfile:

LABEL description="this is a build
from single-arch"

Would return "single-arch" as a base image.

Using dockerfile-json also solves the problem of omitting "scratch" from the results.

Another advantage is that when we have something such as:

FROM registry.access.redhat.com/ubi9/ubi:latest as builder ...
FROM builder AS build1

then only the original image
"registry.access.redhat.com/ubi9/ubi:latest" will be reported.

KFLUXBUGS-1269

@mkosiarc mkosiarc marked this pull request as ready for review August 15, 2024 15:16
@mkosiarc
Copy link
Contributor Author

This has to be merged only after konflux-ci/buildah-container#62 and the image reference has to be replaced.

@mkosiarc mkosiarc force-pushed the improve-dockerfile-parsing branch 7 times, most recently from ef4d404 to 873cb50 Compare August 16, 2024 06:40
@mkosiarc
Copy link
Contributor Author

/retest

@chmeliik
Copy link
Contributor

When playing with dockerfile-json to see if it could help with #1200, I found out it will probably make things even worse 😞

FROM registry.fedoraproject.org/fedora-minimal:40

RUN /bin/sh <<EOF
mkdir -p /etc/foo
cp -aR /src/foo /etc/foo
EOF
$ dockerfile-json /tmp/Dockerfile
[dockerfile-json 1.0.8] error: parse "/tmp/Dockerfile": dockerfile/instructions.Parse dockerfile parse error on line 4: unknown instruction: mkdir

It looks like the latest release of dockerfile-json (from 3 years ago: https://github.com/keilerkonzept/dockerfile-json/releases) uses a version of buildkit that didn't yet support heredocs (and who knows what else).

Then I tried to build dockerfile-json from source to see if it's fixed in main, but it doesn't even compile anymore:

# github.com/keilerkonzept/dockerfile-json/pkg/dockerfile
pkg/dockerfile/parse.go:26:46: not enough arguments in call to instructions.Parse
	have (*parser.Node)
	want (*parser.Node, *linter.Linter)

All my trust in that project just went out the window

@chmeliik
Copy link
Contributor

chmeliik commented Aug 16, 2024

On the other hand, the fix was pretty simple if we want to fork the project and add some CI so that the auto-merged Renovate updates don't just randomly break main. Or contribute this upstream, but it looks rather abandoned. The last commit by someone other than renovate-bot was 3 years ago (the 1.0.8 release)

diff --git a/pkg/dockerfile/parse.go b/pkg/dockerfile/parse.go
index 44daf75..02dac0a 100644
--- a/pkg/dockerfile/parse.go
+++ b/pkg/dockerfile/parse.go
@@ -23,7 +23,7 @@ func ParseReader(r io.Reader) (*Dockerfile, error) {
 	if err != nil {
 		return nil, fmt.Errorf("dockerfile/parser.Parse %v", err)
 	}
-	stages, metaArgs, err := instructions.Parse(result.AST)
+	stages, metaArgs, err := instructions.Parse(result.AST, nil)
 	if err != nil {
 		return nil, fmt.Errorf("dockerfile/instructions.Parse %v", err)
 	}

@mkosiarc mkosiarc force-pushed the improve-dockerfile-parsing branch from a86488a to 0a0fe43 Compare August 17, 2024 15:46
@mkosiarc
Copy link
Contributor Author

@chmeliik Thanks for your investigation here. I tried forking the repo, so far just to my account and adding the fix you proposed. Now it can be build from source and the main branch - see konflux-ci/buildah-container#62

Now, the heredocs can be parsed as well, because it uses latest buildkit version. But this concerns me, I wanted to avoid having more work, but if we decide to fork the repo, setup the CI and maintain it, it will require further work down the line. But seems like we do need such a funcionality. The dockerfile-json uses the parsing from the buildkit directly, if we wanted we could build our own tool around it as well.

Copy link
Contributor

@chmeliik chmeliik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after the temp changes are undone

This is more reliable and allow us to fix bugs where base images were
loaded incorrectly.

For example, previously this part in Dockerfile:

LABEL description="this is a build \
                   from single-arch"

Would return "single-arch" as a base image.

Using dockerfile-json also solves the problem of omitting "scratch" from
the results.

Another advantage is that when we have something such as:

FROM registry.access.redhat.com/ubi9/ubi:latest as builder
...
FROM builder AS build1

then only the original image
"registry.access.redhat.com/ubi9/ubi:latest" will be reported.

KFLUXBUGS-1269

Signed-off-by: mkosiarc <mkosiarc@redhat.com>
@mkosiarc mkosiarc force-pushed the improve-dockerfile-parsing branch from 00545d3 to c23c0cb Compare August 29, 2024 15:47
@@ -288,14 +288,12 @@ spec:

BUILDAH_ARGS=()

BASE_IMAGES=$(grep -i '^\s*FROM' "$dockerfile_path" | sed 's/--platform=\S*//' | awk '{print $2}' | (grep -v ^oci-archive: || true))
BASE_IMAGES=$(dockerfile-json "$dockerfile_path" | jq -r '.Stages[] | select(.From | .Stage or .Scratch | not) | .BaseName')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is now possible to have an empty value here, right whereas before there was always going to be at least one value?

Will this break any functionality elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, previously there was always at least one value - e.g when it was FROM scratch, but now we are omitting that. That's why I could remove all those if conditions that are checking for "scratch".

I tested it with a couple of builds and the e2e tests are passing. And from the logic of the buildah it does not seem anything should break, since when we are passing that further (for example to the sbom), the "scratch" was omitted anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An empty Dockerfile should seldom happen. However, if it happens accidentally, dockerfile-json will report an error like [dockerfile-json 1.0.8] error: parse "./Dockerfile": dockerfile/parser.Parse file with no instructions. And set -o pipefail is not set in the build step.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is good that the e2e tests are passing but I don't have much faith in the robustness of the e2e tests lately (i.e. do they test an image which is just FROM scratch)? Would you be able to test this change in a pipeline that has a simple FROM scratch Containerfile?

If you need a file, you can customize this pipeline: https://github.com/konflux-ci/olm-operator-konflux-sample/blob/main/.tekton/gatekeeper-operator-bundle-pull-request.yaml#L35 and just modify the Dockerfile to remove the first stage (or better yet, test it with the first stage and without).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have a FROM scratch test, see konflux-ci/e2e-tests@9c5c74c and 5573eb3. But I tested it anyway, once again, now also with multiple stages as has the dockerfile you linked.

@arewm
Copy link
Member

arewm commented Aug 29, 2024

Is there a reason that we are only making these changes on the v0.2 tasks?

@mkosiarc
Copy link
Contributor Author

Is there a reason that we are only making these changes on the v0.2 tasks?

I remember briefly discussing this with Adam, that it does not seem to worth to update the 0.1 tasks, as we are not expecting many users to still use them and this update does not seem that critical to include it in the old tasks. But if you have better knowledge of this and feel like I should update the 0.1 tasks as well then let me know.

@mkosiarc mkosiarc requested a review from arewm August 30, 2024 13:40
@arewm
Copy link
Member

arewm commented Aug 30, 2024

I remember briefly discussing this with Adam, that it does not seem to worth to update the 0.1 tasks, as we are not expecting many users to still use them and this update does not seem that critical to include it in the old tasks. But if you have better knowledge of this and feel like I should update the 0.1 tasks as well then let me know.

The question comes from a maintenance burden. If the change is minimal to make, then it seems like it would be better to make it so that the tasks are the same.

I am not saying that we should do it, I am just raising the observation that we are not.

@mkosiarc
Copy link
Contributor Author

mkosiarc commented Sep 3, 2024

I remember briefly discussing this with Adam, that it does not seem to worth to update the 0.1 tasks, as we are not expecting many users to still use them and this update does not seem that critical to include it in the old tasks. But if you have better knowledge of this and feel like I should update the 0.1 tasks as well then let me know.

The question comes from a maintenance burden. If the change is minimal to make, then it seems like it would be better to make it so that the tasks are the same.

I am not saying that we should do it, I am just raising the observation that we are not.

It might be minimal to make, but it would again take some time to test it. And as far as I understand, the e2e tests here are running only the newer version of tasks.

@mkosiarc mkosiarc added this pull request to the merge queue Sep 3, 2024
Merged via the queue into konflux-ci:main with commit f12435b Sep 3, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants