
Is stamping breaking hermeticity? #5

Open · eed3si9n opened this issue May 12, 2023 · 6 comments

@eed3si9n (Contributor)

steps

Build from an arbitrary host machine (because of CI, k8s, etc.).

problem

I think we end up building a slightly different image each time because of container_push#stamp_to_env, which is True by default?
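If simply opting out is acceptable, a call site could flip that attribute off. A sketch follows; only stamp_to_env and its default come from this thread, and the other attributes are illustrative placeholders rather than this repo's actual API:

```python
# BUILD sketch. The load() of container_push is omitted since its path
# depends on this repo; name/image/tag are placeholders.
container_push(
    name = "push_app",
    image = ":app_image",   # placeholder
    tag = "latest",         # placeholder
    stamp_to_env = False,   # default is True per this issue
)
```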

note

Workspace status:

Bazel always outputs the following stable keys:

  • BUILD_EMBED_LABEL: value of --embed_label
  • BUILD_HOST: the name of the host machine that Bazel is running on
  • BUILD_USER: the name of the user that Bazel is running as
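Those keys end up in bazel-out/stable-status.txt as "KEY value" lines, one key per line; an illustrative example (values made up):

```
BUILD_EMBED_LABEL
BUILD_HOST ci-runner-17
BUILD_USER buildkite
```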
@ianoc (Contributor) commented May 12, 2023

Stamping is usually not a hermetic thing; it involves timestamps and such. You're referring to the containers produced here rather than building these rules themselves, I imagine?

@ianoc (Contributor) commented May 12, 2023

This is controlled by the workspace status command in Bazel in your local repo, where you can pick which variables make it into this. Is it possible you have an unstable variable marked as stable in it? You could also capture the build output in your CI to compare.
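For illustration, a minimal --workspace_status_command script could look like this (a Python sketch, not this repo's actual script):

```python
#!/usr/bin/env python3
# Minimal workspace status sketch. Each stdout line is "KEY value";
# Bazel puts custom keys prefixed with STABLE_ into stable-status.txt
# and everything else into volatile-status.txt.
import subprocess

def git(*args):
    return subprocess.check_output(("git",) + args, text=True).strip()

print("STABLE_GIT_COMMIT", git("rev-parse", "HEAD"))
print("STABLE_GIT_BRANCH", git("rev-parse", "--abbrev-ref", "HEAD"))
```

If GIT_COMMIT and GIT_BRANCH are emitted without that STABLE_ prefix, they would normally land in the volatile file instead, which might be worth double-checking here.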

@ianoc (Contributor) commented May 12, 2023

I'm not sure stamping takes effect without the arg for it either. It would be good to capture the packer outputs on multiple machines to diff. I think if you build the target you should be able to grab the packer output we make in the build.

@eed3si9n (Contributor, Author)

I am specifically thinking of this line:

stamp_info_file = ctx.info_file.short_path,

I think that one does not include the "volatile" stamps, but it does include BUILD_HOST according to the doc that I linked, as well as GIT_COMMIT and GIT_BRANCH, which we generate via --workspace_status_command.

So I don't have a smoking gun on why, but basically every time we run a canary we seem to be invalidating something and end up building a brand new image, which ends up invalidating other things including a JSON file that contains the Docker image tag.

A better approach might be letting container_push take a dictionary of environment variables that the build author is willing to expose -- either hardcoded values or a filtered-down stamp -- so that a change of machine or git commit won't trigger a Docker push (even when the stamp may contain those things)?
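A minimal sketch of that filtering, assuming it runs in the pusher executable, which already receives the stable status file path (the stamp_info_file above); the allow-list itself is hypothetical:

```python
# filter_stamp.py - sketch only. Reads Bazel's stable-status.txt (the file
# behind ctx.info_file), where each line is "KEY value", and keeps only an
# allow-listed subset so machine-specific keys like BUILD_HOST drop out.
ALLOWED_KEYS = {"BUILD_EMBED_LABEL"}  # hypothetical allow-list

def filtered_stamp(stamp_info_path):
    env = {}
    with open(stamp_info_path) as f:
        for line in f:
            key, _, value = line.rstrip("\n").partition(" ")
            if key in ALLOWED_KEYS:
                env[key] = value
    return env
```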

@ianoc (Contributor) commented May 13, 2023

You can turn it off in your repo at the call sites, which used to be via a macro, right? There is a flag exposed there. Or is the problem that we want some stamp fields but not others?

I think it would be good to capture that output to see. Bazel also doesn't usually consider this stuff in its hash for a rebuild, so I think if the inputs get a cache hit you would still get the same output with a remote cache. But I'm always a bit skeptical of that “maybe in the inputs of an action” behavior.

With this you could have Bazel dump the execution log to try to help, but without remote execution, iirc, it's hard these days to get a recursive Merkle tree.

What does rules_docker do here? This stamping stuff has a bunch of annoying things, so I'd rather not be too special unless we need to be. But filtering it with a list of allowed keys seems like it should be safe and not too unusual to me. Do you see BUILD_HOST showing up somewhere in the Docker config in your repository?

Diffing the produced files from the rule might do it well too.
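For that diff, something as simple as hashing the rule's outputs from two machines would show whether they actually differ (sketch; the paths are whatever the rule produces):

```python
# diff_outputs.py - sketch: compare two build outputs by SHA-256.
import hashlib
import sys

def digest(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

if __name__ == "__main__":
    da, db = digest(sys.argv[1]), digest(sys.argv[2])
    print("same" if da == db else "different", da, db)
```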

@ianoc (Contributor) commented May 13, 2023

> So I don't have a smoking gun on why, but basically every time we run a canary we seem to be invalidating something and end up building a brand new image, which ends up invalidating other things including a JSON file that contains the Docker image tag.

I'm a little rusty on this tbh, but iirc the tag is generated before this code is called; container_push requires you to pass in the tag to use. If you're building a brand new image, I think you want to try to grab the assembled outputs, or the container_binary stuff if that's what it uses internally. I could be wrong, but my initial thought would be that the Python is doing something non-hermetic, like including pyc files or something like that. But if it's happening upstream, I think the assembled JSON files probably contain a good pointer.
