Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update "meta" job to skip "Sources" if sources.json is already up-to-date #8

Merged
merged 1 commit into from
Dec 13, 2023

Conversation

tianon
Copy link
Member

@tianon tianon commented Dec 13, 2023

The contents of sources.json should be really static unless something in the scripts or the upstream sources has changed, so we only need to pay that multi-minute cost in cases where the scripts or source data has actually changed.

There are possibly edge cases here where these upstream things have changed and sources.json hasn't/won't. 🤔

However, this is strictly better than the current state where we just run it all the time.

I think we could probably avoid some of the other edge cases by having this whole script only commit updates to our .doi and .scripts submodules if sources.json or builds.json actually have changes. 🤔

@tianon
Copy link
Member Author

tianon commented Dec 13, 2023

For reference, merging docker-library/meta#10 took our "Sources" stage from ~2m30s up to ~3m20s and this change would bring it back down to milliseconds for any run that can prove that it doesn't have to actually update sources.json.

@tianon
Copy link
Member Author

tianon commented Dec 13, 2023

I think we could probably avoid some of the other edge cases by having this whole script only commit updates to our .doi and .scripts submodules if sources.json or builds.json actually have changes. 🤔

Ok, this is implemented now too.

Copy link
Collaborator

@whalelines whalelines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one question.

# only commit submodule updates if our JSON has changed (see https://github.com/docker-library/meta-scripts/pull/8)
if ! git diff --exit-code sources.json builds.json; then
git add -A .
git commit -m 'Update and regenerate' || :
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it okay if the commit fails?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git commit fails if there's nothing to commit, which is fair (and git push will automatically be a no-op in that case)

Now that I'm gating this on JSON files having changes, we should probably let git commit failing fail the build instead, so that's a good call-out 👍

@yosifkit yosifkit merged commit 6b0ba20 into docker-library:main Dec 13, 2023
1 check passed
@yosifkit yosifkit deleted the skip-sources branch December 13, 2023 22:42
@tianon
Copy link
Member Author

tianon commented Dec 14, 2023

There's another edge case here that means we no-op regenerated this file all night long. If there are changes to .doi but they do not result in changes to sources.json, we will re-generate on every subsequent run until that changes.

I thought about this one while writing it, and my conclusion was that it's probably OK to ignore this for now because it will improve over time as we add more images and it becomes more likely that changes to .doi actually do result in changes to sources.json, and this was intended as an optimization anyhow so falling back to the older behavior is pretty reasonable.

That being said, we could try harder here to resolve this, if we wanted to.

We could get more specific about which things changing in .doi will trigger this, with the major downside that it's a game of whack-a-mole that might result in us not regenerating when we should because something is relevant and we forgot or missed it for some reason. IMO, this needs to always fail on the side of regenerating more often than necessary vs missing generation runs (because the former is "just" a CPU cost and the latter leads to missing builds / missing images / confused and annoyed users).

A simpler option would be record in another file which commit of .doi and .scripts generated the committed sources.json so that if there's a new commit that results in no changes to sources.json, we still try generating it at least once, but then record that we did so such that we do not have to do so again. (In case the follow-up question to this is why not just use the submodule commits as this reference, I'm worried that updating submodule commits is something that's too easy to do accidentally, so I'd rather it be something explicitly generated and committed so it's a more clear signal of intent / state.)

While typing this, that compromise seems pretty reasonable, so I'll probably work on that later today. 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants