Skip to content

Verify on CI that gleam binary architectures match target architectures #3897

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Dec 23, 2024

Conversation

diemogebhardt
Copy link
Contributor

@diemogebhardt diemogebhardt commented Nov 26, 2024

This PR is regarding issue #1630:

  • Adds a bash script to verify that the binary architecture matches the target architecture:
    • ./bin/verify-binary-architecture.sh
  • Adds CI workflows to test the bash script on a wider variety of target architectures and OSes, than the release binaries are actually being built for. They mirror the existing workflows up to the building/installing part. But they remove all the safety precautions that restrict them to certain preconditions or branches! It wouldn't be possible to test the bash script otherwise – everything works as expected in my fork. The workflows are configured to be triggered manually:
    • ./.github/workflows/ci-verify-binary-architecture.yaml
    • ./.github/workflows/release-nightly-verify-binary-architecture.yaml
    • ./.github/workflows/release-verify-binary-architecture.yaml
  • Adds CI workflow binary architecture verification steps to:
    • ./.github/workflows/ci.yaml
    • ./.github/workflows/release-nightly.yaml
    • ./.github/workflows/release.yaml

How it works:

  • The bash script takes in the target architecture and the binary path as arguments and does it's thing
  • The CI workflow steps construct the corresponding binary path and pass them to the bash script

When reviewing:

  • It's probably best to review the changed files instead of the individual commits and skip the ./.github/workflows/*-verify-binary-architecture.yaml workflows in the first pass, as they are adding a lot of noise – and focus on the bash script and the changes to the existing workflows instead.

Notes and Todos:

  • In the initial version of the bash script I used associative arrays which led to beatiful and straightforward code. Sadly they aren't supported on darwin, so I had to refactor it using case statements which makes it a lot more verbose. I added comments to make the parts easier to understand. Instead of commenting, I usually prefer making code self explanatory by extracting things – in this case it felt better to keep everything colocated and add comments.
  • Decide on whether to remove the new test CI workflows before merging? Or keep them around for future testing?
  • Document the file outputs for the various target architectures and OSes somewhere for future reference?

Copy link
Member

@lpil lpil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Why is it that there are new workflows for this? There is a very large amount of extra work being performed and CI complexity to maintain here. I would have expected 1 extra step in each workflow to run the check command.

@diemogebhardt
Copy link
Contributor Author

Thank you!

Why is it that there are new workflows for this? There is a very large amount of extra work being performed and CI complexity to maintain here. I would have expected 1 extra step in each workflow to run the check command.

Let me quote some bits from above:

It wouldn't be possible to test the bash script otherwise – everything works as expected in my fork.

Decide on whether to remove the new test CI workflows before merging? Or keep them around for future testing?

I just added them for testing purposes – wanted to make sure this works in the exact environment it is supposed to be run: GitHub runners.

This way:

  • We are able to mirror the exact behavior of the ci, release and release-nightly workflows without running all the additional stuff that those workflows do after building the binary and verifying its architecture;
  • without having to meet the workflows prerequisites and constraints like:
    • release-nightly being constrained to run: if: ${{ github.repository_owner == 'gleam-lang' }}
    • release only on: push: tags: "v*";
  • able to test for even more target architectures and OSes to make the script more universal. Resulting in higher chances to not break the script when adding additional architectures and OSes later on;
  • can run the workflows in the gleam repo in the context of this PR and verify it works prior to merging.

I wanted to keep the changes to the existing workflows as minimal as possible.

@diemogebhardt diemogebhardt force-pushed the ci-verify-binary-architecture branch from 425e8fb to 7fb022f Compare November 26, 2024 20:01
@lpil
Copy link
Member

lpil commented Nov 28, 2024

Ah, thank you! I understand now. Sorry for my poor reading.

I still find the shell script rather complex and hard to read. I do think we should move the substring to match for to the top level of the workflow, to the same place we keep all the other variables. That way there's no conditional logic in the script and you can read one of them very easily in a list.

@diemogebhardt
Copy link
Contributor Author

diemogebhardt commented Nov 30, 2024

Ah, thank you! I understand now. Sorry for my poor reading.

No worries, I can imagine you having plenty of other things on your mind taking care of Gleam. And thank you very much for that btw! :)

I do think we should move the substring to match for to the top level of the workflow, to the same place we keep all the other variables. That way there's no conditional logic in the script and you can read one of them very easily in a list.

Hmm – I am not sure where else it might make more sense to put it? From what I see, all the variables are defined as part of the individual workflows. Or did I miss something?

Currently, the only context where this mapping is useful, is within this verification script. The script is being called from and reused by three workflows (ci.yaml, release.yaml and release-nightly.yaml). So if we move it outside of the script as part of the individual workflows, we'd need to duplicate it across all three workflows. That way we might have made the script less complex but at another cost. Reusable workflows might help with duplication but then again, also add complexity.

A subset of the following target architectures is defined for each workflow:

target:
  - x86_64-unknown-linux-gnu
  - aarch64-unknown-linux-gnu
  - x86_64-unknown-linux-musl
  - aarch64-unknown-linux-musl
  - x86_64-apple-darwin
  - aarch64-apple-darwin
  - x86_64-pc-windows-msvc
  - aarch64-pc-windows-msvc

And these are exactly the strings that are being used as inputs to the build steps in order to compile the binary. To me it makes sense to use the exact same strings as the inputs to the verification steps, for any mapping that needs to happen in the context of the verification.

I still find the shell script rather complex and hard to read.

Apart from the mapping, what is it that makes it so hard to read?

Did you have a look at the file with proper syntax highlighting? Not saying the script is a piece of art but the "Files changed" view makes it definitely harder to read.

For shell scripts I also find an indentation 4 spaces instead of 2 spaces easier on the eyes.

I tried to make it easier to read by:

  • adding those section header comments;
  • explanatory naming of variables and functions;
  • extracting the mapping into its own function, so that it is easier to disregard when looking at the script.

Options that I can think of:

  1. Using associative arrays but as mentioned above, they are not an option due to compatibility issues on darwin.
  2. I did play around with extracting more functions to make it more modular, but that ends up in 2-3x more code. It felt like making it worse rather than better. In either case, the mapping and grep patterns would remain the same. These are the harder to understand parts in my opinion.
  3. Another option would be to reverse the logic and normalize the parsed binary architecture to match the passed in target architecture. That way we can get rid of the OS parsing and validation completely, as well as the match. We would loose documentation though, so instead of having that as code we'd need to add that as comments.

I've just created a rough draft of “Option 3” but would like to test it on a GitHub runner first. I'll catch some sleep now and push it for you to review sometime later today. ✌️

@diemogebhardt diemogebhardt force-pushed the ci-verify-binary-architecture branch 3 times, most recently from 0f7569f to c595abc Compare November 30, 2024 18:18
@diemogebhardt
Copy link
Contributor Author

diemogebhardt commented Nov 30, 2024

Thanks for insisting, that's better. How do you feel about it now?

Testing

All iterations are tested for all 3 workflows and all 8 target architectures respectively.

Matching

Refactored it so that both, target and binary architectures:

  • reuse the same matching patterns,
  • and are normalized to the official architecture names AArch64 and x86-64.

Undecided whether we want to stay strict or be more forgiving, but we could make the matching case insensitive.

Debugging

Made sure for the workflow step output to include all the necessary debug information for fixing issues, should it ever fail:

Run set -xeuo pipefail
+ BINARY_PATH=target/x86_64-unknown-linux-gnu/release/gleam
+ [[ x86_64-unknown-linux-gnu == *\w\i\n\d\o\w\s* ]]
+ ./bin/verify-binary-architecture.sh x86_64-unknown-linux-gnu target/x86_64-unknown-linux-gnu/release/gleam
+ '[' 2 -ne 2 ']'
+ TARGET_TRIPLE=x86_64-unknown-linux-gnu
+ BINARY_PATH=target/x86_64-unknown-linux-gnu/release/gleam
++ file -b target/x86_64-unknown-linux-gnu/release/gleam
+ BINARY_FILE_TYPE='ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=601c49edc1890db724f4a1fbfc450fa08b40df4d, for GNU/Linux 3.2.0, not stripped'
+ ARCHITECTURE_PATTERNS_FOR_AARCH64='aarch64|Aarch64|arm64'
+ ARCHITECTURE_PATTERNS_FOR_X86_64='x86-64|x86_64'
++ echo x86_64-unknown-linux-gnu
++ parse_architecture
++ normalize_architecture
++ sed -E -e 's/(aarch64|Aarch64|arm64)/AArch64/' -e 's/(x86-64|x86_64)/x86-64/'
++ head -n1
++ grep -Eo -e 'aarch64|Aarch64|arm64' -e 'x86-64|x86_64'
+ TARGET_ARCHITECTURE=x86-64
++ echo 'ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=601c49edc1890db724f4a1fbfc450fa08b40df4d, for GNU/Linux 3.2.0, not stripped'
++ parse_architecture
++ normalize_architecture
++ sed -E -e 's/(aarch64|Aarch64|arm64)/AArch64/' -e 's/(x86-64|x86_64)/x86-64/'
++ grep -Eo -e 'aarch64|Aarch64|arm64' -e 'x86-64|x86_64'
++ head -n1
Architecture match for 'x86_64-unknown-linux-gnu'!
+ BINARY_ARCHITECTURE=x86-64
+ '[' x86-64 '!=' x86-64 ']'
+ echo 'Architecture match for '\''x86_64-unknown-linux-gnu'\''!'
+ exit 0

Failing workflow steps are annoying enough, so lets make it easy to debug and fix.

Documentation

I was afraid that it might not be self documenting enough, but it's actually quite good.

But it might still make sense to document which platforms use which architecture patterns:

# `darwin`: `arm64` and `x86_64`
# `linux`: `aarch64` and `x86-64`
# `windows`: `Aarch64` and `x86-64`
# `rust build target`: `aarch64` and `x86_64`

Copy link
Member

@lpil lpil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, but there's still logic in the script for mapping between strings.

sed -E \
-e "s/($ARCHITECTURE_PATTERNS_FOR_AARCH64)/AArch64/" \
-e "s/($ARCHITECTURE_PATTERNS_FOR_X86_64)/x86-64/"
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove all the logic from this script relating to converting strings to other strings. I want this script to accept an exactly string and check that the output of file matches it. There should be zero mapping logic in here at all, it is too hard to understand.

@lpil lpil marked this pull request as draft December 9, 2024 16:36
@diemogebhardt diemogebhardt force-pushed the ci-verify-binary-architecture branch from 494b06f to 7d371da Compare December 10, 2024 21:27
@diemogebhardt diemogebhardt force-pushed the ci-verify-binary-architecture branch from 91c8ecb to 242cd82 Compare December 10, 2024 23:12
@diemogebhardt
Copy link
Contributor Author

There we go – spread expected binary architecture strings across all three GitHub workflows.

@diemogebhardt diemogebhardt marked this pull request as ready for review December 10, 2024 23:14
@diemogebhardt diemogebhardt force-pushed the ci-verify-binary-architecture branch 2 times, most recently from 391f01a to d2859b5 Compare December 11, 2024 00:13
…ion script accross all three GitHub workflows as per Louis preference
@diemogebhardt diemogebhardt force-pushed the ci-verify-binary-architecture branch from d2859b5 to b2f9c6e Compare December 11, 2024 00:14
Copy link
Member

@lpil lpil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful work! Thank you for working through that with me! I know I am very slow to reply!

@lpil lpil enabled auto-merge (rebase) December 23, 2024 10:47
@lpil lpil disabled auto-merge December 23, 2024 10:48
@lpil lpil enabled auto-merge (rebase) December 23, 2024 10:48
@lpil lpil merged commit 93feab2 into gleam-lang:main Dec 23, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants