fix(py_venv): work in terms of bytes when patching shebang lines #606
As [Uv's implementation](https://github.com/astral-sh/uv/blob/db14cc3005d2cd53802cb04c2f1e177a22c934ac/crates/uv-install-wheel/src/wheel.rs#L425) notes:

> scripts might be binaries, so we read an exact number of bytes instead of the first line as string

Indeed, one wheel that contains a binary "script" is [`uv` itself](https://pypi.org/project/uv/). Constructing a venv that happens to include `uv` was previously failing with:

```
ERROR: /Users/peter/tecton/sdk/pypi/BUILD.bazel:97:8: Action sdk/pypi/.venv failed: (Exit 1): sandbox-exec failed: error executing Action command
  (cd /private/var/tmp/_bazel_peter/dfecb8ec3f6f433d8509be7ebe017232/sandbox/darwin-sandbox/589/execroot/_main && \
  exec env - \
    TMPDIR=/var/folders/9_/p2d_shr10b91_464_3jfl5t80000gn/T/ \
  /usr/bin/sandbox-exec -f /private/var/tmp/_bazel_peter/dfecb8ec3f6f433d8509be7ebe017232/sandbox/darwin-sandbox/589/sandbox.sb /var/tmp/_bazel_peter/install/96e26d97222159f904e14600d7490eb0/process-wrapper '--timeout=0' '--kill_delay=15' '--stats=/private/var/tmp/_bazel_peter/dfecb8ec3f6f433d8509be7ebe017232/sandbox/darwin-sandbox/589/stats.out' bazel-out/darwin_arm64-opt-exec-ST-2adb5a2e0ae2/bin/external/aspect_rules_py~/py/tools/venv_bin/venv_macos_aarch64_build '--location=bazel-out/darwin_arm64-fastbuild/bin/sdk/pypi/.venv' '--venv-shim=bazel-out/darwin_arm64-fastbuild-ST-2adb5a2e0ae2/bin/external/aspect_rules_py~/py/tools/venv_shim/shim_macos_aarch64_build' '--python=python_3.8_macos_aarch64_runtime/python/install/bin/python3.8' '--pth-file=bazel-out/darwin_arm64-fastbuild/bin/sdk/pypi/venv.pth' '--env-file=bazel-out/darwin_arm64-fastbuild/bin/sdk/pypi/venv.env' '--bin-dir=bazel-out/darwin_arm64-fastbuild/bin' '--collision-strategy=error' '--venv-name=.venv' '--mode=static-copy' '--version=3.8')

Error:   × Unable to run command:
  ╰─▶ stream did not contain valid UTF-8
```
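To make the failure mode concrete, here is a minimal sketch of a byte-oriented check. It is not the code in this PR: the helper name `starts_with_placeholder` is hypothetical, and it uses `Read::take` rather than `read_exact` so that files shorter than the placeholder are not an error. Only the `PLACEHOLDER_SHEBANG` constant is taken from the diff.

```rust
use std::io::{self, Read};

/// Placeholder shebang, as named in this PR's diff.
const PLACEHOLDER_SHEBANG: &[u8] = b"#!/dev/null";

/// Hypothetical helper: returns true if the stream begins with the
/// placeholder shebang. Only the leading bytes are read, and they are
/// never decoded as UTF-8, so binary "scripts" are handled fine.
fn starts_with_placeholder<R: Read>(reader: R) -> io::Result<bool> {
    let mut head = Vec::with_capacity(PLACEHOLDER_SHEBANG.len());
    // `take` caps how much is read; a short file simply yields fewer bytes.
    reader
        .take(PLACEHOLDER_SHEBANG.len() as u64)
        .read_to_end(&mut head)?;
    Ok(head == PLACEHOLDER_SHEBANG)
}
```

By contrast, calling `read_to_string` on the same binary input returns an `io::Error` of kind `InvalidData`, which is where the `stream did not contain valid UTF-8` message in the log above comes from.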
Ok(())
}

const PLACEHOLDER_SHEBANG: &[u8] = b"#!/dev/null";
This shebang is unspecified behavior of `rules_python`. As part of how `rules_python` implements installing packages, an interpreter path must be specified, but since that path is specified at module/workspace setup time, there's no way to know the Bazel label, the relative path, or anything else about the interpreter with which the script may eventually be invoked. So `rules_python` does the "reasonable" (insane) thing and uses `/dev/null` as the shebang. It could use `/bin/false` or any other value.

I don't think it's reasonable or future-proof to hardcode this or use the `read_exact` strategy here. The protocol should be to read the first 512 bytes, check whether the content starts with `#!` and contains a `\n`, and if so replace that first line.

I think your `rewind()` machinery fails to strip the shebang from the copy source as this PR stands.
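The protocol proposed above could look roughly like the following. This is a sketch under assumptions, not code from `rules_py`: the function name `replace_shebang`, the `SNIFF_LEN` constant, and the choice to work on an in-memory byte slice are all illustrative, and the replacement shebang is passed in by the caller.

```rust
/// Hypothetical window size, following the "first 512 bytes" suggestion.
const SNIFF_LEN: usize = 512;

/// Hypothetical helper: if `content` starts with `#!` and a newline occurs
/// within the first `SNIFF_LEN` bytes, return a copy whose first line is
/// replaced by `new_shebang`. Otherwise return `None`, meaning the file
/// should be copied through unchanged.
fn replace_shebang(content: &[u8], new_shebang: &[u8]) -> Option<Vec<u8>> {
    let window = &content[..content.len().min(SNIFF_LEN)];
    if !window.starts_with(b"#!") {
        return None;
    }
    // No newline in the window: treat the file as not a script.
    let newline = window.iter().position(|&b| b == b'\n')?;
    let mut patched = Vec::with_capacity(new_shebang.len() + content.len() - newline);
    patched.extend_from_slice(new_shebang);
    // Keep the newline and everything after it; the old first line is dropped.
    patched.extend_from_slice(&content[newline..]);
    Some(patched)
}
```

Because it matches on `#!` rather than on `#!/dev/null`, this variant would also rewrite shebangs that did not come from `rules_python`; whether that is desirable is part of the open question in this thread.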
Thanks for the info. I've added it to the PR. It is a bit wild; I wouldn't have expected `rules_python` to be the source.
The two behaviours identified as defects were exactly preserved from the prior implementation. This PR only fixes the defect it claims to — the venv builder choking on binary files in the scripts directory.
- Only inspecting the first 11 bytes, not looking for a newline (rules_py/py/tools/py/src/venv.rs, line 623 in 3ff3b51):
  `if content.starts_with("#!/dev/null") {`
- Adding our own shebang as a prefix, not replacing the existing one (rules_py/py/tools/py/src/venv.rs, line 624 in 3ff3b51):
  `content.replace_range(..0, &RELOCATABLE_SHEBANG);`
I do agree that it's a little odd to do things this way, and I'd be happy to work with you towards getting more correct shebang logic in place (BTW, do you know of any packages that trigger the shebang substitution logic so that we can cover all of this with a test?), but these behaviours are not what causes the issue I am seeking to address, so I do not think they should be part of this PR.
If this patch set is unacceptable, the issue of binaries triggering `stream did not contain valid UTF-8` could alternatively be addressed by handling specifically that error. That looks like this: main...plobsing:rules_py:ignore_invalid_utf8.

I like that solution less because, while it handles more cases than are handled today, including the one I care about, it just feels less correct. In principle, a Python source file is not required to be UTF-8 (PEP 263 is still current and documented for recent Pythons, even if the feature is little used); the encoding assumption/assertion made by using `read_to_string` to process `bin` files, even only Python sources, just isn't great in general.
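For comparison, handling only that error might look something like this. It is a guess at the general shape, not the contents of the linked branch; `read_text_or_skip` is a hypothetical name.

```rust
use std::io::{self, Read};

/// Hypothetical helper: read the whole stream as UTF-8 text, returning
/// `Ok(None)` when the content is not valid UTF-8 so the caller can copy
/// the file through unpatched. Other I/O errors are still propagated.
fn read_text_or_skip<R: Read>(mut reader: R) -> io::Result<Option<String>> {
    let mut text = String::new();
    match reader.read_to_string(&mut text) {
        Ok(_) => Ok(Some(text)),
        // `read_to_string` reports invalid UTF-8 as `InvalidData`.
        Err(e) if e.kind() == io::ErrorKind::InvalidData => Ok(None),
        Err(e) => Err(e),
    }
}
```

The weakness described above shows up directly: a Python source declared as Latin-1 under PEP 263 and containing a non-ASCII byte would also come back as `None`, so it would be skipped even though it is a legitimate script.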
As reported by customers, the naive but correct strategy of using copies in `py_venv_*` can lead to laughable disk usage. Some clients are reporting slowdowns on the order of 10 minutes and disk usage on the order of 100 GiB wasted copying inputs into binaries. We need a more scalable strategy such as symlinking.

Thankfully we can generate symlinks from tools driven by Bazel into a TreeArtifact so long as the symlinks aren't dangling. By carefully crafting relative symlinks we're able to produce a tree of links which is valid both at and after action time. When relocating a `.runfiles` tree containing such links (for instance into an OCI layer tar) these links must be dereferenced, but that Just Works.

While I'm at it, refactor the venv machinery to operate in terms of strategies and combinators on strategies so that it's simpler to talk about the production-grade behavior we want, which is:

* `site-packages` trees in 1stparty code get relocated/linked into the venv
* `bin` sibling trees in 1stparty code get relocated/patched into the venv
* General trees in 1stparty code are referred to by `.pth` file entries
* General trees in 3rdparty code get relocated/linked into the venv
* `bin` sibling trees in 3rdparty code get relocated/patched into the venv

This makes the venv builder significantly more flexible, allows for better error reporting and opens the door to more flexible error handling.

Incorporates an implementation of #606, but testing is required. Should include an implementation of #635, but testing is required.

### Changes are visible to end-users: yes

- Searched for relevant documentation and updated as needed: yes
- Breaking change (forces users to change their own code or config): no
- Suggested release notes appear below: yes

`py_venv_*` now uses symlinks rather than hard file copies, which radically reduces disk usage while improving venv building performance.

### Test plan

- Covered by existing test cases
- New test cases added
- Manual testing; please provide instructions so we can reproduce: TODO.

### Remaining work

- [x] Strip debug prints
- [x] Improve collision handling
- [x] Rework the command interpreter to implement the last-wins semantics
- [x] Mitigate spooky dangling symlink issues
- [x] Fix a regression which can cause a `site-packages/__init__.py` file to be linked
- [x] Add sha256-sum based collision ignoring
- [ ] Add a test covering that a `site-packages/__init__.py` file will not be linked
- [ ] Add a test covering bin shebang patching
- [ ] Integrate the test case from #635
- [ ] Manually test that linked venvs still work; should just be fine

---------

Co-authored-by: Alexander Payne <alexpayne@aspect.build>
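The "carefully crafted relative symlinks" idea in the commit message above can be illustrated with a small path computation. This is only a sketch, not the venv builder's implementation: `relative_link_target` is a hypothetical name, and it assumes both paths are expressed relative to the same root and are already normalized (no `.` or `..` components).

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: compute the text to store in a symlink at `link`
/// so that it resolves to `target`, where both are given relative to the
/// same root directory.
fn relative_link_target(link: &Path, target: &Path) -> PathBuf {
    // The link is resolved relative to the directory that contains it.
    let link_dir: Vec<Component> = link
        .parent()
        .unwrap_or(Path::new(""))
        .components()
        .collect();
    let target_parts: Vec<Component> = target.components().collect();
    // Count the leading components the two paths share.
    let common = link_dir
        .iter()
        .zip(&target_parts)
        .take_while(|(a, b)| a == b)
        .count();
    let mut rel = PathBuf::new();
    // Climb out of the unshared part of the link's directory...
    for _ in common..link_dir.len() {
        rel.push("..");
    }
    // ...then descend into the unshared part of the target.
    for part in &target_parts[common..] {
        rel.push(part.as_os_str());
    }
    rel
}
```

On Unix the result could then be passed to `std::os::unix::fs::symlink`. Because the stored target never mentions the root, such a link stays valid as long as the link and its target keep the same positions relative to each other.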
### Changes are visible to end-users: yes

fix(py_venv): binaries in the scripts folder no longer crash venv builder

### Test plan

Minimized repro including only a single, problematic package (`uv`) in the venv: uv_repro.zip