Cross-platform native launchers for Python #275
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
--- | ||
created: 2022-09-12 | ||
last updated: 2022-09-12 | ||
status: To be reviewed | ||
reviewers: | ||
- TODO | ||
title: Cross-platform native launchers for Python | ||
authors: | ||
- groodt | ||
--- | ||
|
||
|
||
# Abstract | ||
|
||
This document describes an approach for launching `py_binary` artifacts hermetically using the resolved python toolchain. | ||
I don't think this is really Python specific and is fairly easy to generalize to other languages. Any language which has a separate "runtime executable" has the same basic problem -- Java, Ruby, JavaScript, etc. I think there is also some overlap with debuggers, coverage tools, and test runners -- it's not uncommon for them to want to control the invocation of the program's original main. Finally, there is a bit of conceptual overlap with the

Who would be the most likely person to accept this sort of proposal? I'm happy to expand this beyond Python (I mention in my proposal that it could be expanded). I'm just not sure if it's better to start incrementally and pick Python first, or if it's better to start broad.

Java is the most immediate one that comes to mind. I'm pretty sure they are basically doing the same thing we are doing today (shell script that does setup, an optional launcher for a native binary). I emailed our internal language-rule-owners group. I'm primarily interested in collecting advice and assessing interest at this point. I like to keep as small a set of Deciders as possible. Not all advice must be obeyed.
groodt marked this conversation as resolved.
|
||
|
||
|
||
# Background | ||
|
||
Currently, `py_binary` is non-hermetic and launches inconsistently between platforms. | ||
I think something worth mentioning is the "Python Launcher for Windows". Basically, a python.org Windows installs have a See
||
|
||
On macOS and Linux, there is a [python_stub](https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/bazel/rules/python/python_stub_template.txt) | ||
that is non-hermetic and requires a "bootstrap" python interpreter on the host. The "shebang" can be overridden, but | ||
re: "requires a "bootstrap" python interpreter on the host": Well, not necessarily. It's possible to make the stub use the same runtime as the program it ultimately invokes. This is a pretty sensible thing to do, I think. This is fine if you're already using a system interpreter (you already have an absolute path). The problem cases are "in build" interpreters -- you have to put a relative path to the interpreter. This makes running the program sensitive to the CWD you start the program in. Transforming that relative path into an absolute path requires some sort of runtime logic.

This is probably the only case where it's fine and it has other issues. It's not portable unless everyone has the same interpreter installed at the same global path. It also goes against the premise of the proposal, in the sense that it's "non-hermetic". It requires all hosts to have a preinstalled runtime at a particular version, which isn't easy to guarantee.
groodt marked this conversation as resolved.
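The runtime logic discussed in this thread, turning a runfiles-relative interpreter path into an absolute one, could look roughly like the sketch below. It is written in Python only for readability (a real launcher would be a native binary), and it assumes the Bazel conventions of a `RUNFILES_DIR` environment variable and a `<launcher>.runfiles` directory next to the executable. The function names are hypothetical and it ignores manifest-only runfiles and launchers found via `PATH`.

```python
import os


def find_runfiles_dir(argv0, environ, cwd):
    """Return an absolute runfiles directory for the launcher at argv0.

    Assumption: RUNFILES_DIR, when set, takes precedence; otherwise the
    tree is expected next to the launcher as "<launcher>.runfiles".
    """
    explicit = environ.get("RUNFILES_DIR")
    if explicit:
        # os.path.join keeps `explicit` unchanged if it is already absolute.
        return os.path.normpath(os.path.join(cwd, explicit))
    # Anchor the (possibly relative) launcher path at the starting CWD once,
    # so later directory changes cannot affect the result.
    launcher = os.path.normpath(os.path.join(cwd, argv0))
    return launcher + ".runfiles"


def resolve_in_runfiles(runfiles_dir, relative_path):
    """Turn a "/"-separated runfiles-relative path into an absolute path."""
    return os.path.join(runfiles_dir, *relative_path.split("/"))
```

A launcher would call `find_runfiles_dir` once at startup with its own `argv[0]` and then resolve the interpreter through `resolve_in_runfiles`, for example with a hypothetical relative path such as `python_toolchain/bin/python3`.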
|
||
a "shebang" is always dependent on the runtime host. | ||
|
||
On Windows, there is a [native launcher](https://github.com/meteorcloudy/bazel/blob/master/src/tools/launcher/python_launcher.cc) | ||
I think you meant to link to bazelbuild here, not meteorcloudy? https://github.com/bazelbuild/bazel/blob/master/src/tools/launcher/python_launcher.cc
||
that launches `python.exe` on the host, which then launches the `py_binary` with the same `python_stub` as macOS and Linux. | ||
|
||
|
||
Related issues: | ||
* [py_binary with hermetic toolchain requires a system interpreter](https://github.com/bazelbuild/rules_python/issues/691) | ||
* [Neither python_top nor python toolchain works with starlark actions on windows](https://github.com/bazelbuild/bazel/issues/7947#issuecomment-495265016) | ||
|
||
This situation is undesirable because it assumes that the target platform has a bootstrapping python interpreter | ||
available and makes the hermetic python interpreters available with `rules_python` less useful. It is also surprising to | ||
|
||
users who expect bazel to output self-contained binary artifacts for a target platform. | ||
|
||
|
||
This situation exists because of "bootstrapping". Ultimately, *something* needs to find the python | ||
|
||
interpreter in the runfiles and use that to launch the program. Currently, bazel assumes the target platform will | ||
|
||
be able to provide the "bootstrapping" functionality. | ||
|
||
|
||
|
||
# Proposal | ||
|
||
Extend the native launcher functionality to all platforms and use it to locate the relevant python interpreter and | ||
|
||
python program in the `runfiles` tree to launch the `py_binary`. No assumptions should be made about the target platform. | ||
|
||
|
||
In pseudo-code, the proposal is as follows: | ||
This is pretty high-level pseudo-code :). Something a little more concrete would be better. e.g., it has to find the runfiles directory to resolve the relative path names.
||
|
||
``` | ||
exec(env, runfiles-interpreter, ["interpreter_arg1",], "main.py", ["arg1",]) | ||
``` | ||
|
||
| Token | Description | | ||
| ---------------------- | ----------- | | ||
| env | Dictionary of key-value pairs for the environment of the process | | ||
I don't see why env is one of the inputs? This basically implies that the launcher process may need to use a modified environment from the actual program -- what's the motivation case for this? Why would it not just inherit the existing environment? Ah, one case I just thought of: LD_PRELOAD (or equiv). Basically, a binary might require such a setting and we wouldn't want the launcher itself to use that (and by "might" I mean, we have this feature internally at Google).
||
| runfiles-interpreter | The resolved python toolchain in runfiles | | ||
The description here doesn't quite make sense with what the arg name implies. The arg name sounds like a string path. The description is "the python toolchain", which is a complex object.
||
| ["interpreter_arg1",] | An array of arguments to provide to the python interpreter | | ||
| "main.py" | The python program to launch in runfiles | | ||
| ["arg1",] | An array of arguments to provide to the python program as sys.argv[1:] | | ||
|
||
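To make the pseudo-code and its tokens a little more concrete, here is a minimal sketch of how a launcher might assemble the `exec` call once it knows where the runfiles tree is. It is written in Python purely for readability (the proposed launcher would be a native binary). The function names, the `env_overrides` parameter, and the choice to export `RUNFILES_DIR` to the child are assumptions for illustration, not part of the proposal.

```python
import os


def build_launch_command(runfiles_dir, interpreter, interpreter_args,
                         main, program_args, base_env, env_overrides=None):
    """Return (executable, argv, env) for the exec call in the pseudo-code.

    `interpreter` and `main` are "/"-separated paths relative to the
    runfiles root (hypothetical inputs; how they are recorded is undecided).
    """
    executable = os.path.join(runfiles_dir, *interpreter.split("/"))
    main_path = os.path.join(runfiles_dir, *main.split("/"))
    # argv[0] is the interpreter itself, then its own flags, then the
    # program and the arguments the program sees as sys.argv[1:].
    argv = [executable, *interpreter_args, main_path, *program_args]
    # Start from the inherited environment and apply only the settings
    # meant for the launched program, not for the launcher.
    env = dict(base_env)
    env.update(env_overrides or {})
    # Assumption: tell the launched program where its runfiles live.
    env["RUNFILES_DIR"] = runfiles_dir
    return executable, argv, env


def launch(executable, argv, env):
    """Replace the current process image; does not return on success."""
    os.execve(executable, argv, env)
```

Keeping the command construction separate from `launch` means the interesting logic can be checked without actually replacing the process. Settings such as `LD_PRELOAD` that should apply only to the launched program would go in `env_overrides`.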
This native launcher idea has been proposed a few times by bazel contributors and the community: | ||
* [Greg Roodt (Community)](https://github.com/bazelbuild/rules_python/issues/691#issuecomment-1174935972) | ||
* [Yun Peng (Google)](https://github.com/bazelbuild/bazel/issues/7947#issuecomment-495265016) | ||
* [Richard Levasseur (Google)](https://github.com/bazelbuild/rules_python/issues/691#issuecomment-1186379617) | ||
|
||
Some related work has been done that fixes Linux to Windows cross-builds of the Windows launcher. See: [Fix Linux to Windows cross compilation of py_binary, java_binary and sh_binary using MinGW](https://github.com/bazelbuild/bazel/pull/16019) | ||
This proposal would aim to go further and have these launchers available on all platforms, including cross-builds where appropriate toolchains are in place. | ||
|
||
Once this proposal is implemented, it would enable cross-builds of hermetic `py_binary` for all major platforms. It | ||
would also remove the complexity introduced by having so many chains of nested execution to launch a python program. | ||
|
||
|
||
Finally, while this proposal is specific to python, this solution could perhaps be reused for `java_binary`, `sh_binary` | ||
We should reach out to those owners to see if they have input to add.
||
and perhaps be made available for any custom rules that require an interpreter to launch. | ||
|
||
|
||
# Backward-compatibility | ||
|
||
This proposal could require users to set up a cc toolchain for remote execution. | ||
Is it required to be CC? e.g., what if someone wants to write a launcher in rust?

No, does not need to be CC. I think the launcher needs to be compiled as native in some way. So Go, Zig, Rust, CC all come to mind. Whatever has the most minimal toolchain requirements on the user and gives the functionality we require I think. I think whatever is used as a launcher, needs to be standalone from Bazel once built. I don't want to ship bazel to run a binary.

I agree. The part I'm trying to think through is re-use of what is essentially the same binary (the launcher itself). For a target-config build target, I agree, yes, the launcher essentially needs to be self-contained and standalone. I don't see how to do it otherwise because the invocation of For a build tool, the situation is different[1] -- stuff run during the build doesn't need the stricter isolation requirements. For example, when Bazel runs an executable in an action, it could avoid having to build the launcher entirely by doing the exec() call itself when it runs the subprocess. Maybe a target could return e.g. This then leads me to think that, if a rule returned that, Bazel itself could invoke the launcher building instead of the rule having to do so. Which has a Just Works sort of appeal; but risks coupling behavior to the Bazel release (which might be more of a headache).

[1] This case is particularly on my mind because Python is often used for build tools, and building the runtime and C dependencies is pretty expensive, so reuse of that is highly beneficial.
Overall, I'm +1 on the core idea.