Acyclic compiler-rt and libc bootstrap #127227

Ericson2314 · 2025-02-14T17:05:08Z

This is responding to the conversation in #125922, but I am opening a new issue because I would like to disentangle the larger idea from that specific PR.

The way some of us Nixpkgs compiler maintainers see it, the ideal bootstrap order is:

1. Build "true builtins", completely self-contained
2. Build libc
3. Build any additional "pseudo-builtins" depending on libc, without rebuilding the "true builtins"
4. Build additional goodies like sanitizers

To wit, if the builtins and libc are really cyclic, then all sorts of accidental recursion is possible (imagine emutls eventually recurring back into emutls). On the flip side, if no recursion is happening, then the cyclic dep is in fact spurious and the acyclic dependency order already exists and is just waiting to "break free".
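
The cyclic-versus-acyclic argument can be illustrated with a small dependency graph. This is only a sketch: the component names mirror the four steps listed earlier in this comment, the edges are assumptions made for illustration, and nothing here reflects the actual compiler-rt or libc build systems. It uses Python's standard `graphlib` module to show that a true cycle admits no build order, while the split into "true builtins" and "pseudo-builtins" does.

```python
from graphlib import CycleError, TopologicalSorter

# Each key depends on the components in its set (names and edges are illustrative).
cyclic = {
    "builtins": {"libc"},   # e.g. emutls or getauxval users pulling in libc
    "libc": {"builtins"},   # libc links against the builtins
}

acyclic = {
    "true-builtins": set(),
    "libc": {"true-builtins"},
    "pseudo-builtins": {"libc", "true-builtins"},
    "sanitizers": {"libc", "pseudo-builtins"},
}


def build_order(graph):
    """Return a valid build order, or None if the graph contains a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


print(build_order(cyclic))   # None: no valid order exists
print(build_order(acyclic))  # ['true-builtins', 'libc', 'pseudo-builtins', 'sanitizers']
```

If the real dependency edges look like the second graph, the order falls out mechanically, which is the sense in which the acyclic order would "already exist".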

Yes, the circular dep is common, but it strikes me as more a historical accident than something anyone would want on purpose.

(For reference, another such historical accident is building all of GCC twice, once without libc and once with. That is obviously overkill, and people rigged up things like https://github.com/richfelker/musl-cross-make to avoid it. LLVM doesn't engage in such folly, making it easy to build compiler-rt and clang separately. I think if we do disentangle this circular dep, the old "use libc headers" approach will be looked back upon as just as silly.)

BTW, this sort of disentangling would also be good for Rust. Their "compiler builtins" package, which includes bits of compiler-rt, ought not to depend on any libc, not even a newlib, when doing a freestanding or WASM (without WASI at least) build.

Getting down to brass tacks:

Stuff like emutls feels to me like clearly a "pseudo-builtin": it is fallback logic implemented in non-trivial software.
The builtins that use getauxval are a bit trickier. Can we skip them entirely in the first "true builtins" step? Unclear. And do you want to use features depending on hardware detection in freestanding code? Not sure what the right solution is, but ideally there is some interface that is amenable to both OS-leveraging and freestanding approaches, and it is more defined than "whatever in libc we happen to use".
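
One way to picture the true/pseudo split is as a partition of the builtins sources by whether they need anything from libc. The sketch below is hypothetical: the file names and header sets are illustrative stand-ins, not an audit of compiler-rt, and `split_builtins` is a made-up helper.

```python
# Hypothetical inventory: source file -> libc headers it needs.
# The entries are illustrative assumptions, not an audit of compiler-rt.
SOURCES = {
    "divdi3.c": set(),                      # pure arithmetic, needs no libc
    "muldi3.c": set(),
    "emutls.c": {"pthread.h", "stdlib.h"},  # fallback TLS built on libc facilities
    "cpu_model.c": {"sys/auxv.h"},          # hardware detection via getauxval
}


def split_builtins(sources):
    """Partition sources into (true builtins, pseudo-builtins) by libc usage."""
    true_builtins = sorted(name for name, hdrs in sources.items() if not hdrs)
    pseudo_builtins = sorted(name for name, hdrs in sources.items() if hdrs)
    return true_builtins, pseudo_builtins


true_builtins, pseudo_builtins = split_builtins(SOURCES)
print(true_builtins)    # ['divdi3.c', 'muldi3.c']
print(pseudo_builtins)  # ['cpu_model.c', 'emutls.c']
```

Under this view, the open question about getauxval is whether the hardware-detection entry can stay in the second group or whether some true builtins need it.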

llvmbot · 2025-02-14T17:05:43Z

@llvm/issue-subscribers-libc

Author: John Ericson (Ericson2314)

efriedma-quic · 2025-02-14T19:41:04Z

I don't think we can make the builtins build completely independent of libc headers: the stuff that's libc dependent is truly platform-dependent, and there's no real way to work around that. And we can't modify the set of APIs exposed by the builtins library.

So to make this work, we need a target to build the stripped-down builtins library (with a different name, so it doesn't get confused with the real one). Then we need a clang flag to tell the compiler/linker to use the stripped-down builtins library. Then once you have both of those, you can build libc against the stripped-down builtins library. Then once you have libc, you can build everything else normally.
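
The sequence described above can be checked as an ordered plan in which each step may only consume artifacts that earlier steps produced. This is a rough sketch under stated assumptions: the artifact names (`builtins-stripped`, `builtins`) are hypothetical, the "everything else" stage is split in two purely for illustration, and the clang flag mentioned above is not modeled because it does not exist yet.

```python
# Each step: (name, artifacts it needs, artifacts it produces).
# All names are hypothetical placeholders for illustration.
PLAN = [
    ("stripped builtins", set(), {"builtins-stripped"}),
    ("libc", {"builtins-stripped"}, {"libc"}),
    ("full builtins", {"libc"}, {"builtins"}),
    ("everything else", {"libc", "builtins"}, {"sanitizers"}),
]


def check_plan(plan):
    """Return the first step whose inputs are not built yet, or None if the plan is valid."""
    available = set()
    for name, needs, produces in plan:
        if not needs <= available:
            return name
        available |= produces
    return None


print(check_plan(PLAN))                  # None: every step finds its inputs
print(check_plan(list(reversed(PLAN))))  # 'everything else': libc is not built yet
```

The distinct name for the stripped-down library shows up here as a separate artifact, so the full builtins never have to exist before libc does.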

This is something we can do, I guess, but it seems like overkill.

I don't see any other reasonable path besides just maintaining the status quo. (The "install libc headers" thing is a bit awkward, but it's worked for everyone for a long time.)

efriedma-quic · 2025-02-14T19:44:37Z

For the feature detection stuff specifically, libc implementations currently roll their own CPU detection, instead of using the attributes provided by the compiler. That probably won't change.

Ericson2314 · 2025-02-14T20:22:15Z

@efriedma-quic

> So to make this work....

That sounds good to me.

> but it seems like overkill.

I don't know how to argue this, but it just... doesn't to me?

Here's another benefit: right now LLVM libc also has a convoluted build that builds compiler-rt. With this approach, we can disentangle that and allow each component to be built on its own too. Much better!

> the stuff that's libc dependent is truly platform-dependent, and there's no real way to work around that.

No disagreement from me. The only assumption I want to get rid of is that platform-specific stuff is always libc. And even that is more a matter of perspective than actually changing code. (Ideally I would do some refactors, but that is not necessary.)

Really the only change is to disable/enable things to avoid building stuff twice and then link it all together.

michaelrj-google · 2025-02-14T20:45:59Z

I'm happy to have a simpler way to perform a hermetic LLVM-libc build with fresh compiler-rt. I will say our header generation is pretty much completely separate from the rest of our build, so doing that first isn't a problem (we already do that for scudo, see the COMPILER_RT_BUILD_SCUDO_STANDALONE_WITH_LLVM_LIBC cmake flag).

For platform-specific stuff, I think both libc and compiler-rt need to have access to it. One way to resolve this dependency would be re-using the mechanism from Project Hand-in-Hand to share the libc-internal pieces. That way we could have a common implementation, but also avoid a build-ordering dependency. The shared code would need to be header-only so it could be built as part of the compiler-rt build, but a lot of our OS specifics already are.

Ericson2314 · 2025-02-14T22:01:45Z

@michaelrj-google Oh, thanks for linking Project Hand-in-Hand, that's a great comparison!

Both fundamentally relate to libc's dual mandate of being "the preferred OS interface" and "the C standard library", which is fundamentally unworkable IMO. And so the right way to structure the internals doesn't necessarily correspond to the traditional way of dividing up the interfaces that people have come to expect.

llvmbot added the libc label Feb 14, 2025

Ericson2314 mentioned this issue Feb 14, 2025

[compiler-rt][AArch64] Enable libc-free builtins Linux build #125922

Closed
