Acyclic compiler-rt and libc bootstrap #127227

Open
Ericson2314 opened this issue Feb 14, 2025 · 6 comments

@Ericson2314
Member

Ericson2314 commented Feb 14, 2025

This is responding to the conversation in #125922, but I am opening a new issue because I would like to disentangle the larger idea from that specific PR.

The way some of us Nixpkgs compiler maintainers see it, the ideal bootstrap order is:

  1. Build "true builtins", completely self-contained
  2. Build libc
  3. Build any additional "pseudo-builtins" depending on libc, without rebuilding the "true builtins"
  4. Build additional goodies like sanitizers

To wit, if the builtins and libc are really cyclic, then all sorts of accidental recursion is possible (imagine emutls eventually recursing back into emutls). On the flip side, if no recursion is happening, then the cyclic dep is in fact spurious: the acyclic dependency order already exists and is just waiting to "break free".

Yes, the circular dep is common, but it strikes me as more of a historical accident than something anyone would want on purpose.

(For reference, another such historical accident is building all of GCC twice, once without libc and once with it. That is obviously overkill, and people rigged up things like https://github.com/richfelker/musl-cross-make to avoid it. LLVM doesn't engage in such folly, making it easy to build compiler-rt and clang separately. I think that if we do disentangle this circular dep, the old "use libc headers" approach will be looked back upon as just as silly.)

BTW, this sort of disentangling would also be good for Rust. Their "compiler builtins" package, which contains bits of compiler-rt, ought not to depend on any libc (not even newlib) when doing a freestanding or WASM build (without WASI, at least).


Getting down to brass tacks:

  • Stuff like emutls feels to me like clearly a "pseudo-builtin": it is fallback logic, and non-trivial software at that.

  • The builtins that use getauxval are a bit trickier. Can we skip them entirely in the first "true builtins" step? Unclear. And what if you want to use features that depend on hardware detection in freestanding code? I'm not sure what the right solution is, but ideally there is some interface that is amenable to both the OS-leveraging and the freestanding approach, and that is better defined than "whatever in libc we happen to use".

@llvmbot llvmbot added the libc label Feb 14, 2025
@llvmbot
Member

llvmbot commented Feb 14, 2025

@llvm/issue-subscribers-libc

Author: John Ericson (Ericson2314)

@efriedma-quic
Collaborator

I don't think we can make the builtins build completely independent of libc headers: the stuff that's libc dependent is truly platform-dependent, and there's no real way to work around that. And we can't modify the set of APIs exposed by the builtins library.

So to make this work, we need a target to build the stripped-down builtins library (with a different name, so it doesn't get confused with the real one). Then we need a clang flag to tell the compiler/linker to use the stripped-down builtins library. Then once you have both of those, you can build libc against the stripped-down builtins library. Then once you have libc, you can build everything else normally.

This is something we can do, I guess, but it seems like overkill.

I don't see any other reasonable path besides just maintaining the status quo. (The "install libc headers" thing is a bit awkward, but it's worked for everyone for a long time.)

@efriedma-quic
Collaborator

For the feature detection stuff specifically, libc implementations currently roll their own CPU detection, instead of using the attributes provided by the compiler. That probably won't change.

@Ericson2314
Member Author

@efriedma-quic

"So to make this work..."

That sounds good to me.

"but it seems like overkill."

I don't know how to argue this, but it just... doesn't seem that way to me?

Here's another benefit: right now LLVM libc also has a convoluted build that builds compiler-rt as part of it. With this approach, we can disentangle that and allow each component to be built on its own too. Much better!

"the stuff that's libc dependent is truly platform-dependent, and there's no real way to work around that."

No disagreement from me. The only assumption I want to get rid of is that platform-specific stuff always lives in libc. And even that is more a matter of perspective than of actually changing code. (Ideally I would do some refactors, but they are not necessary.)

Really the only change is to disable/enable things to avoid building stuff twice and then link it all together.

@michaelrj-google
Contributor

I'm happy to have a simpler way to perform a hermetic LLVM-libc build with fresh compiler-rt. I will say our header generation is pretty much completely separate from the rest of our build, so doing that first isn't a problem (we already do that for scudo, see the COMPILER_RT_BUILD_SCUDO_STANDALONE_WITH_LLVM_LIBC cmake flag).

For platform specific stuff, I think both libc and compiler-rt need to have access to it. One way to resolve this dependency would be re-using the mechanism from Project Hand-in-Hand to share the libc-internal pieces. That way we could have a common implementation, but also avoid a build-ordering dependency. The shared code would need to be header only so it could be built as part of the compiler-rt build, but a lot of our OS specifics already are.

@Ericson2314
Member Author

Ericson2314 commented Feb 14, 2025

@michaelrj-google Oh, thanks for linking Project Hand-in-Hand, that's a great comparison!

Both fundamentally relate to libc's dual mandate of being both "the preferred OS interface" and "the C standard library", which is fundamentally unworkable IMO. And so the right way to structure the internals doesn't necessarily correspond to the traditional way of dividing up the interfaces that people have come to expect.

Development

No branches or pull requests

4 participants