Lazy files extracted post LTO compilation might reference other lazy bitcode files, leading to incorrect absolute defined symbols #127284

Open
arichardson opened this issue Feb 15, 2025 · 3 comments
Labels
lld:ELF LTO Link time optimization (regular/full LTO or ThinLTO) miscompilation

Comments

@arichardson
Member

I was trying to build an arm32 binary that uses 64-bit division with an LTO build of compiler-rt.

The 64-bit division function in compiler-rt (__aeabi_ldivmod) is written in assembly and calls the C function __divmoddi4, which works fine normally. However, when building with LTO the call inside __aeabi_ldivmod is replaced with a jump to address zero, which then crashes my program.
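For illustration, a minimal sketch of the kind of source that pulls in this runtime call. The function name `div64` is hypothetical and not taken from the reporter's program. On 32-bit ARM EABI targets there is no instruction for signed 64-bit division, so the compiler lowers the `/` below to a call to `__aeabi_ldivmod`; that call does not appear in the C source or in the IR handed to LTO.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the original report.
 * On an arm32 EABI target the division below is lowered by the backend
 * to a call to __aeabi_ldivmod, which in compiler-rt in turn calls
 * __divmoddi4. */
int64_t div64(int64_t num, int64_t den) {
    return num / den; /* truncates toward zero, as C requires */
}
```

On a host with native 64-bit division the same function compiles to a plain divide, so the problem is only observable when targeting arm32.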

__aeabi_ldivmod dump:

Disassembly of section .text:

00000000 <__aeabi_ldivmod>:
       0: e92d4040      push    {r6, lr}
       4: e24dd010      sub     sp, sp, #16
       8: e28d6008      add     r6, sp, #8
       c: e58d6000      str     r6, [sp]
      10: ebfffffe      bl      0x10 <__aeabi_ldivmod+0x10> @ imm = #-0x8
                        00000010:  R_ARM_CALL   __divmoddi4
      14: e59d2008      ldr     r2, [sp, #0x8]
      18: e59d300c      ldr     r3, [sp, #0xc]
      1c: e28dd010      add     sp, sp, #16
      20: e8bd8040      pop     {r6, pc}

I discussed this issue with @ilovepi and it appears that one problem here is that the call to __aeabi_ldivmod is generated post-LTO and the __divmoddi4 symbol is marked as not needed before final codegen.
However, it does seem like lld should be reporting an error for the missing __divmoddi4 instead of using address zero (it is not marked as weak, so that is invalid).
When building with -pie instead of position-dependent static linking, I do get an error:

ld.lld: error: relocation R_ARM_CALL cannot refer to absolute symbol: __divmoddi4
>>> defined in divmoddi4.bc
>>> referenced by aeabi_ldivmod.o:(__aeabi_ldivmod)

This suggests __divmoddi4 was replaced with an absolute zero symbol.
I have created a minimized test case which I will upload as a PR shortly.

@arichardson arichardson added lld:ELF LTO Link time optimization (regular/full LTO or ThinLTO) labels Feb 15, 2025

@frobtech
Contributor

I would guess that this is the same bug that @ilovepi and @mysterymath have been chasing, plus another bug. That is, the fact that the proper definition of __divmoddi4 is not coming out of the LTO codegen seems like the same basic issue as the other case, albeit probably a bit different internally, perhaps because this is a backend-generated rather than a middle-end-generated libcall. But then my guess is that there is an additional bug in LLD, perhaps specific to the arm32 backend, that is causing that undefined symbol (which is only undefined because of the LTO bug) to turn into a SHN_ABS symbol.

@MaskRay
Member

MaskRay commented Feb 15, 2025

Added some notes to my gist: https://gist.github.com/MaskRay/24f4e2eed208b9d8b0a3752575a665d4. This is another variant of the "Symbols unknown to the IR symbol table" problem:

Lazy files extracted post LTO compilation might reference other lazy files. Referenced relocatable files are extracted and everything works as intended. However, if the referenced lazy file is a bitcode file, no further LTO compilation occurs. lld currently treats any symbols from that bitcode file as absolute, which leads to a "refer to absolute symbol" error in PIC links and to silently broken output otherwise. For example, lazy aeabi_ldivmod.o post LTO extraction might call __divmoddi4 defined in an unextracted lazy bitcode file (#127284).
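The extraction order described above can be sketched with a toy model. This is not lld code: every type and function name here (`lazy_file`, `extract`, `STALE_BITCODE`, and so on) is hypothetical, and the only behaviour it models is that a bitcode file extracted after LTO compilation has finished never gets compiled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only; none of these names exist in lld. */
enum file_kind { RELOCATABLE, BITCODE };
enum outcome { RESOLVED, STALE_BITCODE, UNDEFINED };

struct lazy_file {
    const char *name;
    enum file_kind kind;
    const char *defines; /* the one symbol this file defines */
    const char *needs;   /* the one symbol it references, or NULL */
};

/* Archive members mirroring the file names in the error message. */
static const struct lazy_file archive[] = {
    {"aeabi_ldivmod.o", RELOCATABLE, "__aeabi_ldivmod", "__divmoddi4"},
    {"divmoddi4.bc",    BITCODE,     "__divmoddi4",     NULL},
};

/* Extract the lazy file defining `sym`, following its reference.
 * `lto_done` says whether LTO compilation has already run. */
enum outcome extract(const char *sym, bool lto_done) {
    for (size_t i = 0; i < sizeof archive / sizeof archive[0]; i++) {
        const struct lazy_file *f = &archive[i];
        if (strcmp(f->defines, sym) != 0)
            continue;
        /* Bitcode reached after LTO has no code generated for it. */
        if (f->kind == BITCODE && lto_done)
            return STALE_BITCODE;
        return f->needs ? extract(f->needs, lto_done) : RESOLVED;
    }
    return UNDEFINED;
}
```

In this model, `extract("__aeabi_ldivmod", true)` yields `STALE_BITCODE`, matching the reported case where the libcall only appears after LTO. Calling it with `lto_done` set to `false` yields `RESOLVED`, which corresponds to the case where the reference is already visible before LTO compilation and both files are extracted in time.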

@MaskRay MaskRay changed the title [ELF][LTO] Invalid relocations when libcalls are synthesized by ISel [ELF][LTO] Lazy files extracted post LTO compilation might reference other lazy bitcode files, leading to incorrect absolute defined symbols Feb 15, 2025
@MaskRay MaskRay changed the title [ELF][LTO] Lazy files extracted post LTO compilation might reference other lazy bitcode files, leading to incorrect absolute defined symbols Lazy files extracted post LTO compilation might reference other lazy bitcode files, leading to incorrect absolute defined symbols Feb 15, 2025
arichardson added a commit that referenced this issue Feb 18, 2025
…calls

This can happen when using a LTO build of compiler-rt for ARM and the
program uses 64-bit division.
The 64-bit division function in compiler-rt (__aeabi_ldivmod) is written
in assembly and calls the C function __divmoddi4, which works fine in
non-LTO links. However, when building with LTO the call inside
__aeabi_ldivmod is replaced with a jump to address zero, which then
crashes the program.

Building with -pie generates an error instead of a jump to address zero,
and surprisingly just declaring the __aeabi_ldivmod function (but not
calling it) in the input IR also avoids this issue.

Reported as #127284

Co-authored-by: Fangrui Song <i@maskray.me>

Reviewed By: MaskRay

Pull Request: #127286
github-actions bot pushed a commit to arm/arm-toolchain that referenced this issue Feb 18, 2025
wldfngrs pushed a commit to wldfngrs/llvm-project that referenced this issue Feb 19, 2025