Conversation

@omsherikar
Contributor

@omsherikar omsherikar commented Nov 6, 2025

Pull Request description

This PR fixes incorrect SIZE_T_TYPE initialization and missing target configuration in the LLVM module, which could cause miscompilation on 32-bit or cross-compilation targets.

Issues fixed:

  1. Double initialization of SIZE_T_TYPE: The type was first initialized from host sizes via ctypes, then forcibly overridden to 64-bit, causing mismatches on non-64-bit targets.
  2. Missing target triple and data layout: The LLVM module lacked target triple and data layout information, preventing correct ABI/size calculations for the target architecture.

Changes made:

  • Set module.triple from target machine to ensure correct target architecture
  • Set module.data_layout from target machine to ensure correct pointer sizes and ABI
  • Removed forced 64-bit override of SIZE_T_TYPE, allowing it to keep the host-initialized value (which matches the target only when compiling natively)
  • Added fallback for module.triple assignment in case attribute access fails

Why this matters:

  • On 32-bit systems, size_t should be 32-bit, not 64-bit
  • Cross-compilation requires correct target triple and data layout
  • Incorrect sizes can cause ABI mismatches and runtime failures

Solves #132 (Correct native sizes & module target info).
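
To make the host/target distinction concrete, here is a minimal sketch (not the PR's code) of how a host-derived width is typically read through ctypes. The helper names `host_size_t_bits` and `host_pointer_bits` are hypothetical. Both values describe the platform of the running Python interpreter, so they agree with the compilation target only for native builds.

```python
import ctypes


def host_size_t_bits() -> int:
    """Return the width of size_t on the machine running this script."""
    return ctypes.sizeof(ctypes.c_size_t) * 8


def host_pointer_bits() -> int:
    """Return the width of a data pointer on the machine running this script."""
    return ctypes.sizeof(ctypes.c_void_p) * 8


if __name__ == "__main__":
    # These values never change with the selected target triple, which is
    # why a host-derived width is only safe when host and target coincide.
    print(host_size_t_bits(), host_pointer_bits())
```

When cross-compiling, a value obtained this way would need to be replaced by one derived from the target description.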

How to test these changes

  • Run the full test suite to ensure no regressions:

    pytest -v
  • Verify string operations still work (they use SIZE_T_TYPE):

    pytest tests/test_string.py -v
  • Test malloc/snprintf operations (they use SIZE_T_TYPE):

    pytest tests/test_cast.py -v
  • All 112 tests should pass, confirming the fix doesn't break existing functionality

Pull Request checklists

This PR is a:

  • bug-fix
  • new feature
  • maintenance

About this PR:

  • it includes tests. (existing tests verify the fix)
  • the tests are executed on CI.
  • the tests generate log file(s) (path). N/A
  • pre-commit hooks were executed locally.
  • this PR requires a project documentation update.

Author's checklist:

  • I have reviewed the changes and they contain no misspellings.
  • The code is well commented, especially in the parts that contain more complexity.
  • New and old tests passed locally.

Additional information

Code location:

  • File: src/irx/builders/llvmliteir.py
  • Lines 131-137: Target triple and data layout setup
  • Lines 204-206: Removed forced 64-bit SIZE_T_TYPE override

Before:

# Line 199 (removed):
self._llvm.SIZE_T_TYPE = ir.IntType(64)  # ❌ Forced 64-bit, ignores host/target

# Module had no triple or data_layout set

After:

# Lines 131-137 (added):
# Attach target triple and data layout to the module for correct sizes/ABI
try:
    self._llvm.module.triple = self.target.triple
except Exception:
    # Fallback to the default triple if attribute not available
    self._llvm.module.triple = llvm.get_default_triple()
self._llvm.module.data_layout = str(self.target_machine.target_data)

# Lines 204-206 (changed):
# SIZE_T_TYPE already initialized based on host; do not override with a
# fixed width here to avoid mismatches on non-64-bit targets.

Test results:

  • ✅ All 112 tests pass
  • ✅ No regressions in existing functionality
  • ✅ SIZE_T_TYPE now correctly reflects target architecture

Reviewer's checklist

Copy and paste this template for your review's note:

## Reviewer's Checklist

- [ ] I managed to reproduce the problem locally from the `main` branch
- [ ] I managed to test the new changes locally
- [ ] I confirm that the issues mentioned were fixed/resolved.

@github-actions

github-actions bot commented Nov 6, 2025

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/irx/builders/llvmliteir.py

  • High risk of triple/data layout mismatch: falling back to llvm.get_default_triple() while keeping data_layout from self.target_machine can produce invalid IR for cross-target builds. Use the target_machine as the single source of truth for both. Replace the try/except with:
    self._llvm.module.triple = self.target_machine.triple # type: ignore[attr-defined]
    self._llvm.module.data_layout = str(self.target_machine.target_data)
    (L.135)

  • Potential undefined SIZE_T_TYPE: removing the 64-bit default is good, but ensure SIZE_T_TYPE is initialized before any use. Safe portable init:
    if not hasattr(self._llvm, "SIZE_T_TYPE"):
        bits: int = self.target_machine.target_data.pointer_size * 8
        self._llvm.SIZE_T_TYPE = ir.IntType(bits)
    (L.206)


@github-actions

github-actions bot commented Nov 6, 2025

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/irx/builders/llvmliteir.py

  • Correctness: Deriving SIZE_T_TYPE via regex on the data layout is brittle and may pick a non-default address space or the wrong token ordering. Use TargetData.pointer_size (bytes) for address space 0 instead of parsing. Also set POINTER_BITS from the target, not the host, to avoid cross-compilation mismatches.
    Suggested change (L.133):
    def _set_size_t_from_target(self) -> None:
        """Initialize SIZE_T_TYPE from target data"""
        td = self.target_machine.target_data
        bits: int = 8 * getattr(td, "pointer_size", td.pointer_size(0))  # bytes -> bits
        self._llvm.SIZE_T_TYPE = ir.IntType(bits)

    And replace the if-block with:
    if getattr(self._llvm, "SIZE_T_TYPE", None) is None:
        self._set_size_t_from_target()

  • Correctness: POINTER_BITS is still initialized from the host (ctypes) and can diverge from the selected target. Initialize it from target data to keep it consistent with SIZE_T_TYPE.
    Suggested change (L.147):
    def _init_native_size_types(self) -> None:
        """Initialize pointer/size_t types from target"""
        td = self.target_machine.target_data
        ptr_bits: int = 8 * getattr(td, "pointer_size", td.pointer_size(0))
        self._llvm.POINTER_BITS = ptr_bits
        self._llvm.SIZE_T_TYPE = None

  • Safety: Accessing self._llvm.SIZE_T_TYPE directly in the None-check can raise if the attribute isn’t defined yet. Use getattr to guard.
    Suggested change (L.133):
    if getattr(self._llvm, "SIZE_T_TYPE", None) is None:
        ...
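
The getattr-based guard recommended in this comment can be sketched as follows. This is an illustration only: `llvm_vars` is a hypothetical stand-in for the `self._llvm` container, and the helper names are made up. The second half shows a detail worth keeping in mind with this pattern: the default argument of `getattr` is evaluated before the lookup, even when the attribute exists.

```python
from types import SimpleNamespace
from typing import Any, List

# Hypothetical stand-in for the shared `self._llvm` container.
llvm_vars = SimpleNamespace()


def needs_size_t_init(container: Any) -> bool:
    """Return True when SIZE_T_TYPE is missing or explicitly None."""
    return getattr(container, "SIZE_T_TYPE", None) is None


def direct_check(container: Any) -> bool:
    """Unguarded variant: raises AttributeError if the attribute is missing."""
    return container.SIZE_T_TYPE is None


# The default passed to getattr is computed eagerly.
calls: List[str] = []


def expensive_default() -> int:
    """Record that the fallback expression was evaluated."""
    calls.append("called")
    return 8


present = SimpleNamespace(pointer_size=4)
# pointer_size exists, yet expensive_default() still runs first.
value = getattr(present, "pointer_size", expensive_default())
```

Because of the eager evaluation, a fallback expression that can itself raise is safer inside a try/except AttributeError block than as a getattr default.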


@omsherikar
Contributor Author

@yuvimittal @xmnlab please have a look

@github-actions

github-actions bot commented Nov 8, 2025

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/irx/builders/llvmliteir.py

  • Correctness: Parsing the data layout with r"p(?:\d+)?:(\d+)" may pick a non-default address space (e.g., p270) and set SIZE_T_TYPE incorrectly. Also, falling back to ctypes.c_size_t ties you to the host, breaking cross-target builds. Prefer the TargetData API and explicitly read the default AS (0). Suggest:

    • Replace the regex block with a helper that uses TargetData first, then a safe regex for p0 or p:, and only lastly host fallback. Also set POINTER_BITS from the same source to avoid host/target mismatches. (L.133)

    def _get_default_pointer_bits(self) -> int:
        """Return target default address-space pointer width in bits."""
        td = self.target_machine.target_data
        ptr_bytes = getattr(td, "pointer_size", None)
        if isinstance(ptr_bytes, int) and ptr_bytes > 0:
            return ptr_bytes * 8
        m = re.search(r"(?:^|-)p(?:0)?:([0-9]+)", str(td))
        if m:
            return int(m.group(1))
        return ctypes.sizeof(ctypes.c_void_p) * 8

    In init after setting triple/data_layout

    ptr_bits: int = self._get_default_pointer_bits()
    self._llvm.SIZE_T_TYPE = ir.IntType(ptr_bits)
    self._llvm.POINTER_BITS = ptr_bits

  • Consistency: _init_native_size_types sets POINTER_BITS from the host. After you switch to target-derived ptr_bits as above, ensure you either remove or overwrite the host-derived POINTER_BITS so it cannot leak into codepaths that run before init finishes. Consider asserting SIZE_T_TYPE is set before any use. (L.118)
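
The address-space concern raised in this comment can be reproduced with the standard library alone. The sketch below applies the two regular expressions quoted above to two data layout strings. The layout strings are assumptions for illustration (they resemble what LLVM emits for x86_64 and 32-bit ARM Linux targets, but the exact text varies by LLVM version), and the helper names are hypothetical. The 64-bit fallback relies on LLVM's documented default of 64-bit pointers when the layout has no entry for address space 0.

```python
import re
from typing import Optional

# Assumed, illustrative layout strings; real ones depend on the LLVM version.
X86_64_LAYOUT = (
    "e-m:e-p270:32:32-p271:32:32-p272:64:64-"
    "i64:64-f80:128-n8:16:32:64-S128"
)
ARM32_LAYOUT = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"

# Pattern criticized in the review: accepts any address space.
LOOSE = re.compile(r"p(?:\d+)?:(\d+)")
# Pattern from the suggestion: only `p:` or `p0:` at a component boundary.
STRICT = re.compile(r"(?:^|-)p(?:0)?:([0-9]+)")

# LLVM assumes 64-bit pointers when no default-address-space entry exists.
LLVM_DEFAULT_POINTER_BITS = 64


def pointer_bits(layout: str, pattern: re.Pattern) -> Optional[int]:
    """Return the first pointer width matched by `pattern`, if any."""
    match = pattern.search(layout)
    return int(match.group(1)) if match else None


def default_as_pointer_bits(layout: str) -> int:
    """Pointer width for address space 0, using LLVM's default if absent."""
    bits = pointer_bits(layout, STRICT)
    return bits if bits is not None else LLVM_DEFAULT_POINTER_BITS


if __name__ == "__main__":
    # The loose pattern latches onto p270 and reports a 32-bit pointer.
    print(pointer_bits(X86_64_LAYOUT, LOOSE))
    print(default_as_pointer_bits(X86_64_LAYOUT))
    print(default_as_pointer_bits(ARM32_LAYOUT))
```

With these inputs the loose pattern reports 32 bits for the x86_64 layout because it matches the `p270` entry, while the strict pattern finds no default entry and falls back to 64.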


@omsherikar
Contributor Author

@yuvimittal @xmnlab Please have a look

@github-actions

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/irx/builders/llvmliteir.py

  • Correctness: POINTER_BITS and SIZE_T_TYPE can mismatch the target when cross-compiling. _init_native_size_types uses host ctypes, and _get_size_t_type_from_triple relies on brittle substring checks (e.g., x86_64-…-gnux32, wasm32, riscv32/64, s390x). This can produce wrong IR types and ABI mismatches. Please derive both from LLVM TargetData.

    Suggested change (L.138-L.143):
    def _init_native_size_types(self) -> None:
        """Initialize pointer/size_t types from LLVM TargetData (target, not host)."""
        ptr_bits: int = self.target_machine.target_data.pointer_size * 8
        self._llvm.POINTER_BITS = ptr_bits
        self._llvm.SIZE_T_TYPE = ir.IntType(ptr_bits)

    Suggested change (replace method) (L.144-L.167):
    def _get_size_t_type_from_target_data(self) -> ir.IntType:
        """Determine size_t type from LLVM TargetData pointer size."""
        ptr_bits: int = self.target_machine.target_data.pointer_size * 8
        return ir.IntType(ptr_bits)

    And call it instead of the triple heuristic (L.144):
    if self._llvm.SIZE_T_TYPE is None:
        self._llvm.SIZE_T_TYPE = self._get_size_t_type_from_target_data()

  • Ordering: Ensure _init_native_size_types runs after module.triple and module.data_layout are set, or make _init_native_size_types use target_machine.target_data as above so it’s safe regardless of order. Otherwise POINTER_BITS remains host-sized. (Around L.128-L.136)
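
The failure mode described for substring checks can be illustrated with a deliberately naive heuristic. The function below is hypothetical and is not the PR's `_get_size_t_type_from_triple`; it only shows why an architecture-name test disagrees with the real pointer size on targets such as x32 (an x86_64 triple with 32-bit pointers) or s390x (a 64-bit target whose name contains no "64").

```python
def size_t_bits_from_triple_naive(triple: str) -> int:
    """Hypothetical heuristic: guess the width from the architecture name."""
    arch = triple.split("-", 1)[0]
    return 64 if "64" in arch else 32


def size_t_bits_from_pointer_bytes(pointer_bytes: int) -> int:
    """Derive the width from the target's pointer size given in bytes."""
    return pointer_bytes * 8


if __name__ == "__main__":
    # x32 ABI: the heuristic says 64, but pointers are 4 bytes wide.
    print(size_t_bits_from_triple_naive("x86_64-unknown-linux-gnux32"))
    print(size_t_bits_from_pointer_bytes(4))
    # s390x is a 64-bit target, yet the heuristic says 32.
    print(size_t_bits_from_triple_naive("s390x-ibm-linux-gnu"))
```

Deriving the width from the pointer size reported for the target sidesteps both cases, which is the direction the review recommends.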


@omsherikar omsherikar closed this Nov 12, 2025
@omsherikar omsherikar reopened this Nov 12, 2025
@github-actions

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/irx/builders/llvmliteir.py

  • Correctness: SIZE_T_TYPE is now target-derived but POINTER_BITS still comes from host ctypes, which will silently mismatch on cross-target builds (e.g., building x86_64 target on a 32-bit host, or vice versa). Derive both from target_machine.target_data to keep them consistent. Suggest adding a helper and calling it right after setting triple/data_layout. (L.132)
    def _sync_target_sizes(self) -> None:
        """Sync pointer and size_t sizes to target data layout."""
        td = self.target_machine.target_data
        self._llvm.POINTER_BITS = td.pointer_size * 8
        self._llvm.SIZE_T_TYPE = ir.IntType(self._llvm.POINTER_BITS)

    Call after setting module triple/data_layout:

    self._sync_target_sizes()

  • Correctness: The arch-string heuristic in _get_size_t_type_from_triple can mis-detect ABIs (e.g., x32: x86_64 triple with 32-bit pointers) and misses common targets (riscv64, s390x, wasm32/64). It also falls back to host ctypes, which breaks cross-compilation. Replace with target-data-derived size. (L.147)
    def _get_size_t_type_from_triple(self) -> ir.IntType:
        """Determine size_t type from target data layout."""
        bits: int = self.target_machine.target_data.pointer_size * 8
        return ir.IntType(bits)


@omsherikar omsherikar force-pushed the fix/target-layout-size_t branch from 3e5333c to 0a6848f Compare November 12, 2025 17:28
@github-actions

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/irx/builders/llvmliteir.py

  • Potential correctness bug: POINTER_BITS is derived from the host (ctypes.c_void_p) while you now set module triple/data layout from the target. This can miscompile under cross-compilation (pointer math, GEP, and size_t-related code can diverge). Compute pointer width from target_machine.target_data instead of the host. (L.176)

Suggested change:
def _init_native_size_types(self) -> None:
    """Initialize pointer/size_t types from target."""
    self._llvm.POINTER_BITS = None
    self._llvm.SIZE_T_TYPE = None

And right after setting module.triple and module.data_layout in init (same block):
def _sync_pointer_bits_from_target(self) -> None:
    """Set POINTER_BITS from target data layout."""
    td = self.target_machine.target_data
    ptr_bytes: int = getattr(td, "pointer_size", None) or (td.get_pointer_size() if hasattr(td, "get_pointer_size") else 0)
    if not ptr_bytes:
        raise RuntimeError("Unable to determine target pointer size from target data")
    self._llvm.POINTER_BITS = ptr_bytes * 8
Call _sync_pointer_bits_from_target() after assigning data_layout. (L.176)

  • Portability/correctness: _get_size_t_type_from_triple relies on substring heuristics and falls back to host ctypes on unknown triples (e.g., riscv64, s390x, wasm32/64), which will be wrong when cross-compiling. Derive size_t from target data layout pointer size instead. (L.189)

Suggested change:
def _get_size_t_type_from_triple(self) -> ir.IntType:
    """Return size_t type width from target data layout."""
    td = self.target_machine.target_data
    ptr_bytes: int = getattr(td, "pointer_size", None) or (td.get_pointer_size() if hasattr(td, "get_pointer_size") else 0)
    if not ptr_bytes:
        raise RuntimeError(f"Unable to determine pointer size for triple: {self.target_machine.triple}")
    return ir.IntType(ptr_bytes * 8)


@github-actions

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/irx/builders/llvmliteir.py

  • Correctness: size_t detection via triple string matching is brittle and wrong for several targets (e.g., x86_64-gnux32, riscv{32,64}, wasm{32,64}) and the fallback to ctypes uses host-size, breaking cross-compilation. Use target_data instead of heuristics/host. Replace the helper and call site:

    • (L.181) replace call to _get_size_t_type_from_triple() with _get_size_t_type_from_target_data()
    • (L.195) replace the helper with:
      def _get_size_t_type_from_target_data(self) -> ir.IntType:
          """Derive size_t from target data pointer size"""
          size_bits: int = self.target_machine.target_data.pointer_size * 8
          return ir.IntType(size_bits)
  • Correctness: POINTER_BITS currently comes from ctypes (host), which can mismatch the target in cross-compiling scenarios. Set it from target_data after target_machine is created.

    • (L.176) add after setting module.triple/data_layout:
      def _init_target_pointer_bits(self) -> None:
          """Initialize pointer width from target data"""
          self._llvm.POINTER_BITS = self.target_machine.target_data.pointer_size * 8
    • (L.176) call _init_target_pointer_bits() here and change _init_native_size_types to set POINTER_BITS = None (as you did for SIZE_T_TYPE) to avoid host-derived defaults.

tests/test_llvmlite_helpers.py

  • Unused imports will likely fail lint/CI:

    • Remove unused symbol "patch" from unittest.mock import (L.4).
    • Remove unused "DoubleType, FloatType" import (L.14).
  • Tests are coupling to a private API (_get_size_t_type_from_triple). This is brittle and may break with internal refactors. Prefer asserting via a public accessor or helper if available (e.g., expose a size_t_type getter) to reduce test fragility.


@github-actions

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/irx/builders/llvmliteir.py

  • High risk: Deriving size_t via string-matching the target triple is brittle and wrong for several valid targets/ABIs (e.g., x86_64-*-gnux32 where size_t is 32, riscv32/64, s390x, wasm32/64). Also, you now mix host-derived POINTER_BITS (ctypes) with target-derived SIZE_T_TYPE, which can desync under cross-compilation. Use LLVM TargetData instead of triple parsing and set both consistently. Suggested change (L.189):

    def _init_target_size_types(self) -> None:
        """Initialize pointer and size_t types from the LLVM TargetData."""
        td = self.target_machine.target_data
        pointer_bits: int = td.pointer_size * 8
        self._llvm.POINTER_BITS = pointer_bits
        self._llvm.SIZE_T_TYPE = ir.IntType(pointer_bits)

    Then call this right after setting module.triple and module.data_layout, and remove _get_size_t_type_from_triple()/the None-guard. This fixes x32 and unsupported archs and avoids host/target mismatches. (Apply call at the new init site where you currently check SIZE_T_TYPE is None.)

  • Ordering bug risk: If _init_native_size_types() is not guaranteed to run before the SIZE_T_TYPE None-check in init, the previous global value may persist and silently mismatch the target. Making initialization unconditional via TargetData as above removes this class of bugs.


tests/test_llvmlite_helpers.py

LGTM!


@github-actions

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/irx/builders/llvmliteir.py

  • Correctness: Deriving SIZE_T_TYPE from triple substrings and falling back to ctypes can mis-size size_t for cross-targets (e.g., wasm32, riscv64, s390x), and differs from the module’s data_layout. Use TargetData instead of heuristics/host ctypes. Also, POINTER_BITS is still taken from host ctypes, which will be wrong when cross-compiling. (L.189-216, L.193)
    Suggested change:
    def _get_size_t_type_from_target_data(self) -> ir.IntType:
        """Return size_t as an integer type based on target data pointer size."""
        bits: int = self.target_machine.target_data.pointer_size * 8
        return ir.IntType(bits)

    And update usages:

    • In init: self._llvm.SIZE_T_TYPE = self._get_size_t_type_from_target_data() (remove the None guard) (L.176-182)
    • In _init_native_size_types: self._llvm.POINTER_BITS = self.target_machine.target_data.pointer_size * 8 (L.193)
  • Correctness: The global guard if self._llvm.SIZE_T_TYPE is None means the first builder fixes SIZE_T_TYPE for the entire process, causing incorrect behavior when constructing builders for different targets later. Set it unconditionally per builder, or make it instance/module-scoped per target. (L.176-182)
    Suggested change:
    def _set_size_types_for_target(self) -> None:
        """Initialize pointer and size_t types from target data for this builder."""
        bits: int = self.target_machine.target_data.pointer_size * 8
        self._llvm.POINTER_BITS = bits
        self._llvm.SIZE_T_TYPE = ir.IntType(bits)

    And call it after setting module.triple/data_layout in init.
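
The stale-value problem behind the None guard can be shown with a small model. It assumes, as the comment does, that the size information lives in state shared by all builders; `SharedVars` and both helper functions are hypothetical names, not part of the project.

```python
from typing import Optional


class SharedVars:
    """Hypothetical process-wide state, standing in for a shared container."""

    size_t_bits: Optional[int] = None


def init_guarded(shared: type, pointer_bytes: int) -> None:
    """Set the width only if it is still unset (the first builder wins)."""
    if shared.size_t_bits is None:
        shared.size_t_bits = pointer_bytes * 8


def init_unconditional(shared: type, pointer_bytes: int) -> None:
    """Always recompute the width for the builder being constructed."""
    shared.size_t_bits = pointer_bytes * 8


# First builder targets a 64-bit platform, the second a 32-bit one.
init_guarded(SharedVars, 8)
init_guarded(SharedVars, 4)
stale = SharedVars.size_t_bits  # the second target was silently ignored

init_unconditional(SharedVars, 4)
fresh = SharedVars.size_t_bits  # now follows the most recent target
```

Unconditional assignment still means the most recently constructed builder decides the shared value, so keeping the width on the builder instance, as the comment also proposes, is the more robust option when several targets coexist in one process.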


tests/test_llvmlite_helpers.py

LGTM!


Contributor

@xmnlab xmnlab left a comment

LGTM! thanks, appreciate it, @omsherikar

Copilot AI review requested due to automatic review settings January 22, 2026 18:57

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@github-actions

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/irx/builders/llvmliteir.py

  • Correctness: POINTER_BITS is still derived from the host and can mismatch the target during cross-compilation. Set it from the target data right after configuring the module triple/layout (L.176):
    self._llvm.POINTER_BITS = self.target_machine.target_data.pointer_size * 8

  • Correctness: _get_size_t_type_from_triple relies on substring heuristics and falls back to ctypes (host), which is wrong for cross-targets (e.g., riscv64, s390x, wasm32/64). Derive size_t from the target data pointer size instead (L.196):
    def _get_size_t_type_from_triple(self) -> ir.IntType:
        """Return size_t type width from target data pointer size."""
        ptr_bits: int = self.target_machine.target_data.pointer_size * 8
        return ir.IntType(ptr_bits)


tests/test_llvmlite_helpers.py

  • The fallback assertion is too lax and can mask real bugs or platform-dependent behavior. Make it deterministic by asserting against the host pointer size (use a small helper) instead of allowing both 32 and 64 (L.115):
def _host_ptr_bits() -> int:
    """Return host pointer size in bits."""
    import struct
    return struct.calcsize("P") * 8

Then replace the final assert in test_get_size_t_type_from_triple_fallback (L.121):

assert size_t_ty.width == _host_ptr_bits()
  • The vector promotion tests only validate element type, not lane count. Add an assertion to ensure the vector length is correct; otherwise, a scalar or wrong-width vector could slip through (L.144, L.165):
assert result.type.count == 2
  • Minor robustness: when mocking target_machine, set a spec to catch typos and ensure only expected attributes are used (L.103, L.113):
mock_tm = Mock(spec_set=["triple"])
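
The effect of constraining the mock can be checked with `unittest.mock` alone. The sketch below compares an unconstrained `Mock()` with `Mock(spec_set=["triple"])`; the triple string and the misspelled attribute `tripel` are illustrative assumptions.

```python
from unittest.mock import Mock

# Unconstrained mock: any attribute access succeeds, including typos.
loose_tm = Mock()
loose_tm.triple = "x86_64-unknown-linux-gnu"
typo_value = loose_tm.tripel  # silently returns a new Mock

# Constrained mock: only `triple` may be read or assigned.
strict_tm = Mock(spec_set=["triple"])
strict_tm.triple = "x86_64-unknown-linux-gnu"


def reading_typo_fails() -> bool:
    """Return True if reading a misspelled attribute raises AttributeError."""
    try:
        strict_tm.tripel
    except AttributeError:
        return True
    return False


def setting_unknown_fails() -> bool:
    """Return True if assigning an attribute outside the spec raises."""
    try:
        strict_tm.target_data = object()
    except AttributeError:
        return True
    return False
```

`spec_set` rejects both reads and writes of names outside the list, whereas `spec` rejects only reads, so `spec_set` is the stricter of the two options.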

@omsherikar
Contributor Author

> LGTM! thanks, appreciate it, @omsherikar

Thanks @xmnlab

@github-actions

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/irx/builders/llvmliteir.py

  • Correctness: SIZE_T_TYPE derivation via string heuristics is brittle and will mis-detect several common targets (e.g., powerpc64le, riscv64, s390x), and falls back to host size on unknown triples, which breaks cross-targeting. Compute it from the target data pointer size instead. Also, POINTER_BITS is still derived from host ctypes, which can mismatch the module triple/layout. Use target data instead. (L.190-L.221)

    Suggested change:
    def _get_size_t_type_from_triple(self) -> ir.IntType:
        """Return size_t width from target data pointer size."""
        td = self.target_machine.target_data
        try:
            ptr_size = td.pointer_size  # bytes
        except AttributeError:
            ptr_size = td.get_pointer_size(0)  # address space 0
        return ir.IntType(ptr_size * 8)

  • Correctness: If self._llvm is shared across builder instances, guarding SIZE_T_TYPE initialization with “if None” risks stale values when switching targets. Recompute unconditionally after setting module triple/data layout. (L.178)

    Suggested change:
    def __init__(self) -> None:
        """..."""
        ...
        self._llvm.module.triple = self.target_machine.triple
        self._llvm.module.data_layout = str(self.target_machine.target_data)
        self._llvm.SIZE_T_TYPE = self._get_size_t_type_from_triple()

  • Correctness: Align POINTER_BITS with target, not host. (L.196)

    Suggested change:
    def _init_native_size_types(self) -> None:
        """Initialize pointer/size_t types from target."""
        td = self.target_machine.target_data
        try:
            ptr_size = td.pointer_size
        except AttributeError:
            ptr_size = td.get_pointer_size(0)
        self._llvm.POINTER_BITS = ptr_size * 8
        self._llvm.SIZE_T_TYPE = None


tests/test_llvmlite_helpers.py

  • Using Mock() without a spec can silently mask attribute mistakes in the implementation, reducing the test’s ability to catch regressions. Constrain the mock to only expose triple:

    • Change to mock_tm = Mock(spec=["triple"]) (L.104)
    • Change to mock_tm = Mock(spec=["triple"]) (L.116)
  • These tests reach into private internals (visitor._get_size_t_type_from_triple and visitor._llvm.ir_builder). This couples tests to implementation details and will break easily on refactors. Consider exposing a small public helper on LLVMLiteIRVisitor (e.g., get_size_t_type()) and using public builder accessors to reduce brittleness.


@xmnlab
Contributor

xmnlab commented Jan 22, 2026

@omsherikar .. I will take a time to check the review here #134 (comment)

maybe there is something interesting there ...

@omsherikar
Contributor Author

> @omsherikar .. I will take a time to check the review here #134 (comment)
>
> maybe there is something interesting there ...

Okay @xmnlab I will also have a look in it.



3 participants