Skip to content

Conversation

@dev-m03
Copy link

@dev-m03 dev-m03 commented Jan 25, 2026

Pull Request description

This PR fixes a type inconsistency in the LLVM IR builder related to string handling.

Previously, a struct-based STRING_TYPE ({ i32, i8* }) existed in the type system, but all actual string operations in the codebase (concatenation, comparison, casts, puts, function arguments and returns) were already using plain i8*. This created a mismatch between the declared type model and the real LLVM IR being generated.

This change removes the unused struct-based string type and standardizes the string representation across the entire IR builder to i8* (C-style null-terminated char pointer), aligning the type system with the existing implementation.

No logic or behavior was changed — only type consistency was corrected.

@dev-m03
Copy link
Author

dev-m03 commented Jan 25, 2026

@yuvimittal
The main/linter CI failure appears unrelated to this change. The job fails during dependency installation with a Python 3.13 / pip metadata error (Bad metadata entry 'version' while uninstalling sniffio). This occurs before tests run and seems to be an environment issue rather than a code issue.
Please check if this is due to CI infra issue or not?

@dev-m03 dev-m03 marked this pull request as ready for review January 25, 2026 21:17
@github-actions
Copy link

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/irx/builders/llvmliteir.py

  • High risk: get_data_type("string"/"utf8string") now returns i8* instead of the previous length+ptr struct. This is a breaking ABI/behavior change and can introduce correctness and safety bugs (null-termination assumptions, OOB reads, O(n) length scans). If intentional, please document and ensure all producers/consumers were migrated; otherwise, restore the struct or gate behind a feature flag (L.132).

  • Material bug risk: constructing ir.IntType(8).as_pointer() ad hoc risks subtle type mismatches elsewhere and makes equality checks brittle. Define and reuse a single canonical C-string type.
    Suggested change (add field) (L.90):
    def CSTRING_TYPE(self) -> ir.types.Type:
    """Canonical C string type (char*)"""
    Suggested init (L.212):
    def initialize(self) -> None:
    """Initialize LLVM types"""
    self._llvm.CSTRING_TYPE = ir.IntType(8).as_pointer()
    Then use self._llvm.CSTRING_TYPE in get_data_type and all string helper signatures (L.132, L.1614, L.1630, L.1645, L.1659).

  • Robustness: when reusing existing declarations via get_global, validate the signature matches the expected FunctionType to avoid silent mismatches.
    Suggested helper (place near the string helpers) (L.1608):
    def _get_or_declare(self, name: str, ftype: ir.FunctionType) -> ir.Function:
    """Return function with exact expected signature, or raise on mismatch"""
    existing = self._llvm.module.globals.get(name)
    if existing:
    if not isinstance(existing, ir.Function) or existing.function_type != ftype:
    raise ValueError(f"Mismatched declaration for {name}")
    return existing
    return ir.Function(self._llvm.module, ftype, name)

  • Minor but correctness-affecting: In cast handling, avoid constructing new IntType each time; use the canonical CSTRING_TYPE for the pointee check to prevent accidental mismatches (L.2204).
    Suggested change (L.2204):
    def _is_cstring(self, t: ir.types.Type) -> bool:
    """Check if type is a canonical C string (char*)"""
    return isinstance(t, ir.PointerType) and t.pointee == self._llvm.CSTRING_TYPE.pointee


@dev-m03 dev-m03 marked this pull request as draft January 25, 2026 21:19
@dev-m03
Copy link
Author

dev-m03 commented Jan 29, 2026

@xmnlab , here you can see , as discussed earlier.
Thanks

@xmnlab xmnlab marked this pull request as ready for review January 29, 2026 15:43
@github-actions
Copy link

OSL ChatGPT Reviewer

NOTE: This is generated by an AI program, so some comments may not make sense.

src/irx/builders/llvmliteir.py

  • This is an ABI/behavior change: “string” and “utf8string” move from a length-carrying struct to raw i8*. All call sites and the runtime implementations for string_concat/length/equals/substring must be updated accordingly or you’ll get UB due to mismatched prototypes/representation. If backward compatibility is needed, keep the old struct type for utf8string or gate the change behind a versioned set of intrinsics.

  • Real performance regression: losing the stored length makes length/concat substring O(n) scans on every call. If large strings are common, consider retaining a length-carrying representation (at least for utf8string), or document the trade-off.

  • Consistency/correctness risk: you now construct fresh i8* types in multiple places. If any identity-based comparisons exist elsewhere, this can break subtlely. Define a canonical C-string type once and reuse it.

Suggested changes:

  • Add a canonical pointer-to-i8 type and use it everywhere instead of reconstructing i8* literals.

    (L.95)
    def CSTR_TYPE(self) -> ir.types.Type:
    """Canonical i8* string type."""
    return self._llvm.CSTR_TYPE

    (L.212)
    def _init_cstr_type(self) -> None:
    """Initialize canonical i8* string type."""
    self._llvm.CSTR_TYPE = ir.IntType(8).as_pointer()

    (L.134-141)
    def get_data_type(self, type_name: str) -> ir.types.Type:
    """Map high-level type name to LLVM type."""
    if type_name == "string":
    return self._llvm.CSTR_TYPE
    elif type_name == "stringascii":
    return self._llvm.CSTR_TYPE
    elif type_name == "utf8string":
    return self._llvm.CSTR_TYPE

    (L.1614, L.1630, L.1645, L.1659)
    def _use_cstr_type_in_string_funcs(self) -> None:
    """Use canonical i8* in string intrinsics."""
    # Replace ir.IntType(8).as_pointer() with self._llvm.CSTR_TYPE in all string function signatures.

  • Also ensure all string producers now emit null-terminated buffers; otherwise length/concat/equals semantics will be incorrect or unsafe.


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant