Project Ideas

Yuvi Mittal edited this page Jan 28, 2026 · 7 revisions

IRx

Project Idea 1: Integrate IRx with Apache Arrow (C Data Interface)

Abstract

This project integrates IRx with Apache Arrow via the Arrow C Data and C Stream Interfaces, enabling IRx to lower ASTx values into Arrow-compatible columnar memory layouts. The integration provides zero-copy, language-agnostic interoperability with the Arrow ecosystem (PyArrow, Rust Arrow, DuckDB, etc.) without introducing a dependency on Arrow C++ libraries.

Motivation

IRx currently lowers ASTx programs into scalar LLVM-IR values, which limits interoperability with modern data systems. Apache Arrow has become the de facto standard for in-memory columnar data representation across languages and runtimes.

By integrating Arrow at the IR level:

  • IRx can interoperate with Python (PyArrow), Rust, C/C++, and JVM ecosystems.
  • Data can be shared zero-copy across components.
  • IRx becomes suitable for analytics and vectorized execution workloads.

This project focuses on ABI-level integration via the Arrow C Data Interface, avoiding any dependency on Arrow C++ libraries.
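To make the ABI-level idea concrete, here is a minimal sketch that mirrors `struct ArrowSchema` from the Arrow C Data Interface using only Python's standard `ctypes` module, with no Arrow library involved. The field order follows the published struct definition. The names `LLVM_TO_ARROW_FORMAT`, `make_schema`, and `release_schema` are hypothetical helpers introduced for illustration and are not part of IRx.

```python
import ctypes


class ArrowSchema(ctypes.Structure):
    """ctypes mirror of `struct ArrowSchema` (Arrow C Data Interface)."""


# C signature: void (*release)(struct ArrowSchema*)
SchemaRelease = ctypes.CFUNCTYPE(None, ctypes.POINTER(ArrowSchema))

# Fields are assigned after the class exists because the struct is self-referential.
ArrowSchema._fields_ = [
    ("format", ctypes.c_char_p),
    ("name", ctypes.c_char_p),
    ("metadata", ctypes.c_char_p),
    ("flags", ctypes.c_int64),
    ("n_children", ctypes.c_int64),
    ("children", ctypes.POINTER(ctypes.POINTER(ArrowSchema))),
    ("dictionary", ctypes.POINTER(ArrowSchema)),
    ("release", SchemaRelease),
    ("private_data", ctypes.c_void_p),
]

# Hypothetical mapping from LLVM scalar type names to Arrow format strings.
LLVM_TO_ARROW_FORMAT = {
    "i8": b"c",
    "i16": b"s",
    "i32": b"i",
    "i64": b"l",
    "float": b"f",
    "double": b"g",
}

ARROW_FLAG_NULLABLE = 2


@SchemaRelease
def release_schema(schema_ptr):
    # A released structure is marked by setting `release` to NULL.
    schema_ptr.contents.release = SchemaRelease()


def make_schema(llvm_type, name):
    """Describe a non-nested column whose elements have the given LLVM type."""
    schema = ArrowSchema()
    schema.format = LLVM_TO_ARROW_FORMAT[llvm_type]
    schema.name = name.encode("utf-8")
    schema.flags = ARROW_FLAG_NULLABLE
    schema.release = release_schema
    return schema


schema = make_schema("i32", "values")
print(schema.format, schema.name, bool(schema.release))
schema.release(ctypes.byref(schema))  # consumer signals it is done
print(bool(schema.release))
```

Because the structure is plain C data, any consumer that understands the interface can read it through a pointer. This is why the approach needs no link-time dependency on Arrow C++.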

Current State

  • IRx lowers ASTx literals and expressions to LLVM-native scalar types (e.g., i32, float).
  • No columnar or structured memory model exists.
  • No standardized data interchange format is supported.

Tasks

  • Introduce an Arrow-aware IR lowering path in IRx.
  • Lower ASTx values into Arrow C Data Interface–compatible structures.
  • Enable zero-copy interoperability with external Arrow consumers.
  • Preserve IRx’s existing LLVM backend (no breaking changes).

Expected Outcomes

  • A new Arrow-aware LLVM lowering path for IRx that leaves the existing scalar backend intact.
  • Ability to emit Arrow-compatible memory from ASTx programs.
  • Zero-copy interoperability with Arrow-based systems.
  • Clear separation between scalar and columnar IR lowering.
  • Documentation and blog posts describing design decisions.

Details

  • Prerequisites:
    • Familiarity with LLVM-IR and the llvmlite library.
    • Proficiency in Python.
    • Familiarity with Apache Arrow fundamentals.
    • Understanding of memory layout and ABI concepts.
  • Duration: 350 hours
  • Complexity: Medium
  • Potential Mentor(s): Yuvi Mittal, Ivan Ogasawara
