An implementation of Apache Arrow in Mojo. The initial motivation was to learn Mojo while doing something useful, and since I've been involved in Apache Arrow for a while it seemed a natural fit. The project has grown beyond a prototype: it now has a full Python binding layer, SIMD compute kernels, GPU acceleration, and benchmarks showing it outperforms PyArrow on array construction for common numeric and string workloads.
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized, language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs.
Mojo is a new programming language built on MLIR that combines Python expressiveness with the performance of systems programming languages.
Arrow should be a first-class citizen in Mojo's ecosystem. This implementation provides zero-copy interoperability with PyArrow via the Arrow C Data Interface, and serves as a foundation for high-performance data processing in Mojo.
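Concretely, the Arrow format represents an array as a small set of buffers: for a primitive array, a contiguous values buffer plus a validity bitmap marking nulls. A rough pure-Python illustration of that layout (not marrow's actual buffer code):

```python
# Rough illustration of Arrow's columnar layout for an int64 array with a
# null: a values buffer and a validity bitmap, one bit per element,
# LSB-first within each byte. Illustrative only, not marrow internals.
import struct

values = [1, 2, 3, 0, 5]    # the null slot holds an arbitrary value (here 0)
validity = [1, 1, 1, 0, 1]  # 1 = valid, 0 = null

values_buf = struct.pack("<5q", *values)  # 5 little-endian int64s
bitmap = bytearray((len(validity) + 7) // 8)
for i, bit in enumerate(validity):
    if bit:
        bitmap[i // 8] |= 1 << (i % 8)    # set bit i, LSB-first

print(len(values_buf))  # 40 bytes: 5 * 8
print(bin(bitmap[0]))   # 0b10111, bit 3 clear marks the null at index 3
```

Because both buffers are plain contiguous memory, slicing and cross-language handoff need only pointer-and-length bookkeeping, which is what makes the zero-copy interoperability below possible.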
### Array types

- `PrimitiveArray[T]` — numeric and boolean arrays with type aliases: `BoolArray`, `Int8Array`…`Int64Array`, `UInt8Array`…`UInt64Array`, `Float32Array`, `Float64Array`
- `StringArray` — UTF-8 variable-length strings
- `ListArray` — variable-length nested arrays
- `FixedSizeListArray` — fixed-size nested arrays (embedding vectors, coordinates)
- `StructArray` — named-field structs
- `ChunkedArray` — an array split across multiple chunks
- `AnyArray` — type-erased immutable array container (O(1) copy via `ArcPointer`)
- `RecordBatch` — schema + column arrays, with slice, select, rename, and add/remove/set column operations
- `Table` — schema + chunked columns; `from_batches()`, `to_batches()`, `combine_chunks()`
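For variable-length types the layout adds an offsets buffer; `StringArray`, for example, follows Arrow's offsets-plus-data representation. A pure-Python illustration (not marrow's actual code):

```python
# Arrow's variable-length string layout: an offsets buffer plus one
# concatenated UTF-8 data buffer. Illustrative only, not marrow internals.
strings = ["hello", "world", "!"]
data = "".join(strings).encode("utf-8")  # b'helloworld!'

offsets = [0]
for s in strings:
    offsets.append(offsets[-1] + len(s.encode("utf-8")))

print(offsets)                               # [0, 5, 10, 11]
print(data[offsets[1]:offsets[2]].decode())  # 'world' (element 1)
```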
### Scalar types

- `PrimitiveScalar[T]`, `StringScalar`, `ListScalar`, `StructScalar` — typed scalars holding native values
- `AnyScalar` — type-erased scalar backed by a length-1 `AnyArray`
### Builders — incrementally build immutable arrays

- `PrimitiveBuilder[T]`, `StringBuilder`, `ListBuilder`, `FixedSizeListBuilder`, `StructBuilder`
- `AnyBuilder` — type-erased builder using function-pointer vtable dispatch (O(1) copy via `ArcPointer`)
### Compute kernels (SIMD-vectorized, null-aware)

- Arithmetic: `add`, `sub`, `mul`, `div`, `floordiv`, `mod`, `neg`, `abs_`, `min_`, `max_`
- Math (unary): `sign`, `sqrt`, `exp`, `exp2`, `log`, `log2`, `log10`, `log1p`, `floor`, `ceil`, `trunc`, `round`, `sin`, `cos`
- Math (binary): `pow_`
- Comparisons: `equal`, `not_equal`, `less`, `less_equal`, `greater`, `greater_equal` → `BoolArray` (CPU + GPU)
- Aggregates: `sum_`, `product`, `min_`, `max_`, `any_`, `all_` (null-skipping)
- Group-by: `groupby(keys, values, aggregations)` — fused hash+aggregate, returns a `RecordBatch`
- Hashing: `hash_` for primitive, string, and struct arrays
- Selection: `filter_`, `drop_nulls`
- Strings: `string_lengths`
- Similarity: `cosine_similarity` (batch of N vectors vs. 1 query, CPU SIMD + GPU)
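The split between null-propagating element-wise kernels and null-skipping aggregates can be modeled in plain Python (a reference model of the semantics only, not the SIMD implementation):

```python
# Reference model of the kernel null semantics (not the SIMD implementation).
def add(a, b):
    # Element-wise kernels: the result is null where either input is null.
    return [x + y if x is not None and y is not None else None
            for x, y in zip(a, b)]

def sum_(a):
    # Aggregates: nulls are skipped rather than propagated.
    return sum(x for x in a if x is not None)

print(add([1, 2, 3, None, 5], [10, 20, 30, 40, None]))  # [11, 22, 33, None, None]
print(sum_([1, 2, 3, None, 5]))                          # 11
```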
### Expression execution (marrow/expr)

- Build lazy expression trees with `col()`, `lit()`, `if_else()` and operator overloads (`+`, `-`, `*`, `/`, `>`, `<`, `==`, `&`, `|`, …)
- Relational plan nodes: `InMemoryTable`, `Filter`, `Project`, `ParquetScan`, `Aggregate`, with `.filter()`, `.select()`, `.aggregate()` chaining
- Pull-based streaming executor: `Planner` compiles a plan into typed processor trees; `execute()` collects `RecordBatch` results
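Pull-based execution means each operator requests batches from its child rather than having batches pushed at it. A toy Python sketch of that shape (marrow's `Planner` builds typed processor trees over record batches; the row-dict "batches" here are purely illustrative):

```python
# Toy pull-based pipeline: each stage is a generator that pulls from its
# child on demand, so batches stream through without materializing the
# whole table. Row-dict batches are illustrative, not marrow's layout.
def scan(batches):
    yield from batches

def filt(child, pred):
    for batch in child:
        kept = [row for row in batch if pred(row)]
        if kept:
            yield kept

def project(child, cols):
    for batch in child:
        yield [{c: row[c] for c in cols} for row in batch]

batches = [[{"age": 25, "name": "Alice"}, {"age": 35, "name": "Bob"}],
           [{"age": 45, "name": "Carol"}]]
plan = project(filt(scan(batches), lambda r: r["age"] > 30), ["name"])
print(list(plan))  # [[{'name': 'Bob'}], [{'name': 'Carol'}]]
```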
### Parquet I/O (marrow/parquet)

- `read_table(path)` — read a Parquet file into a marrow `Table`
- `write_table(table, path)` — write a marrow `Table` to Parquet
### Python bindings — `import marrow as ma`

- `array(values, type=None)` — create any array type from Python lists, with type inference
- All compute kernels exposed as free functions
- Full null handling, type coercion, and nested structure support
### Interoperability

- Arrow C Data Interface — zero-copy exchange with PyArrow
- GPU acceleration via Mojo's `DeviceContext` (Metal on Apple Silicon, CUDA on NVIDIA)
```shell
pixi run build_python   # compile marrow.so
```

```python
import marrow as ma

# ── Array construction ────────────────────────────────────────────────────────

# Primitive arrays — type inference
a = ma.array([1, 2, 3, None, 5])     # int64 with one null
f = ma.array([1.0, 2.5, None, 4.0])  # float64

# Explicit types
a = ma.array([1, 2, 3, None, 5], type=ma.int64())

# Strings
s = ma.array(["hello", None, "world"])

# Nested lists
nested = ma.array([[1, 2], [3, 4, 5], None])

# Struct arrays — automatic type inference from dict keys
structs = ma.array([{"x": 1, "y": 1.5}, {"x": 2, "y": 2.5}])

# With an explicit schema
t = ma.struct([ma.field("x", ma.int64()), ma.field("y", ma.float64())])
structs = ma.array([{"x": 1, "y": 1.5}, {"x": 2, "y": 2.5}], type=t)

# ── Arithmetic (null-propagating) ─────────────────────────────────────────────
b = ma.array([10, 20, 30, None, 50])
result = ma.add(a, b)  # null where either input is null
result = ma.sub(a, b)
result = ma.mul(a, b)
result = ma.div(a, b)

# ── Aggregates (null-skipping) ────────────────────────────────────────────────
ma.sum_(a)     # → 11.0 (skips the null at index 3)
ma.product(a)  # → 30.0
ma.min_(a)     # → 1.0
ma.max_(a)     # → 5.0
ma.any_(ma.array([False, True, None]))  # → True
ma.all_(ma.array([True, True, None]))   # → True

# ── Selection ─────────────────────────────────────────────────────────────────
mask = ma.array([True, False, True, False, True])
ma.filter_(a, mask)  # [1, 3, 5]
ma.drop_nulls(a)     # [1, 2, 3, 5] (removes index 3)

# ── Array methods ─────────────────────────────────────────────────────────────
len(a)          # 5
a.null_count()  # 1
a.type()        # int64
a.slice(1, 3)   # [2, 3, None] — zero-copy
a[0]            # 1
str(a)          # "Int64Array([1, 2, 3, NULL, 5])"

# Struct field access
structs.field(0)    # Int64Array — field "x"
structs.field("y")  # Float64Array — field "y"
```

```mojo
from marrow.arrays import array, PrimitiveArray, StringArray, BoolArray
from marrow.dtypes import int8, int32, int64, bool_, list_

# Factory function — list of optionals
var a = array[int32]([1, 2, 3, 4, 5])
var b = array[int64]([1, None, 3, None, 5])  # nulls at index 1 and 3
var c = array[bool_]([True, False, True])
```

```mojo
from marrow.builders import PrimitiveBuilder, StringBuilder, ListBuilder

# Primitive
var pb = PrimitiveBuilder[int64](capacity=4)
pb.append(10)
pb.append(20)
pb.append_null()
pb.append(40)
var arr: Int64Array = pb.finish()

# String
var sb = StringBuilder()
sb.append("hello")
sb.append_null()
sb.append("world")
var strs: StringArray = sb.finish()

# List of int32 — append child elements, then commit each list element
var child = PrimitiveBuilder[int32]()
child.append(1)
child.append(2)
var lb = ListBuilder(child^)  # moves child into the builder
lb.append(True)        # [1, 2] is the first list element
lb.values().append(3)  # child element for the next list
lb.append(True)        # [3] is the second list element
lb.append_null()       # null third element
var lists: ListArray = lb.finish()
```

All arrays implement `Writable`, so they print directly:

```mojo
print(arr)   # Int64Array([10, 20, NULL, 40])
print(strs)  # StringArray([hello, NULL, world])
```

```mojo
from marrow.kernels.arithmetic import add, sub, mul, div, sqrt, log, sin
from marrow.kernels.aggregate import sum_, min_, max_, any_, all_
from marrow.kernels.filter import filter_, drop_nulls
from marrow.kernels.compare import equal, less, greater_equal
from marrow.kernels.groupby import groupby

var x = array[int64]([1, 2, 3, 4])
var y = array[int64]([10, 20, 30, 40])
var z = add(x, y)           # Int64Array([11, 22, 33, 44])
var total = sum_[int64](x)  # 10
var filtered = filter_[int64](x, array[bool_]([True, False, True, False]))

var a = array[int64]([1, 2, 3, 4])
var b = array[int64]([1, 3, 2, 4])
var eq = equal(a, b)  # BoolArray([true, false, false, true])
var lt = less(a, b)   # BoolArray([false, true, false, false])

# Unary math (floating-point)
var f = array[float64]([1.0, 4.0, 9.0, 16.0])
var s = sqrt(f)  # Float64Array([1.0, 2.0, 3.0, 4.0])
var l = log(f)   # natural log

# Group-by
var keys = array[int64]([1, 2, 1, 2, 1])
var vals = array[float64]([10.0, 20.0, 30.0, 40.0, 50.0])
var result = groupby(keys, [vals], ["sum"])  # RecordBatch: key=[1,2], sum=[90.0, 60.0]
```

```mojo
from marrow.expr import col, lit, in_memory_table, execute, ExecutionContext
from marrow.tabular import record_batch

var batch = record_batch(
    [array[int64]([25, 35, 45]), array[String](["Alice", "Bob", "Carol"])],
    names=["age", "name"],
)

var plan = in_memory_table(batch)
    .filter(col("age") > lit(30))
    .select(col("name"), col("age"))

var ctx = ExecutionContext()
var results = execute(plan, ctx)  # List[RecordBatch]
```

```mojo
from marrow.parquet import read_table, write_table

var tbl = read_table("data.parquet")
write_table(tbl, "output.parquet")
```

```mojo
from std.python import Python
from marrow.c_data import CArrowArray, CArrowSchema

var pa = Python.import_module("pyarrow")
var pyarr = pa.array([1, 2, 3, 4, 5], mask=[False, False, False, False, True])
var capsules = pyarr.__arrow_c_array__()

var dtype = CArrowSchema.from_pycapsule(capsules[0]).to_dtype()  # int64
var data = CArrowArray.from_pycapsule(capsules[1])^.to_array(dtype)
var typed = data.as_int64()

print(typed.is_valid(0))    # True
print(typed.is_valid(4))    # False (null)
print(typed.unsafe_get(0))  # 1
```

Python array construction vs PyArrow (n = 100,000 elements, Apple M-series, mean time):
| Array type | marrow | PyArrow | speedup |
|---|---|---|---|
| int64 (explicit type) | 0.30 ms | 0.92 ms | 3.0x faster |
| int64 + nulls (explicit) | 0.30 ms | 0.91 ms | 3.0x faster |
| float64 (explicit) | 0.28 ms | 0.48 ms | 1.7x faster |
| float64 + nulls | 0.28 ms | 0.52 ms | 1.8x faster |
| string (explicit) | 0.81 ms | 1.07 ms | 1.3x faster |
| string + nulls | 0.80 ms | 1.04 ms | 1.3x faster |
| struct, primitive fields | 4.64 ms | 6.35 ms | 1.4x faster |
| int64 (inferred) | 1.58 ms | 1.28 ms | 1.2x slower |
| string (inferred) | 0.92 ms | 1.01 ms | ~parity |
| nested list (inferred) | 0.61 ms | 2.37 ms | 3.9x faster |
When the array type is provided explicitly, marrow's builder path is faster than PyArrow's for numeric and string types. Type inference involves a Python-side scan to detect the type, which adds overhead; this gap will narrow as the inference path is optimized.
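That inference pass can be pictured as a scan over the values before any buffers are built; a hypothetical sketch (the function name and promotion rules here are illustrative, not marrow's actual inference code):

```python
# Hypothetical sketch of Python-side type inference (illustrative only):
# scan the values, ignore Nones, and promote int -> float on mixed input.
def infer_type(values):
    seen = {type(v) for v in values if v is not None}
    if seen <= {int}:
        return "int64"
    if seen <= {int, float}:
        return "float64"
    if seen <= {str}:
        return "string"
    raise TypeError(f"cannot infer Arrow type for {seen}")

print(infer_type([1, 2, None, 5]))   # int64
print(infer_type([1, 2.5, None]))    # float64
print(infer_type(["a", None, "b"]))  # string
```

Passing `type=` skips this scan entirely, which is why the explicit-type rows in the table above are the fastest.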
Run the benchmarks yourself:

```shell
pixi run bench_python      # Python array construction vs PyArrow
pixi run bench             # CPU SIMD arithmetic benchmarks
pixi run bench_similarity  # cosine similarity: CPU vs GPU
```

GPU kernels are available for compute-intensive operations when a `DeviceContext` is provided. Benchmarked on Apple Silicon (M-series, Metal, unified memory):
Cosine similarity (batch N-vectors vs 1 query, dim=768):
| Vectors | CPU SIMD | GPU (upload per call) | GPU (pre-loaded) |
|---|---|---|---|
| 10 K | baseline | 2–3x slower | ~1x (crossover) |
| 100 K | baseline | ~1x | ~3x faster |
| 500 K | baseline | — | ~13x faster |
The key pattern: upload data to the GPU once, run multiple kernels, download results at the end. The crossover vs CPU SIMD is around 10K vectors at dim≥384.
Element-wise arithmetic (add, mul, etc.) is faster on CPU SIMD — data transfer overhead dominates for low arithmetic-intensity operations.
```mojo
from std.gpu.host import DeviceContext
from marrow.kernels.similarity import cosine_similarity

# Pre-load data onto the GPU once
var ctx = DeviceContext()
var vectors_gpu = vectors.to_device(ctx)
var query_gpu = query.to_device(ctx)

# Run many similarity searches without re-uploading
var scores = cosine_similarity(vectors_gpu, query_gpu, ctx)
```

- C Data Interface: release callbacks are not invoked (Mojo cannot yet pass a callback to a C function). Consuming Arrow data from PyArrow works; producing data back to PyArrow via the release mechanism is not fully implemented.
- Testing: conformance against the Arrow specification is verified through PyArrow, since Mojo has no JSON library yet. Full integration testing requires a Mojo JSON reader.
- Type coverage: only boolean, numeric, string, list, fixed-size list, and struct types are implemented. Date/time, dictionary, union, decimal, and binary types are not yet supported.
- Parquet I/O: Parquet support currently bridges through PyArrow. Native Mojo Parquet reading is planned for a future release.
- GPU null handling: binary arithmetic kernels on the GPU do not propagate null bitmaps (GPU `bitmap_and` is not yet implemented). Null-aware GPU arithmetic is CPU-only for now.
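The missing GPU piece reduces to a bitwise AND over the two input validity bitmaps; in reference form (a sketch of the operation, not the kernel itself):

```python
# Validity-bitmap AND: an output slot is valid only if both inputs are
# valid. Reference sketch of the operation the GPU kernel still lacks.
def bitmap_and(a: bytes, b: bytes) -> bytes:
    return bytes(x & y for x, y in zip(a, b))

# [1,1,1,0,1] AND [1,1,0,1,1] -> [1,1,0,0,1]
print(bitmap_and(bytes([0b10111]), bytes([0b11011])))  # b'\x13', i.e. 0b10011
```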
Install pixi, then:

```shell
pixi run test              # run all tests (Mojo + Python)
pixi run test_mojo         # Mojo unit tests only
pixi run test_python       # Python binding tests only
pixi run bench             # CPU/GPU arithmetic benchmarks
pixi run bench_python      # Python vs PyArrow array construction benchmarks
pixi run bench_similarity  # cosine similarity: CPU vs GPU vs GPU pre-loaded
pixi run fmt               # format all code (Mojo + Python)
```

If the project matures, the goal is to contribute it upstream to the Apache Arrow project.
