
Gatun

⚠️ Alpha Status: This project is experimental and under active development. APIs may change without notice. Not recommended for production use.

High-performance Python-to-Java bridge using shared memory and Unix domain sockets.

Features

  • Shared Memory IPC: Zero-copy data transfer via mmap
  • FlatBuffers Protocol: Efficient binary serialization
  • Apache Arrow Integration: Zero-copy array/table transfer
  • Sync & Async Clients: Both blocking and asyncio support
  • Python Callbacks: Register Python functions as Java interfaces
  • Request Cancellation: Cancel long-running operations
  • JVM View API: Pythonic package-style navigation (client.jvm.java.util.ArrayList)
  • PySpark Integration: Use as backend for PySpark via BridgeAdapter
  • Pythonic JavaObjects: Iteration, indexing, and len() support on Java collections
  • Batch API: Execute multiple commands in a single round-trip (6x speedup for bulk ops)
  • Vectorized APIs: invoke_methods, create_objects, get_fields for 2-5x additional speedup
  • Observability: Server metrics, structured logging, and JFR events for debugging and monitoring

Performance

Gatun uses shared-memory IPC, which offers different trade-offs than Py4J (PySpark's default TCP-based bridge):

Latency (Single Operations)

Gatun has 2-3x lower latency for individual operations:

Operation Gatun Py4J Speedup
Method call (no args) 120 μs 350 μs 2.9x
Method call (with args) 140 μs 380 μs 2.7x
Object creation 150 μs 400 μs 2.7x
Static method 130 μs 360 μs 2.8x

Throughput (Bulk Operations)

For tight loops with pre-bound methods (where class/method resolution is cached), Py4J achieves higher ops/sec:

Operation Gatun Py4J Notes
Bulk static calls (10K) ~45K ops/s ~60K ops/s Pre-bound: fn = Math.abs; fn(i)
Bulk instance calls (10K) ~40K ops/s ~55K ops/s Pre-bound: fn = arr.add; fn(i)
Mixed workload ~35K ops/s ~30K ops/s Gatun faster for varied operations

Why the difference? Latency benchmarks measure full client.jvm.java.lang.Math.max(10, 20) calls including package navigation and method resolution (~120 μs). Throughput benchmarks pre-bind methods first, measuring only the IPC cost (~22 μs for Gatun). Py4J's TCP protocol has lower per-call IPC overhead than Gatun's shared memory protocol for small payloads.

Recommendation: Use vectorized APIs or Arrow for bulk data instead of tight loops.

Arrow Data Transfer

For bulk data, Arrow zero-copy transfer provides massive speedups over per-element transfer:

Data Size IPC Format Zero-Copy Buffers Throughput (zero-copy)
1K rows 800 μs 520 μs 54 MB/s
10K rows 890 μs 570 μs 509 MB/s
100K rows 1.4 ms 1.0 ms 1.5 GB/s
500K rows 5.9 ms 3.6 ms 2.1 GB/s

Vectorized APIs

Reduce round-trips with batch operations:

Operation Individual Calls Vectorized Speedup
3 method calls 720 μs 490 μs 1.5x
10 method calls 1,600 μs 490 μs 3.3x
10 object creations 2,400 μs 1,100 μs 2.2x

When to Use Gatun vs Py4J

Use Case Recommendation
Interactive/exploratory work Gatun (lower latency)
Bulk data transfer Gatun (Arrow support)
Simple tight loops Py4J may be faster
Mixed operations Gatun
PySpark integration Either (Gatun via BridgeAdapter)

Benchmarks run on Apple M1, Java 22, Python 3.13. See docs/benchmarks.md for full methodology.

Installation

pip install gatun

Requirements

  • Python: 3.13+
  • Java: 22+
  • OS: Linux, macOS (Windows is not supported; Unix domain sockets are required)

Quick Start

from gatun import connect

# Auto-launch server and connect
client = connect()

# Create Java objects via JVM view
ArrayList = client.jvm.java.util.ArrayList
my_list = ArrayList()
my_list.add("hello")
my_list.add("world")
print(my_list.size())  # 2

# Call static methods
result = client.jvm.java.lang.Integer.parseInt("42")  # 42
result = client.jvm.java.lang.Math.max(10, 20)        # 20

# Clean up
client.close()

Examples

java_import for Shorter Paths

from gatun import connect, java_import

client = connect()

# Wildcard import
java_import(client.jvm, "java.util.*")
arr = client.jvm.ArrayList()  # instead of client.jvm.java.util.ArrayList()
arr.add("hello")

# Single class import
java_import(client.jvm, "java.lang.StringBuilder")
sb = client.jvm.StringBuilder("hello")
print(sb.toString())  # "hello"

Collections

from gatun import connect, java_import

client = connect()

# HashMap
hm = client.jvm.java.util.HashMap()
hm.put("key1", "value1")
hm.put("key2", 42)
print(hm.get("key1"))  # "value1"
print(hm.size())       # 2

# TreeMap (sorted keys)
tm = client.jvm.java.util.TreeMap()
tm.put("zebra", 1)
tm.put("apple", 2)
tm.put("mango", 3)
print(tm.firstKey())  # "apple"
print(tm.lastKey())   # "zebra"

# HashSet (no duplicates)
hs = client.jvm.java.util.HashSet()
hs.add("a")
hs.add("b")
hs.add("a")  # duplicate ignored
print(hs.size())        # 2
print(hs.contains("a")) # True

# Collections utility methods
java_import(client.jvm, "java.util.*")
arr = client.jvm.ArrayList()
arr.add("banana")
arr.add("apple")
arr.add("cherry")
client.jvm.Collections.sort(arr)     # ["apple", "banana", "cherry"]
client.jvm.Collections.reverse(arr)  # ["cherry", "banana", "apple"]

# Arrays.asList (returns Python list)
result = client.jvm.java.util.Arrays.asList("a", "b", "c")  # ['a', 'b', 'c']

String Operations

from gatun import connect

client = connect()

# StringBuilder
sb = client.jvm.java.lang.StringBuilder("Hello")
sb.append(" ")
sb.append("World!")
print(sb.toString())  # "Hello World!"

# String static methods
result = client.jvm.java.lang.String.valueOf(123)  # "123"
result = client.jvm.java.lang.String.format("Hello %s, you have %d messages", "Alice", 5)
# "Hello Alice, you have 5 messages"

Math Operations

from gatun import connect

client = connect()

Math = client.jvm.java.lang.Math
print(Math.abs(-42))        # 42
print(Math.min(5, 3))       # 3
print(Math.max(10, 20))     # 20
print(Math.pow(2.0, 10.0))  # 1024.0 (note: use floats for double params)
print(Math.sqrt(16.0))      # 4.0

Integer Utilities

from gatun import connect

client = connect()

Integer = client.jvm.java.lang.Integer
print(Integer.parseInt("42"))        # 42
print(Integer.valueOf("123"))        # 123
print(Integer.toBinaryString(255))   # "11111111"
print(Integer.MAX_VALUE)             # 2147483647 (static field)

Passing Python Collections

Python lists and dicts are automatically converted to Java collections:

from gatun import connect

client = connect()

arr = client.jvm.java.util.ArrayList()
arr.add([1, 2, 3])                    # Converted to Java List
arr.add({"name": "Alice", "age": 30}) # Converted to Java Map
print(arr.size())  # 2

Async Client

from gatun import aconnect
import asyncio

async def main():
    client = await aconnect()

    # All operations are async
    arr = await client.jvm.java.util.ArrayList()
    await arr.add("hello")
    await arr.add("world")
    size = await arr.size()  # 2

    # Static methods
    result = await client.jvm.java.lang.Integer.parseInt("42")  # 42

    await client.close()

asyncio.run(main())

Python Callbacks

Register Python functions as Java interface implementations:

from gatun import connect

client = connect()

def compare(a, b):
    return -1 if a < b else (1 if a > b else 0)

comparator = client.register_callback(compare, "java.util.Comparator")

arr = client.jvm.java.util.ArrayList()
arr.add(3)
arr.add(1)
arr.add(2)
client.jvm.java.util.Collections.sort(arr, comparator)
# arr is now [1, 2, 3]

Async callbacks work too:

from gatun import aconnect
import asyncio

async def main():
    client = await aconnect()

    async def async_compare(a, b):
        await asyncio.sleep(0.01)  # Simulate async work
        return -1 if a < b else (1 if a > b else 0)

    comparator = await client.register_callback(async_compare, "java.util.Comparator")

asyncio.run(main())

Type Checking with is_instance_of

from gatun import connect

client = connect()

arr = client.create_object("java.util.ArrayList")
print(client.is_instance_of(arr, "java.util.List"))       # True
print(client.is_instance_of(arr, "java.util.Collection")) # True
print(client.is_instance_of(arr, "java.util.Map"))        # False

Pythonic Java Collections

JavaObject wrappers support iteration, indexing, and length:

from gatun import connect

client = connect()

arr = client.jvm.java.util.ArrayList()
arr.add("a")
arr.add("b")
arr.add("c")

# Iterate
for item in arr:
    print(item)  # "a", "b", "c"

# Index access
print(arr[0])  # "a"
print(arr[1])  # "b"

# Length
print(len(arr))  # 3

# Convert to Python list
items = list(arr)  # ["a", "b", "c"]

Batch API

Execute multiple commands in a single round-trip to reduce per-call overhead:

from gatun import connect

client = connect()

arr = client.create_object("java.util.ArrayList")

# Batch 100 operations in one round-trip (6x faster than individual calls)
with client.batch() as b:
    for i in range(100):
        b.call(arr, "add", i)
    size_result = b.call(arr, "size")

print(size_result.get())  # 100

# Mix different operation types
with client.batch() as b:
    obj = b.create("java.util.HashMap")
    r1 = b.call_static("java.lang.Integer", "parseInt", "42")
    r2 = b.call_static("java.lang.Math", "max", 10, 20)

print(r1.get())  # 42
print(r2.get())  # 20

# Error handling: continue on error (default) or stop on first error
with client.batch(stop_on_error=True) as b:
    r1 = b.call(arr, "add", "valid")
    r2 = b.call_static("java.lang.Integer", "parseInt", "invalid")  # Will error
    r3 = b.call(arr, "size")  # Skipped when stop_on_error=True

Vectorized APIs

For even faster bulk operations on the same target (2-5x speedup over batch):

from gatun import connect

client = connect()

# invoke_methods - Multiple calls on same object in one round-trip
arr = client.create_object("java.util.ArrayList")
results = client.invoke_methods(arr, [
    ("add", ("a",)),
    ("add", ("b",)),
    ("add", ("c",)),
    ("size", ()),
])
# results = [True, True, True, 3]

# create_objects - Create multiple objects in one round-trip
list1, map1, set1 = client.create_objects([
    ("java.util.ArrayList", ()),
    ("java.util.HashMap", ()),
    ("java.util.HashSet", ()),
])

# get_fields - Read multiple fields from one object
sb = client.create_object("java.lang.StringBuilder", "hello")
values = client.get_fields(sb, ["count"])  # [5]

When to use which API:

API Best For
invoke_methods Multiple method calls on same object
create_objects Creating multiple objects at startup
get_fields Reading multiple fields from one object
batch Mixed operations on different objects

JavaArray for Primitive Arrays

Primitive arrays (int[], long[], double[], etc.) are returned as JavaArray:

from gatun import connect, JavaArray
import pyarrow as pa

client = connect()

# Primitive arrays from Java are JavaArray instances
original = pa.array([1, 2, 3], type=pa.int32())
int_array = client.jvm.java.util.Arrays.copyOf(original, 3)
print(isinstance(int_array, JavaArray))  # True
print(int_array.element_type)  # "Int"
print(list(int_array))  # [1, 2, 3]

# Create typed arrays manually for passing to Java
int_array = JavaArray([1, 2, 3], element_type="Int")
str_array = JavaArray(["a", "b"], element_type="String")
result = client.jvm.java.util.Arrays.toString(int_array)  # "[1, 2, 3]"

Object Arrays as JavaObject

Object arrays (Object[], String[]) are returned as JavaObject references:

from gatun import connect

client = connect()

# Object arrays from toArray() are JavaObject (not JavaArray)
arr = client.jvm.java.util.ArrayList()
arr.add("x")
arr.add("y")
java_array = arr.toArray()  # Returns JavaObject

# Use len() and iteration (not .size() or .length)
print(len(java_array))    # 2
print(java_array[0])      # "x"
print(list(java_array))   # ["x", "y"]

# Can still pass back to Java methods
result = client.jvm.java.util.Arrays.toString(java_array)  # "[x, y]"

This distinction exists because Object arrays are kept as references on the Java side, allowing Array.set() and Array.get() to modify them directly.

Arrow Data Transfer

Gatun supports multiple methods for transferring Arrow data between Python and Java:

from gatun import connect
import pyarrow as pa

client = connect()

# Method 1: IPC Format (simple, good for small data)
table = pa.table({"x": [1, 2, 3], "y": ["a", "b", "c"]})
result = client.send_arrow_table(table)  # "Received 3 rows"

# Method 2: Scoped Context Manager (recommended for most use cases)
# Handles arena lifecycle automatically with proper cleanup
with client.arrow_context() as ctx:
    ctx.send(table)              # Auto-resets arena between sends
    ctx.send(another_table)      # Safe to send multiple tables
    result = ctx.receive()       # Get data back as PyArrow table
# Arena automatically closed on exit

# Method 3: Zero-Copy Buffer Transfer (manual control for advanced use)
table = pa.table({"name": ["Alice", "Bob"], "age": [25, 30]})
arena = client.get_payload_arena()
schema_cache = {}
client.send_arrow_buffers(table, arena, schema_cache)

# Get data back from Java
result_view = client.get_arrow_data()
print(result_view.num_rows)  # 2
print(result_view.to_pydict())  # {'name': ['Alice', 'Bob'], 'age': [25, 30]}

arena.close()

Size Validation

Gatun validates data size before transfer and provides informative errors:

from gatun import connect, PayloadTooLargeError, estimate_arrow_size
import pyarrow as pa

client = connect(memory="16MB")
arena = client.get_payload_arena()

# Check size before sending
large_table = pa.table({"data": list(range(1_000_000))})
estimated_size = estimate_arrow_size(large_table)
print(f"Estimated size: {estimated_size:,} bytes")
print(f"Available: {arena.bytes_available():,} bytes")

# If too large, get a clear error
try:
    client.send_arrow_buffers(large_table, arena, {})
except PayloadTooLargeError as e:
    print(f"Table too large: {e.payload_size:,} > {e.max_size:,} bytes")
    print("Consider: reset arena, use batching, or increase memory")

arena.close()

Arrow Memory Architecture

Gatun's Arrow integration uses shared memory for high-performance data transfer with a carefully designed memory safety model.

Shared Memory Layout

┌─────────────────────────────────────────────────────────────────┐
│                     Shared Memory Region                         │
├─────────────────────────────────────────────────────────────────┤
│ Command Zone (64KB)    │ Python writes commands, Java reads     │
├─────────────────────────────────────────────────────────────────┤
│ Payload Zone           │ Arrow data buffers                      │
│   ├── First Half       │   Python → Java transfers               │
│   └── Second Half      │   Java → Python transfers               │
├─────────────────────────────────────────────────────────────────┤
│ Response Zone (64KB)   │ Java writes responses, Python reads    │
└─────────────────────────────────────────────────────────────────┘

The payload zone is split in half to enable bidirectional zero-copy transfer without data races:

  • Python → Java: Writes to first half [0, size/2)
  • Java → Python: Writes to second half [size/2, size)

Memory Safety: The Epoch System

Gatun uses an arena epoch system to prevent use-after-free and stale data access:

from gatun import connect, StaleArenaError
import pyarrow as pa

client = connect()
arena = client.get_payload_arena()

# Send data (epoch = 0)
table = pa.table({"id": [1, 2, 3]})
client.send_arrow_buffers(table, arena, {})

# Get data back - view is bound to current epoch
view = client.get_arrow_data()  # view._epoch = 0

# Reset arena - epoch increments to 1
arena.reset()
client.reset_payload_arena()

# Accessing stale view raises StaleArenaError
try:
    data = view.to_pydict()  # Raises StaleArenaError!
except StaleArenaError as e:
    print(f"View epoch {e.view_epoch} != current epoch {e.current_epoch}")

arena.close()

How epochs work:

  1. Initial state: Both Python and Java start with epoch 0
  2. On data transfer: The ArrowBatchDescriptor includes the current epoch
  3. On validation: Java rejects data if descriptor epoch doesn't match its epoch
  4. On reset: Both sides increment their epoch, invalidating all previous views
  5. On access: ArrowTableView checks epoch before returning data

This prevents:

  • Use-after-reset: Accessing data after arena memory is reused
  • Stale reads: Reading outdated data from a previous transfer
  • Cross-session corruption: Data from one transfer corrupting another
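The epoch check itself fits in a few lines of plain Python. The sketch below is illustrative only; Arena, ViewOf, and StaleView are simplified stand-ins, not Gatun's real classes:

```python
class StaleView(Exception):
    def __init__(self, view_epoch, current_epoch):
        self.view_epoch = view_epoch
        self.current_epoch = current_epoch

class Arena:
    def __init__(self):
        self.epoch = 0
        self.data = None

    def write(self, data):
        self.data = data
        return ViewOf(self)      # view is bound to the current epoch

    def reset(self):
        self.epoch += 1          # invalidates all outstanding views
        self.data = None

class ViewOf:
    def __init__(self, arena):
        self.arena = arena
        self.epoch = arena.epoch  # snapshot the epoch at creation time

    def read(self):
        if self.epoch != self.arena.epoch:
            raise StaleView(self.epoch, self.arena.epoch)
        return self.arena.data

arena = Arena()
view = arena.write([1, 2, 3])
assert view.read() == [1, 2, 3]

arena.reset()                    # epoch 0 -> 1
try:
    view.read()
except StaleView as e:
    assert (e.view_epoch, e.current_epoch) == (0, 1)
```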

Data Flow: Python → Java

1. Python: Copy Arrow buffers to shared memory (first half)
   ┌──────────────┐     memcpy      ┌──────────────────────┐
   │ PyArrow Table│ ───────────────>│ Shared Memory [0,N/2)│
   └──────────────┘                 └──────────────────────┘

2. Python: Send ArrowBatchDescriptor via FlatBuffers
   - Buffer offsets and lengths
   - Schema (or schema hash if cached)
   - Current epoch

3. Java: Validate epoch, wrap buffers as ArrowBuf (zero-copy read)
   ┌──────────────────────┐  wrap   ┌──────────────────────┐
   │ Shared Memory [0,N/2)│ ───────>│ VectorSchemaRoot     │
   └──────────────────────┘         └──────────────────────┘

Data Flow: Java → Python

1. Python: Request data via GetArrowData command

2. Java: Write Arrow buffers to shared memory (second half)
   ┌──────────────────────┐  memcpy ┌────────────────────────┐
   │ VectorSchemaRoot     │ ───────>│ Shared Memory [N/2, N) │
   └──────────────────────┘         └────────────────────────┘

3. Java: Send ArrowBatchDescriptor with buffer offsets + epoch

4. Python: Wrap buffers as PyArrow arrays (zero-copy read)
   ┌────────────────────────┐  wrap  ┌──────────────┐
   │ Shared Memory [N/2, N) │ ──────>│ ArrowTableView│
   └────────────────────────┘        └──────────────┘

Best Practices

Recommended: Use the Context Manager

from gatun import connect
import pyarrow as pa

client = connect(memory="256MB")

# Simple case: send and receive with automatic cleanup
with client.arrow_context() as ctx:
    ctx.send(my_table)
    result = ctx.receive()
    # Process result...
# Arena automatically cleaned up, even on exceptions

For Batch Processing

from gatun import connect, estimate_arrow_size
import pyarrow as pa

client = connect(memory="256MB")

# Process large dataset in batches
with client.arrow_context() as ctx:
    for batch in large_table.to_batches(max_chunksize=100_000):
        ctx.send(batch)  # Auto-resets arena between sends
        # Process batch in Java...

# Async version works the same way (with a client from aconnect())
async with client.arrow_context() as ctx:
    await ctx.send(table)

Manual Control (Advanced)

from gatun import connect
import pyarrow as pa

client = connect(memory="256MB")

# For fine-grained control over arena lifecycle
arena = client.get_payload_arena()
schema_cache = {}  # Reuse cache for schema deduplication

for batch in large_table.to_batches(max_chunksize=100_000):
    arena.reset()
    client.reset_payload_arena()
    client.send_arrow_buffers(batch, arena, schema_cache)
    # Process batch in Java...

# Always close arena when done
arena.close()

Guidelines:

  • Use arrow_context() for most use cases - handles cleanup automatically
  • Use send_arrow_table() for small data (< 1K rows) - simpler API
  • Use send_arrow_buffers() with manual arena for maximum control
  • Use estimate_arrow_size() to check size before large transfers
  • Keep schema_cache across transfers to avoid re-serializing schema

Low-Level API

For direct control:

from gatun import connect

client = connect()

# Create objects
obj = client.create_object("java.util.ArrayList")
obj = client.create_object("java.util.ArrayList", 100)  # with capacity

# Invoke methods
client.invoke_method(obj.object_id, "add", "item")
result = client.invoke_static_method("java.lang.Math", "max", 10, 20)

# Access static fields
max_int = client.get_field(client.jvm.java.lang.Integer, "MAX_VALUE")

# Vectorized operations (single round-trip for multiple operations)
client.invoke_methods(obj, [("add", ("a",)), ("add", ("b",)), ("size", ())])
client.create_objects([("java.util.ArrayList", ()), ("java.util.HashMap", ())])

Observability

Get server metrics for debugging and monitoring:

from gatun import connect

client = connect()

# Get server metrics report
metrics = client.get_metrics()
print(metrics)
# === Gatun Server Metrics ===
# Global:
#   total_requests: 150
#   total_errors: 0
#   requests_per_sec: 45.23
#   current_sessions: 1
#   current_objects: 12
#   peak_objects: 25
# ...

Enable trace mode for method resolution debugging:

from gatun import connect

# Enable trace mode
client = connect(trace=True)

# Enable verbose logging
client = connect(log_level="FINE")

Or via environment variables:

export GATUN_TRACE=true
export GATUN_LOG_LEVEL=FINE

PySpark Integration

Use Gatun as the JVM communication backend for PySpark:

# Enable Gatun backend
export PYSPARK_USE_GATUN=true
export GATUN_MEMORY=256MB

# Run PySpark normally
python my_spark_app.py

Or use the BridgeAdapter API directly:

from gatun.bridge_adapters import GatunAdapter

# Create bridge (launches JVM)
bridge = GatunAdapter(memory="256MB")

# Use bridge API
obj = bridge.new("java.util.ArrayList")
bridge.call(obj, "add", "hello")
result = bridge.call_static("java.lang.Math", "max", 10, 20)

# Array operations
arr = bridge.new_array("java.lang.String", 3)
bridge.array_set(arr, 0, "hello")
bridge.array_get(arr, 0)  # "hello"

bridge.close()

Configuration

Configure via pyproject.toml:

[tool.gatun]
memory = "64MB"
socket_path = "/tmp/gatun.sock"  # Optional: uses random path by default

Or environment variables:

export GATUN_MEMORY=64MB
export GATUN_SOCKET_PATH=/tmp/gatun.sock

Supported Types

Python Java
int int, long
float double
bool boolean
str String
list List (ArrayList)
dict Map (HashMap)
bytes byte[]
JavaArray Primitive arrays (int[], double[], etc.)
pyarrow.Array Typed arrays
None null
JavaObject Object reference (including Object arrays)
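As an illustration of the table (not Gatun's actual serializer), the Python side of the mapping can be expressed as a classifier. One gotcha worth noting: bool must be checked before int, because Python's bool is a subclass of int:

```python
def java_type_of(value):
    """Name the Java type a plain Python value maps to (illustrative)."""
    if value is None:
        return "null"
    if isinstance(value, bool):   # before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "int/long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "String"
    if isinstance(value, bytes):
        return "byte[]"
    if isinstance(value, list):
        return "List"
    if isinstance(value, dict):
        return "Map"
    raise TypeError(f"unsupported: {type(value).__name__}")

assert java_type_of(True) == "boolean"
assert java_type_of(42) == "int/long"
assert java_type_of(b"\x00") == "byte[]"
```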

Exception Handling

Java exceptions are mapped to Python exceptions:

from gatun import (
    connect,
    JavaException,
    JavaSecurityException,
    JavaIllegalArgumentException,
    JavaNoSuchMethodException,
    JavaClassNotFoundException,
    JavaNullPointerException,
    JavaIndexOutOfBoundsException,
    JavaNumberFormatException,
)

client = connect()

try:
    client.jvm.java.lang.Integer.parseInt("not_a_number")
except JavaNumberFormatException as e:
    print(f"Parse error: {e}")

Architecture

Gatun uses a client-server architecture with shared memory for high-performance IPC:

┌───────────────────────────────────────────────────────────────┐
│                        Python Client                          │
│  ┌─────────────┐  ┌─────────────┐  ┌───────────────────────┐  │
│  │ GatunClient │  │ AsyncClient │  │    BridgeAdapter      │  │
│  └──────┬──────┘  └──────┬──────┘  └───────────┬───────────┘  │
│         └────────────────┼─────────────────────┘              │
│                          ▼                                    │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │              FlatBuffers Serialization                  │  │
│  └─────────────────────────────────────────────────────────┘  │
└──────────────────────────┬────────────────────────────────────┘
                           │ Unix Domain Socket (length prefix)
                           │ + Shared Memory (command/response)
┌──────────────────────────▼────────────────────────────────────┐
│                         Java Server                           │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │                     GatunServer                         │  │
│  │  - Command dispatch (create, invoke, field access)      │  │
│  │  - Object registry and session management               │  │
│  │  - Security allowlist enforcement                       │  │
│  └─────────────────────────────────────────────────────────┘  │
│  ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐  │
│  │ ReflectionCache │ │ MethodResolver  │ │ ArrowHandler    │  │
│  │ - Method cache  │ │ - Overload res. │ │ - Arrow IPC     │  │
│  │ - Constructor   │ │ - Varargs       │ │ - Zero-copy     │  │
│  │ - Field cache   │ │ - Type compat.  │ │                 │  │
│  └─────────────────┘ └─────────────────┘ └─────────────────┘  │
└───────────────────────────────────────────────────────────────┘

Communication Flow

  1. Python serializes command to FlatBuffers, writes to shared memory
  2. Length prefix sent over Unix socket signals Java to process
  3. Java reads command from shared memory, executes, writes response
  4. Response length sent back over socket
  5. Python reads response from shared memory
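The five steps above can be sketched with stdlib pieces: a bytearray stands in for the shared-memory region, and socket.socketpair for the Unix domain socket. This is illustrative only; Gatun's real frames are FlatBuffers messages, not raw strings:

```python
import socket
import struct

shm = bytearray(64 * 1024)        # stand-in for the command zone
py_sock, java_sock = socket.socketpair()

# "Python" side: write command into shared memory, send length prefix
command = b"create java.util.ArrayList"
shm[: len(command)] = command
py_sock.sendall(struct.pack("<I", len(command)))

# "Java" side: the prefix says how many bytes to read from memory
(length,) = struct.unpack("<I", java_sock.recv(4))
received = bytes(shm[:length])
assert received == command

py_sock.close()
java_sock.close()
```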

Memory Layout

Offset 0          64KB                            size-64KB        size
   │               │                                  │              │
   ▼               ▼                                  ▼              ▼
   ┌───────────────┬──────────────────────────────────┬──────────────┐
   │ Command Zone  │         Payload Zone             │Response Zone │
   │ (Python→Java) │  [First half]  │  [Second half]  │ (Java→Python)│
   └───────────────┴───────────────┴──────────────────┴──────────────┘
                    Python→Java     Java→Python
                    Arrow data      Arrow data

See Arrow Memory Architecture for details on the epoch-based memory safety model.
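The zone offsets follow directly from the diagram. A small sketch of the arithmetic, assuming the 64 KB command/response zone sizes shown above (illustrative; actual sizes may differ):

```python
CMD = 64 * 1024  # command and response zone size from the diagram

def zones(total):
    """Return (start, end) offsets for each zone of a `total`-byte region."""
    payload_start, payload_end = CMD, total - CMD
    mid = payload_start + (payload_end - payload_start) // 2
    return {
        "command":    (0, CMD),
        "py_to_java": (payload_start, mid),   # first half of payload
        "java_to_py": (mid, payload_end),     # second half of payload
        "response":   (payload_end, total),
    }

z = zones(16 * 1024 * 1024)                   # a 16 MB region
assert z["command"] == (0, 65536)
assert z["py_to_java"] == (65536, 8388608)
assert z["java_to_py"] == (8388608, 16711680)
assert z["response"] == (16711680, 16777216)
```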

Development

cd python
JAVA_HOME=/opt/homebrew/opt/openjdk uv sync  # Install deps and build JAR
uv run pytest              # Run tests
uv run ruff check .        # Lint
uv run ruff format .       # Format

The uv sync command automatically builds the Java JAR via the custom build backend.

Project Structure

gatun/
├── python/
│   └── src/gatun/          # Python client library
│       ├── client.py       # Sync client
│       ├── async_client.py # Async client
│       ├── launcher.py     # Server process management
│       └── bridge.py       # BridgeAdapter interface
├── gatun-core/
│   └── src/main/java/org/gatun/server/
│       ├── GatunServer.java        # Main server
│       ├── ReflectionCache.java    # Caching layer
│       ├── MethodResolver.java     # Method resolution
│       └── ArrowMemoryHandler.java # Arrow integration
└── schemas/
    └── commands.fbs        # FlatBuffers protocol schema

License

Apache 2.0
