RLM-Go is a high-performance implementation of the Recursive Language Model (RLM) paradigm in Go. It enables LLMs to process arbitrarily long contexts and perform complex, multi-step reasoning by interacting with a persistent Python REPL environment.
This implementation is based on the research paper Recursive Language Models (https://arxiv.org/abs/2512.24601) and the original repo (https://github.com/alexzhang13/rlm).
- Go Orchestrator: Replaces the Python core, managing the iterative RLM loop, LLM interactions, and REPL orchestration.
- Persistent Python REPL: A Go-managed Python process that maintains state between code blocks within a single request. It includes an embedded HTTP server to handle recursive llm_query and llm_query_batched calls from the Python environment (a sketch of this bridge follows the list).
- Gemini Integration: Uses the official google.golang.org/genai SDK for high-performance interaction with Gemini models.
- Cloud Run Optimized: Structured as a stateless HTTP service that adheres to Cloud Run's ephemeral execution model.
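The recursive-call bridge can be pictured as a plain net/http handler that the Python side posts to. A minimal sketch; the route, payload shape, and queryLLM helper are illustrative assumptions, not the repo's actual API:

// Sketch of the internal HTTP bridge that lets Python's llm_query call
// back into the Go process. Route, payload shape, and the queryLLM
// helper are illustrative assumptions, not the repo's actual API.
package rlm

import (
	"encoding/json"
	"net/http"
)

type llmQueryRequest struct {
	Prompt string `json:"prompt"`
}

type llmQueryResponse struct {
	Text string `json:"text"`
}

// NewLLMQueryHandler wires a sub-LLM call into an HTTP endpoint that the
// REPL process can reach on localhost.
func NewLLMQueryHandler(queryLLM func(prompt string) (string, error)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req llmQueryRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		text, err := queryLLM(req.Prompt) // recursive sub-call to Gemini
		if err != nil {
			http.Error(w, "llm call failed", http.StatusBadGateway)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(llmQueryResponse{Text: text})
	})
}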
Deploying to Cloud Run:

Set Environment Variables:
- GEMINI_API_KEY: Your Google AI Studio or Vertex AI API key.
- PORT: Port for the service (defaults to 8080).
-
Build and Push to Artifact Registry:
cd rlm-go
gcloud builds submit --tag gcr.io/[PROJECT_ID]/rlm-go

Deploy to Cloud Run:
gcloud run deploy rlm-go \
--image gcr.io/[PROJECT_ID]/rlm-go \
--platform managed \
--set-env-vars GEMINI_API_KEY=your_key_here \
--allow-unauthenticated

Send a POST request to the /completion endpoint:
curl -X POST https://[YOUR_CLOUD_RUN_URL]/completion \
-H "Content-Type: application/json" \
-d '{
"prompt": "Calculate the first 50 prime numbers and check if their sum is prime.",
"context": {"source": "arithmetic_task"}
}'
The service autonomously executes Python code to solve the task, recursively calling Gemini as needed to analyze intermediate results, and returns the final answer. A minimal Go client example follows.
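The same request can be issued programmatically; a sketch of a small Go client, where the placeholder URL and the loosely typed response decoding are assumptions (the exact response schema is not documented here):

// Minimal client for the /completion endpoint. The URL placeholder and
// the generic response decoding are illustrative assumptions.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

func main() {
	body, err := json.Marshal(map[string]any{
		"prompt":  "Calculate the first 50 prime numbers and check if their sum is prime.",
		"context": map[string]string{"source": "arithmetic_task"},
	})
	if err != nil {
		log.Fatal(err)
	}
	resp, err := http.Post("https://YOUR_CLOUD_RUN_URL/completion", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Decode into a generic map since the response schema is not pinned down here.
	var out map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		log.Fatal(err)
	}
	fmt.Println(out)
}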
Features:
- Symbolic Handle Architecture: Context is kept in an external REPL environment. The LLM receives only metadata and a "symbolic handle" (the context variable) to manipulate it, effectively bypassing context window limits.
- Recursive Inference: The LLM can programmatically invoke sub-RLM calls from within Python code (via llm_query), enabling $\Omega(N)$ or $\Omega(N^2)$ semantic work.
- Security by Design:
- Non-root execution in Docker.
- Stateless architecture (REPL state is per-request).
- Input validation and JSON-only error responses.
- Observability (see the metrics sketch after this list):
- Structured JSON logging with slog.
- Prometheus metrics for latency, iteration counts, token usage, and errors.
- Ready-to-use Grafana dashboard (dashboard.json).
- Idiomatic Go: Refactored for clean separation of concerns, unit testing, and E2E verification.
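For illustration, metrics along these lines can be registered with the Prometheus Go client; the metric names and bucket choices below are assumptions, not the repo's actual series:

// Sketch of Prometheus instrumentation for the service. Metric names
// and buckets are illustrative assumptions.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// End-to-end /completion latency.
	completionDuration = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "rlm_completion_duration_seconds",
		Help:    "End-to-end /completion latency in seconds.",
		Buckets: prometheus.DefBuckets,
	})
	// Number of RLM loop steps taken per request.
	rlmIterations = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "rlm_iterations",
		Help:    "RLM loop steps per request.",
		Buckets: prometheus.LinearBuckets(1, 1, 10),
	})
)

func main() {
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}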
Architecture:
- Orchestrator: Manages the iterative loop between the LLM and the REPL (a sketch of this loop follows the list).
- REPL: A long-lived Python process (per request) that executes code and maintains state.
- LM Handler: An internal HTTP bridge that allows the Python environment to call back to the LLM.
- Client: Integration with Gemini via the Google Generative AI SDK.
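Conceptually, the orchestrator alternates between model generations and code executions until the model yields a final answer. A minimal sketch; the interfaces, method names, and the FINAL: convention are all illustrative assumptions rather than the repo's actual types:

// Illustrative sketch of the orchestrator loop; interface and method
// names are assumptions, not the repo's actual types.
package rlm

import (
	"context"
	"errors"
	"strings"
)

type Model interface {
	Generate(ctx context.Context, transcript string) (string, error)
}

type REPL interface {
	Execute(code string) (output string, err error)
}

// runLoop feeds model output to the REPL and REPL output back to the
// model until the model emits a final answer or the budget is spent.
func runLoop(ctx context.Context, m Model, r REPL, prompt string, maxIterations int) (string, error) {
	transcript := prompt
	for i := 0; i < maxIterations; i++ {
		reply, err := m.Generate(ctx, transcript)
		if err != nil {
			return "", err
		}
		if answer, ok := strings.CutPrefix(reply, "FINAL:"); ok {
			return strings.TrimSpace(answer), nil
		}
		output, err := r.Execute(reply) // run the proposed Python code
		if err != nil {
			output = "error: " + err.Error() // surface errors to the model
		}
		transcript += "\n" + reply + "\n" + output
	}
	return "", errors.New("max iterations reached without a final answer")
}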
Requirements:
- Go 1.24+
- Python 3.x
- Gemini API Key
Set the following environment variables (a loading sketch in Go follows the list):
- GEMINI_API_KEY: Your API key.
- GEMINI_MODEL_NAME: (Optional) Default is gemini-2.5-flash.
- PORT: (Optional) Default is 8080.
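A standard-library sketch of loading this configuration with defaults; the Config type and helper names are illustrative:

// Sketch of configuration loading with the documented defaults. The
// Config type and helper are illustrative, not the repo's actual code.
package rlm

import "os"

type Config struct {
	APIKey    string // GEMINI_API_KEY (required)
	ModelName string // GEMINI_MODEL_NAME (optional)
	Port      string // PORT (optional)
}

// getenvDefault returns the value of key, or fallback if it is unset or empty.
func getenvDefault(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

func LoadConfig() Config {
	return Config{
		APIKey:    os.Getenv("GEMINI_API_KEY"),
		ModelName: getenvDefault("GEMINI_MODEL_NAME", "gemini-2.5-flash"),
		Port:      getenvDefault("PORT", "8080"),
	}
}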
Run locally:
go run cmd/server/main.go

Run with Docker:
docker build -t rlm-go .
docker run -p 8080:8080 -e GEMINI_API_KEY=your_key rlm-go

POST /completion: Generate a recursive completion.
curl -X POST http://localhost:8080/completion \
-H "Content-Type: application/json" \
-d '{
"prompt": "Summarize the key events in this data and provide a final answer.",
"context": "Very long data string or object..."
}'

Parameters (the corresponding request struct is sketched in Go below):
- prompt: (String) Your query or instructions.
- context: (Any) The data to be injected into the REPL's context variable.
- max_iterations: (Integer, optional) Maximum number of RLM steps.
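The JSON keys below come straight from the parameter list; the struct itself is an illustrative sketch:

// Request body for POST /completion. JSON keys mirror the documented
// parameters; the struct name is an illustrative assumption.
package rlm

type CompletionRequest struct {
	Prompt        string `json:"prompt"`                   // query or instructions
	Context       any    `json:"context,omitempty"`        // injected into the REPL's context variable
	MaxIterations int    `json:"max_iterations,omitempty"` // optional cap on RLM steps
}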
GET /metrics: Exposes Prometheus metrics.
curl http://localhost:8080/metrics

A Grafana dashboard is available in dashboard.json. It tracks:
- HTTP Request Rate: Success vs. failure rates.
- P95 Latency: Completion duration.
- RLM Iterations: Distribution of steps taken to reach an answer.
- Token Usage: Input/Output token counts per model.
Documentation:
- Detailed Specification: SLIs/SLOs, security, and technical details.
- API Reference: OpenAPI 3.0 specification.
- RLM Paper Reference: Research background.
Run all tests:
go test ./...

E2E tests require a valid GEMINI_API_KEY.
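A sketch of how an API-key-gated E2E test can be written in Go; the skip-when-unset behavior, endpoint, and assertions are illustrative assumptions, not necessarily how the repo's tests are structured:

// Illustrative E2E test gated on GEMINI_API_KEY. Endpoint and
// expectations are assumptions for the sketch.
package rlm_test

import (
	"bytes"
	"net/http"
	"os"
	"testing"
)

func TestCompletionE2E(t *testing.T) {
	if os.Getenv("GEMINI_API_KEY") == "" {
		t.Skip("GEMINI_API_KEY not set; skipping E2E test")
	}
	body := []byte(`{"prompt": "What is 2+2? Reply with just the number."}`)
	resp, err := http.Post("http://localhost:8080/completion", "application/json", bytes.NewReader(body))
	if err != nil {
		t.Fatalf("request failed: %v", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		t.Fatalf("unexpected status: %d", resp.StatusCode)
	}
}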