RLM-Go is a high-performance implementation of the Recursive Language Model (RLM) paradigm in Go. It enables LLMs to process arbitrarily long contexts and perform complex, multi-step reasoning by interacting with a persistent Python REPL environment.
This implementation is based on the research paper Recursive Language Models (https://arxiv.org/abs/2512.24601) and the original repo (https://github.com/alexzhang13/rlm).
- Go Orchestrator: Replaces the Python core, managing the iterative RLM loop, LLM interactions, and REPL orchestration.
- Persistent Python REPL: A Go-managed Python process that maintains state between code blocks within a single request. It includes an embedded HTTP server to handle recursive llm_query and llm_query_batched calls from the Python environment (a sketch of this bridge follows the list).
- Gemini Integration: Uses the official google.golang.org/genai SDK for high-performance interaction with Gemini models.
- Cloud Run Optimized: Structured as a stateless HTTP service that adheres to Cloud Run's ephemeral execution model.
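The recursive-call bridge can be pictured as a plain net/http handler that the Python side posts to. A minimal sketch; the route, payload shape, and queryLLM helper are illustrative assumptions, not the repo's actual API:

// Sketch of the internal HTTP bridge that lets Python's llm_query call
// back into the Go process. Route, payload shape, and the queryLLM
// helper are illustrative assumptions, not the repo's actual API.
package rlm

import (
	"encoding/json"
	"net/http"
)

type llmQueryRequest struct {
	Prompt string `json:"prompt"`
}

type llmQueryResponse struct {
	Text string `json:"text"`
}

// NewLLMQueryHandler wires a sub-LLM call into an HTTP endpoint that the
// REPL process can reach on localhost.
func NewLLMQueryHandler(queryLLM func(prompt string) (string, error)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req llmQueryRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		text, err := queryLLM(req.Prompt) // recursive sub-call to Gemini
		if err != nil {
			http.Error(w, "llm call failed", http.StatusBadGateway)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(llmQueryResponse{Text: text})
	})
}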
Deploying to Cloud Run:

Set Environment Variables:
- GEMINI_API_KEY: Your Google AI Studio or Vertex AI API key.
- PORT: Port for the service (defaults to 8080).
-
Build and Push to Artifact Registry:
cd rlm-go
gcloud builds submit --tag gcr.io/[PROJECT_ID]/rlm-go

Deploy to Cloud Run:
gcloud run deploy rlm-go \
--image gcr.io/[PROJECT_ID]/rlm-go \
--platform managed \
--set-env-vars GEMINI_API_KEY=your_key_here \
--allow-unauthenticated

Send a POST request to the /completion endpoint:
curl -X POST https://[YOUR_CLOUD_RUN_URL]/completion \
-H "Content-Type: application/json" \
-d '{
"prompt": "Calculate the first 50 prime numbers and check if their sum is prime.",
"context": {"source": "arithmetic_task"}
}'
The service autonomously executes Python code to solve the task, recursively calling Gemini as needed to analyze intermediate results, and returns the final answer. A minimal Go client example follows.
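The same request can be issued programmatically; a sketch of a small Go client, where the placeholder URL and the loosely typed response decoding are assumptions (the exact response schema is not documented here):

// Minimal client for the /completion endpoint. The URL placeholder and
// the generic response decoding are illustrative assumptions.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

func main() {
	body, err := json.Marshal(map[string]any{
		"prompt":  "Calculate the first 50 prime numbers and check if their sum is prime.",
		"context": map[string]string{"source": "arithmetic_task"},
	})
	if err != nil {
		log.Fatal(err)
	}
	resp, err := http.Post("https://YOUR_CLOUD_RUN_URL/completion", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Decode into a generic map since the response schema is not pinned down here.
	var out map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		log.Fatal(err)
	}
	fmt.Println(out)
}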
Features:
- Symbolic Handle Architecture: Context is kept in an external REPL environment. The LLM receives only metadata and a "symbolic handle" (the context variable) to manipulate it, effectively bypassing context window limits.
- Recursive Inference: The LLM can programmatically invoke sub-RLM calls from within Python code (via llm_query), enabling $\Omega(N)$ or $\Omega(N^2)$ semantic work.
- Security by Design:
- Non-root execution in Docker.
- Stateless architecture (REPL state is per-request).
- Input validation and JSON-only error responses.
- Observability (see the metrics sketch after this list):
- Structured JSON logging with slog.
- Prometheus metrics for latency, iteration counts, token usage, and errors.
- Ready-to-use Grafana dashboard (dashboard.json).
- Idiomatic Go: Refactored for clean separation of concerns, unit testing, and E2E verification.
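For illustration, metrics along these lines can be registered with the Prometheus Go client; the metric names and bucket choices below are assumptions, not the repo's actual series:

// Sketch of Prometheus instrumentation for the service. Metric names
// and buckets are illustrative assumptions.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// End-to-end /completion latency.
	completionDuration = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "rlm_completion_duration_seconds",
		Help:    "End-to-end /completion latency in seconds.",
		Buckets: prometheus.DefBuckets,
	})
	// Number of RLM loop steps taken per request.
	rlmIterations = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "rlm_iterations",
		Help:    "RLM loop steps per request.",
		Buckets: prometheus.LinearBuckets(1, 1, 10),
	})
)

func main() {
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}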
Architecture:
- Orchestrator: Manages the iterative loop between the LLM and the REPL (a sketch of this loop follows the list).
- REPL: A long-lived Python process (per request) that executes code and maintains state.
- LM Handler: An internal HTTP bridge that allows the Python environment to call back to the LLM.
- Client: Integration with Gemini via the Google Generative AI SDK.
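Conceptually, the orchestrator alternates between model generations and code executions until the model yields a final answer. A minimal sketch; the interfaces, method names, and the FINAL: convention are all illustrative assumptions rather than the repo's actual types:

// Illustrative sketch of the orchestrator loop; interface and method
// names are assumptions, not the repo's actual types.
package rlm

import (
	"context"
	"errors"
	"strings"
)

type Model interface {
	Generate(ctx context.Context, transcript string) (string, error)
}

type REPL interface {
	Execute(code string) (output string, err error)
}

// runLoop feeds model output to the REPL and REPL output back to the
// model until the model emits a final answer or the budget is spent.
func runLoop(ctx context.Context, m Model, r REPL, prompt string, maxIterations int) (string, error) {
	transcript := prompt
	for i := 0; i < maxIterations; i++ {
		reply, err := m.Generate(ctx, transcript)
		if err != nil {
			return "", err
		}
		if answer, ok := strings.CutPrefix(reply, "FINAL:"); ok {
			return strings.TrimSpace(answer), nil
		}
		output, err := r.Execute(reply) // run the proposed Python code
		if err != nil {
			output = "error: " + err.Error() // surface errors to the model
		}
		transcript += "\n" + reply + "\n" + output
	}
	return "", errors.New("max iterations reached without a final answer")
}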
Requirements:
- Go 1.24+
- Python 3.x
- Gemini API Key
Set the following environment variables (a loading sketch in Go follows the list):
- GEMINI_API_KEY: Your API key.
- GEMINI_MODEL_NAME: (Optional) Default is gemini-2.5-flash.
- PORT: (Optional) Default is 8080.
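A standard-library sketch of loading this configuration with defaults; the Config type and helper names are illustrative:

// Sketch of configuration loading with the documented defaults. The
// Config type and helper are illustrative, not the repo's actual code.
package rlm

import "os"

type Config struct {
	APIKey    string // GEMINI_API_KEY (required)
	ModelName string // GEMINI_MODEL_NAME (optional)
	Port      string // PORT (optional)
}

// getenvDefault returns the value of key, or fallback if it is unset or empty.
func getenvDefault(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

func LoadConfig() Config {
	return Config{
		APIKey:    os.Getenv("GEMINI_API_KEY"),
		ModelName: getenvDefault("GEMINI_MODEL_NAME", "gemini-2.5-flash"),
		Port:      getenvDefault("PORT", "8080"),
	}
}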
Run locally:
go run cmd/server/main.go

Run with Docker:
docker build -t rlm-go .
docker run -p 8080:8080 -e GEMINI_API_KEY=your_key rlm-go

POST /completion: Generate a recursive completion.
curl -X POST http://localhost:8080/completion \
-H "Content-Type: application/json" \
-d '{
"prompt": "Summarize the key events in this data and provide a final answer.",
"context": "Very long data string or object..."
}'

Parameters (the corresponding request struct is sketched in Go below):
- prompt: (String) Your query or instructions.
- context: (Any) The data to be injected into the REPL's context variable.
- max_iterations: (Integer, optional) Maximum number of RLM steps.
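The JSON keys below come straight from the parameter list; the struct itself is an illustrative sketch:

// Request body for POST /completion. JSON keys mirror the documented
// parameters; the struct name is an illustrative assumption.
package rlm

type CompletionRequest struct {
	Prompt        string `json:"prompt"`                   // query or instructions
	Context       any    `json:"context,omitempty"`        // injected into the REPL's context variable
	MaxIterations int    `json:"max_iterations,omitempty"` // optional cap on RLM steps
}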
GET /metrics: Exposes Prometheus metrics.
curl http://localhost:8080/metrics

A Grafana dashboard is available in dashboard.json. It tracks:
- HTTP Request Rate: Success vs. failure rates.
- P95 Latency: Completion duration.
- RLM Iterations: Distribution of steps taken to reach an answer.
- Token Usage: Input/Output token counts per model.
Documentation:
- Detailed Specification: SLIs/SLOs, security, and technical details.
- API Reference: OpenAPI 3.0 specification.
- RLM Paper Reference: Research background.
Run all tests:
go test ./...

E2E tests require a valid GEMINI_API_KEY.
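A sketch of how an API-key-gated E2E test can be written in Go; the skip-when-unset behavior, endpoint, and assertions are illustrative assumptions, not necessarily how the repo's tests are structured:

// Illustrative E2E test gated on GEMINI_API_KEY. Endpoint and
// expectations are assumptions for the sketch.
package rlm_test

import (
	"bytes"
	"net/http"
	"os"
	"testing"
)

func TestCompletionE2E(t *testing.T) {
	if os.Getenv("GEMINI_API_KEY") == "" {
		t.Skip("GEMINI_API_KEY not set; skipping E2E test")
	}
	body := []byte(`{"prompt": "What is 2+2? Reply with just the number."}`)
	resp, err := http.Post("http://localhost:8080/completion", "application/json", bytes.NewReader(body))
	if err != nil {
		t.Fatalf("request failed: %v", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		t.Fatalf("unexpected status: %d", resp.StatusCode)
	}
}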