Recursive Language Model From Scratch

RLM Diagram

This is an unofficial implementation of the RLM paper.

Original Paper: Recursive Language Models by Alex L. Zhang, Tim Kraska & Omar Khattab

RLM in Action

RLM Example (1)


What is RLM (In Layman's Terms)

NO BS! Let's break it down into simple terms.

  • You start with an LLM (let's call it the root llm)
  • You give it access to a Python REPL environment (I'll discuss the REPL in detail later)
  • You have a big dataset (say 1 million+ tokens). You can't load that dataset into the root llm directly, so you give the dataset's path to the root llm instead.
  • The root llm then writes Python code to load the dataset.
  • We pass that Python code string to the REPL environment, which executes the code and saves its state (just like a Jupyter notebook, where every cell's variables, dicts, everything is preserved).
  • Once the file is loaded, you start asking questions that can be answered from the dataset.
  • The root llm will again generate Python code to start finding the answer in the dataset.
  • Here it will generate code which can
    • peek into the first x characters
    • use regex to find specific sections
    • split the dataset into chunks and analyze them one by one in a loop
    • delegate a specific chunk to a sub LLM to extract a specific answer from that chunk
  • The root llm will recursively generate Python code and inspect its output until it finds the answer.
  • Notice that the root llm never loads the entire text into its context; it only uses the REPL environment by generating Python code and looking at that code's output. That's it! That's the whole idea of RLM. A minimal sketch of this loop is shown below.
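To make the loop concrete, here is a minimal, hypothetical sketch of it in Python. This is not this repo's actual API: call_root_llm stands in for whatever chat-completions client you use, and the truncation length is arbitrary.

import contextlib
import io

def call_root_llm(history: list[str]) -> str:
    """Placeholder: ask the root LLM for the next Python snippet to run."""
    raise NotImplementedError

def rlm_answer(question: str, dataset_path: str, max_turns: int = 10) -> str:
    # The persistent namespace is the REPL's state; FINAL() stores the answer in it.
    namespace: dict = {"FINAL_ANSWER": None}
    namespace["FINAL"] = lambda ans: namespace.update(FINAL_ANSWER=ans)

    history = [f"The dataset lives at {dataset_path!r}. Question: {question}"]
    for _ in range(max_turns):
        code = call_root_llm(history)          # root llm writes Python code
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(code, namespace)              # the REPL keeps state across turns
        if namespace["FINAL_ANSWER"] is not None:
            return namespace["FINAL_ANSWER"]   # answer delivered straight from the REPL
        history.append(buf.getvalue()[:2000])  # only a truncated view goes back to the llm
    return "No answer found within the turn budget"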

Isn't RLM just tool calling with Python in a loop?

No.

  • RLM is treating the entire dataset as a variable.
  • RLM is using the REPL environment to interact with the dataset.
  • RLM uses code (a symbolic language) to call sub_llm and spawn sub-agents; it never invokes them as tool calls (see the sketch below).
  • RLM only generates code and sees the output of that code (the output is usually truncated to avoid context rot).
  • RLM can bypass the output token limit of the base LLM by using the FINAL function to deliver the final answer directly from the REPL environment. (A tool response normally goes back to the LLM, but here the final answer is delivered straight from the REPL environment.)
  • RLM can iteratively update its answer without ever seeing the full answer (this works because, through recursion, the RLM stays aware of the changes).

These are the reasons why RLM is not just tool calling with Python in a loop.
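For intuition, here is a hypothetical sketch of what that sub_llm call can look like from the REPL's point of view: sub_llm is just a Python function sitting in the namespace, so the root llm invokes it from generated code rather than through a tool-call message. The OpenAI client and model name below are assumptions, not this repo's actual code.

from openai import OpenAI

client = OpenAI()

def sub_llm(instruction: str, chunk: str) -> str:
    """Spin up a fresh sub-LLM over a single chunk and return its answer."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"{instruction}\n\n---\n{chunk}"}],
    )
    return resp.choices[0].message.content

# The kind of code the root llm might then generate inside the REPL:
# chunks = [data[i:i + 50_000] for i in range(0, len(data), 50_000)]
# findings = [sub_llm("List every error mentioned here.", c) for c in chunks]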

One of the main authors of the RLM paper, Omar Khattab, explained this in detail in this X thread

What is REPL Environment?

REPL is short for Read-Eval-Print Loop. It is a simple environment that lets you execute Python code and see the output of that code.

  • You generate Python code
  • You run the Python code
  • You see the output of the Python code
  • & the loop continues...

So, we give the LLM (here, the root llm) access to the stateful REPL environment as a tool. The LLM then uses the REPL environment to interact with the dataset without ever ingesting it directly. In the paper, the entire prompt (dataset) is treated as a variable, so the root llm can access the dataset through the REPL environment.
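The snippet below is a toy demonstration (not this repo's code) of what "a REPL with state" means: every snippet runs in the same namespace, so variables created in one turn are still there in the next, just like cells in a notebook.

import contextlib
import io

namespace: dict = {}

def run_in_repl(code: str) -> str:
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, namespace)      # the same dict every time -> notebook-like state
    return buf.getvalue()

run_in_repl("data = 'ERROR disk full\\n' * 1000")   # turn 1: pretend this is the big dataset
print(run_in_repl("print(len(data))"))              # turn 2: 'data' survived the previous turn
print(run_in_repl("print(data[:40])"))              # turn 3: peek at the first characters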

To learn more, I suggest reading the original paper. It discusses the approach in detail and reports performance on various benchmarks.

Extra Additions In This Implementation

Sandbox Environment

The paper's implementation runs the Python REPL environment directly on the host machine, which is dangerous: if the LLM generates code that deletes important files, or any other malicious code, it would affect the host machine.

To solve this, I run the REPL environment inside a Docker container, so it is isolated from the host machine. At runtime the dataset files (along with the env) are mounted into the container, and the REPL runs inside it. If malicious code is generated, it runs inside the container and can't affect the host machine.
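Here is a rough, simplified sketch of the idea using the Docker Python SDK. It is stateless (one throwaway container per snippet), whereas this implementation keeps a persistent REPL inside the container, but it shows how a read-only mount keeps destructive code away from the host.

import docker  # pip install docker

def run_sandboxed(code: str, dataset_path: str) -> str:
    client = docker.from_env()
    output = client.containers.run(
        image="python:3.12-slim",
        command=["python", "-c", code],
        volumes={dataset_path: {"bind": "/data/dataset.txt", "mode": "ro"}},  # read-only mount
        network_disabled=True,   # untrusted code gets no network access
        remove=True,             # throw the container away afterwards
    )
    return output.decode()

print(run_sandboxed("print(open('/data/dataset.txt').read()[:100])",
                    "/absolute/path/to/data/my_huge_dataset.txt"))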

Memory Compaction

The original paper doesn't mention memory compaction after each interaction. But while testing I found:

  • In general the root llm was calling the REPL environment multiple times to find an answer, which consumed a lot of tokens once all of these interactions accumulated over a session.

  • Therefore, I've implemented simple interaction-based memory compaction. After each final response, the entire interaction is summarized into a single message. This way, even if an interaction involved 10 REPL calls, the next turn carries only a summary of what happened in that interaction, without bloating the context with unnecessary details. A minimal sketch follows below.
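A minimal sketch of what that compaction step might look like, assuming a litellm-style completion call and the Gemini model from the example config below (the actual prompt and client in this repo may differ):

from litellm import completion  # assumption: any OpenAI-compatible client works the same way

def compact_interaction(messages: list[dict]) -> dict:
    """Collapse one interaction (user turn + N REPL calls + final answer)
    into a single summary message that replaces it in the chat history."""
    transcript = "\n\n".join(f"{m['role']}: {m['content']}" for m in messages)
    resp = completion(
        model="openrouter/google/gemini-2.0-flash-001",
        messages=[{
            "role": "user",
            "content": "Summarize this interaction in a few sentences. Keep file "
                       "names, variables defined in the REPL, and the final answer:\n\n"
                       + transcript,
        }],
    )
    return {"role": "assistant", "content": resp.choices[0].message.content}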

Disclaimer & Warning

This project is an unofficial implementation based on my personal understanding of the RLM paper and various research materials available online.

  • Not an Exact Replica: While it follows the core principles of the paper, this implementation is not an exact mirror of the authors' original work.
  • Custom Additions: As mentioned in the Extra Additions section, I have introduced features like Docker sandboxing and Memory Compaction which are not present in the original paper.
  • Potential Mistakes: This is a "from-scratch" exploration; there may be bugs, architectural differences, or misunderstandings of specific paper details.

Use this as a learning resource or a starting point for your own RLM experiments.

Run Locally

Prerequisites

  • Docker (make sure it can be run without sudo)
  • uv for package management
  • Python 3.12+
  • .env file with API keys (I've added a .env.example file for reference)

How to Run

1. Setup Environment

Copy the example environment file and fill in your API keys:

cp .env.example .env

Ensure you have the necessary keys (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.) depending on the models you choose in config.json.
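A hypothetical .env might look like this (which keys you actually need depends on the providers in config.json; OPENROUTER_API_KEY is an assumption for the OpenRouter models in the example config):

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
OPENROUTER_API_KEY=sk-or-...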

2. Install Dependencies

Install the required Python packages using uv:

uv sync

3. Configuration (config.json)

The config.json file controls the LLMs and system behavior:

  • root_llm: The primary model that orchestrates the reasoning and writes Python code.
  • sub_llm: The model used for semantic analysis of text chunks (map operations).
  • glm_coding: (Optional) Use Zhipu AI's GLM coding subscription API directly. If enabled, both root_llm and sub_llm will use the GLM models.
  • memory_compaction: When enabled, it summarizes long tool-call histories into a single context-saving summary after each interaction.

Example config.json:

{
  "root_llm": {
    "model": "openrouter/google/gemini-2.0-flash-001"
  },
  "sub_llm": {
    "model": "openrouter/google/gemini-2.0-flash-001"
  },
  "glm_coding": {
      "enabled": false,
      "api_base": "https://api.z.ai/api/coding/paas/v4",
      "root_model": "openai/glm-4.7",
      "sub_model": "openai/glm-4.7"
  },
  "memory_compaction": {
      "enabled": true
  }
}

4. CLI Commands

The project uses a Click-based CLI for interaction:

  • init <context_file>: (Optional) Pre-loads a large dataset/file into the REPL state.
  • chat: Starts an interactive session with the Root LLM.
  • run "<query>": Executes a single query and returns the answer.
  • status: Displays the current state of the REPL (loaded files, buffers, etc.).

5. Running the System

You can start by initializing a file, or just jump straight into a chat and let the LLM load files as needed.

Option A: Initialize first (preferred for large files)

uv run main.py init /absolute/path/to/data/my_huge_dataset.txt
uv run main.py chat

Option B: Start chat directly

uv run main.py chat

Once the chat starts, you can simply say: "Load the file at /absolute/path/to/data/info.txt and tell me what's in it." The Root LLM will automatically generate the Python code to load and analyze it.

Option C: Run a single query

uv run main.py run "Analyze /absolute/path/to/data/logs.txt and find all error messages"

6. Managing Context

During a session, the Root LLM manages the context by writing Python code that uses helper functions within the REPL environment. You can simply ask the LLM to perform these actions in natural language, and it will use the following helpers (a sketch of how it might call them appears after the list):

Important

Use Absolute Paths: When requesting the LLM to load files, you must provide the absolute path (e.g., /home/user/data.txt). This is required so the system can correctly mount the file into the isolated Docker container.

  • load_file(path, name=None, replace_all=False): Used by the LLM to load a new file into memory.
    • Example Query: "Load the file at /home/user/data/logs.txt and call it 'logs'"
    • Example Query (Replace): "Replace your current context with the file at /home/user/new_data.txt"
  • load_files(paths_list): Used to batch-load multiple files.
    • Example Query: "Load these three files: /home/user/f1.txt, /home/user/f2.txt, and /home/user/f3.txt"
  • switch_to(name): Used to change which loaded file is "active" for analysis helpers like peek() or grep().
    • Example Query: "Switch to 'f2' and search for all 'ERROR' entries"
  • list_files(): Used by the LLM to see what is currently loaded in its sandbox memory.
    • Example Query: "What files do you have loaded right now?"
  • remove_file(name): Used to drop a specific file from the REPL state.
    • Example Query: "Remove the file 'logs' from your context"

7. REPL State

The REPL state (variables, loaded files, buffers) is persisted in the .rlm_state/ directory.

  • To start with a fresh state, simply delete the .rlm_state directory:
rm -rf .rlm_state

8. Chat Session

In the current implementation, the chat message history is stored in-memory. This means your conversation history will be lost when you close the program. However, since the REPL state is persisted on disk, any files loaded or variables defined in the REPL will remain available when you restart the program.

Codebase Overview

For a detailed understanding of how the project is structured, how the REPL sandbox works, and the inner workings of the Orchestrator, please refer to the Codebase Overview.

Acknowledgements & Gratitude

Building this from scratch wouldn't have been possible without the amazing resources and insights shared by the community. A huge thank you to all of these people:

Why Keeping Up with AI Is Exhausting?

Changelog

v1.0.0

  • Initial release of RLM from scratch implementation.

v1.0.1

  • Fixed the issue where the final answer was being routed back to the root llm. In the original paper's implementation, the final answer should be delivered directly from the REPL environment using the FINAL function.