A systematic comparison of three document intelligence paradigms: traditional Retrieval-Augmented Generation (RAG), Google's File Search-based RAG, and Recursive Language Models (RLM).
This repository implements three distinct approaches for question-answering over long documents, evaluated on the Vectara Open RAG Benchmark (3,045 questions across 1,000 research papers).
| Approach | Architecture | Context Strategy | Model |
|---|---|---|---|
| Normal RAG | Chunking + Embedding + Top-K Retrieval | Pre-indexed chunks (384-dim vectors) | Qwen3-30B via OpenRouter |
| Google RAG | File Search + Grounding | Google-managed indexing | Gemini 2.5 Flash |
| RLM | REPL + Recursive Sub-LM Calls | Dynamic context mining | Qwen3-30B via OpenRouter |
Recent work on Recursive Language Models [1] has demonstrated a novel paradigm for processing long-context tasks without requiring massive context windows. However, there remains confusion in the community about the relationship between RLMs and traditional RAG systems. This implementation provides:
- Direct comparison of RLM against established RAG baselines
- Full transparency into each system's reasoning process (chain-of-thought traces)
- Benchmark evaluation on realistic document QA tasks
**Normal RAG**
- Pre-indexes documents into fixed-size chunks
- Retrieves top-K most similar chunks via embedding similarity
- Single LLM call with retrieved context

**Google RAG**
- Managed indexing via Google File Search
- Grounding-based retrieval with source attribution
- Single LLM call with grounded context

**RLM**
- No pre-indexing — processes documents on-demand
- Iterative reasoning via REPL environment (Python sandbox)
- Multiple sub-LM calls to extract and verify information
- Programmatic context navigation (search, filter, chunk dynamically)
```
┌─────────────────────────────────────────────────────────────┐
│                         User Query                          │
└─────────────────────────────────────────────────────────────┘
                            │
        ┌───────────────────┼───────────────────┐
        │                   │                   │
        ▼                   ▼                   ▼
┌───────────────┐   ┌───────────────┐   ┌───────────────┐
│  Normal RAG   │   │  Google RAG   │   │      RLM      │
├───────────────┤   ├───────────────┤   ├───────────────┤
│ 1. Embed Q    │   │ 1. Upload PDF │   │ 1. Load doc   │
│ 2. Top-K      │   │ 2. File Search│   │    as variable│
│    retrieval  │   │    grounding  │   │ 2. LLM writes │
│ 3. LLM call   │   │ 3. LLM call   │   │    Python code│
│    (Qwen3)    │   │    (Gemini)   │   │ 3. Sub-LM     │
│               │   │               │   │    calls      │
│               │   │               │   │ 4. Iterate    │
│               │   │               │   │ 5. SUBMIT()   │
└───────────────┘   └───────────────┘   └───────────────┘
        │                   │                   │
        └───────────────────┼───────────────────┘
                            ▼
                    ┌───────────────┐
                    │ Side-by-Side  │
                    │  Comparison   │
                    └───────────────┘
```
- Python 3.8+
- Deno (required for RLM's sandboxed REPL)
1. **Clone the repository**

   ```bash
   git clone https://github.com/yourusername/RLM-RAG.git
   cd RLM-RAG
   ```

2. **Install Python dependencies**

   ```bash
   pip install -r requirements.txt
   ```

3. **Install Deno** (for the RLM sandbox)

   ```bash
   curl -fsSL https://deno.land/install.sh | sh
   ```

   Restart your shell after installation.

4. **Configure API keys**

   ```bash
   cp .env.example .env
   ```

   Edit `.env` with your credentials:

   ```
   OPENROUTER_API_KEY=your_openrouter_key
   GOOGLE_API_KEY=your_google_key
   REDUCTO_API_KEY=your_reducto_key
   ```

5. **Run the application**

   ```bash
   python app.py
   ```

   Navigate to `http://localhost:5000`.
| Service | Purpose | Obtain Key |
|---|---|---|
| OpenRouter | Normal RAG + RLM (Qwen3) | openrouter.ai |
| Google AI | Google RAG (Gemini) | ai.google.dev |
| Reducto | PDF parsing (OCR) | reducto.ai |
Model selection can be configured via environment variables:

```
QWEN_MODEL=qwen/qwen3-30b-a3b    # OpenRouter model for RAG + RLM
GOOGLE_MODEL=gemini-2.5-flash    # Google model for File Search
```

```
RLM-RAG/
├── app.py                 # Flask backend
├── pipelines/
│   ├── normal_rag.py      # Traditional RAG implementation
│   ├── google_rag.py      # Google File Search RAG
│   ├── rlm_pipeline.py    # DSPy RLM implementation
│   └── reducto_parser.py  # PDF parsing with Reducto OCR
├── benchmark/
│   ├── bench_loader.py    # Vectara Open RAG Bench loader
│   └── data/              # Benchmark questions & answers
├── templates/
│   └── index.html         # Web interface
├── static/
│   ├── style.css          # UI styling
│   └── script.js          # Frontend logic
├── requirements.txt
├── .env.example
├── .gitignore
└── README.md
```
**Normal RAG**

1. **Document Processing**
   - PDF parsed via Reducto OCR (extracts text, tables, figures)
   - Text chunked into ~1000-character segments (200-character overlap)
   - Chunks embedded using `sentence-transformers/all-MiniLM-L6-v2` (384-dim)
2. **Query Processing**
   - Question embedded using the same model
   - Top-5 chunks retrieved via cosine similarity
   - Context + question sent to Qwen3-30B
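The chunking and retrieval steps above can be sketched as follows. This is illustrative, not the code in `pipelines/normal_rag.py`: `chunk_text` and `top_k` are hypothetical helpers, and the embedding model is treated abstractly as any function returning unit-normalized vectors (the real pipeline uses `sentence-transformers/all-MiniLM-L6-v2`).

```python
# Illustrative sketch of the Normal RAG chunk/retrieve steps.
import numpy as np

def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into ~`size`-char segments with `overlap`-char overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def top_k(question_vec: np.ndarray, chunk_vecs: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k chunks most similar to the question.

    With unit-normalized embeddings, the dot product equals cosine
    similarity, so one matrix-vector product scores every chunk at once.
    """
    scores = chunk_vecs @ question_vec
    return np.argsort(scores)[::-1][:k]
```

The top-5 chunks selected this way are concatenated with the question and sent to Qwen3-30B in a single call.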
**Google RAG**

1. **Document Processing**
   - Raw PDF uploaded to a Google File Search Store
   - Google internally indexes and chunks the document
2. **Query Processing**
   - Gemini 2.5 Flash uses the `file_search` tool
   - Retrieves relevant passages with grounding metadata
   - Returns answer with source attribution
**RLM**

1. **Document Processing**
   - Full document text loaded as a Python variable in the REPL
   - No pre-indexing or chunking
2. **Query Processing**
   - LLM receives question + metadata (doc length, structure)
   - LLM writes Python code to explore the document:
     - `peek(start, end)` — view a text slice
     - `search(keyword)` — find occurrences
     - `llm_query(text, question)` — spawn a sub-LM call
   - Iterates until sufficient information is gathered
   - Calls `SUBMIT(answer)` to finalize
3. **Recursive Reasoning**
   - Main LLM never sees the full document (avoids context overflow)
   - Sub-LM calls process specific chunks
   - Full trajectory (code + outputs) displayed for transparency
The application includes the Vectara Open RAG Benchmark [2]:
- 3,045 questions across 1,000 research papers (arXiv)
- Question types: Abstractive (1,793), Extractive (1,252)
- Source types: Text (1,914), Text+Image (763), Text+Table (148), Text+Table+Image (220)
Each pipeline can be evaluated on benchmark questions with:
- Ground truth answers for comparison
- LLM-as-Judge scoring (Claude Sonnet 4 via OpenRouter)
- Metrics: Accuracy, Completeness, Relevance (1-10 scale)
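A minimal sketch of the judge's scoring side, assuming the judge is asked to emit one `Metric: score` line per metric. The prompt wording and reply format are assumptions, not the benchmark code's exact implementation, and the OpenRouter call itself is omitted (it is a standard OpenAI-compatible chat completion).

```python
# Hedged sketch of LLM-as-Judge scoring: build a prompt requesting 1-10
# scores per metric, then parse the judge's reply into a dict.
import re

METRICS = ("Accuracy", "Completeness", "Relevance")

def build_judge_prompt(question: str, ground_truth: str, answer: str) -> str:
    """Assemble an instruction asking for one 'Metric: score' line each."""
    return (
        "Score the candidate answer against the ground truth on a 1-10 "
        f"scale for each of: {', '.join(METRICS)}. Reply with one "
        "'Metric: score' line per metric.\n\n"
        f"Question: {question}\n"
        f"Ground truth: {ground_truth}\n"
        f"Candidate answer: {answer}"
    )

def parse_scores(judge_reply: str) -> dict[str, int]:
    """Extract '<Metric>: <1-10>' lines from the judge's reply."""
    scores = {}
    for metric in METRICS:
        m = re.search(rf"{metric}\s*:\s*(\d+)", judge_reply, re.IGNORECASE)
        if m:
            scores[metric.lower()] = int(m.group(1))
    return scores
```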
[1] Zhang, A., Kraska, T., & Khattab, O. (2025). Recursive Language Models. arXiv preprint arXiv:2512.24601. https://arxiv.org/abs/2512.24601
[2] Vectara. (2024). Open RAG Bench: A Benchmark for Retrieval-Augmented Generation. https://github.com/vectara/open-rag-bench
[3] Khattab, O., et al. (2024). DSPy: Programming—not prompting—Foundation Models. https://dspy.ai
If you use this implementation in your research, please cite the original RLM paper:
```bibtex
@article{zhang2025recursive,
  title={Recursive Language Models},
  author={Zhang, Alex and Kraska, Tim and Khattab, Omar},
  journal={arXiv preprint arXiv:2512.24601},
  year={2025},
  url={https://arxiv.org/abs/2512.24601}
}
```

MIT License
- DSPy team for the RLM implementation
- Vectara for the Open RAG Benchmark
- Reducto for PDF parsing infrastructure