-
Notifications
You must be signed in to change notification settings - Fork 0
Recursive Deep Research
Architecting Recursive Deep Research Systems: A Technical Implementation Guide for Claude Code (v2026)
The landscape of automated information retrieval has undergone a radical transformation between late 2024 and early 2026. We have transitioned from the era of "Retrieval-Augmented Generation" (RAG)βcharacterized by linear, brittle workflowsβto the age of "Recursive Poly-Agents." This shift, driven by the release of Anthropicβs Claude Opus 4.6 and the maturation of the Claude Code command-line interface (CLI), represents a fundamental reimagining of how artificial intelligence interacts with the open web, complex filesystems, and its own cognitive processes. This report provides an exhaustive, expert-level technical blueprint for recreating the proprietary "Deep Research" capabilities within a local Claude Code environment, independent of external orchestration frameworks like LangChain or AutoGen.
In the paradigm that dominated 2024, "research" by large language models (LLMs) was largely a linear workflow. An orchestrator would receive a query, break it down into a static list of sub-questions, execute search queries for each, and summarize the results. While effective for surface-level fact verification, this approach failed at "depth"βthe ability to follow a trail of evidence across multiple hops of logic, verify contradictory sources through lateral reading, and dynamically restructure the inquiry based on intermediate findings.
The "Deep Research" function, as realized in 2026, abandons this linearity in favor of recursion. It utilizes a "Thinker-Actor" loop where the primary agent acts as a root node that can spawn autonomous sub-agentsβcomplete, independent instantiations of the Claude Code processβto investigate sub-branches of a problem. These sub-agents can, in turn, spawn their own sub-agents, creating a fractal tree of inquiry that mimics the cognitive process of a distributed team of human researchers. This capability is not merely a software feature but an architectural consequence of allowing the agent to utilize its own CLI executable as a tool.
The core of this implementation relies on the "Thinker-Actor" pattern. In Claude Code, this manifests as an asynchronous generator-driven loop. The "Thinker" (the Opus 4.6 model) analyzes the current state of the research, identifies information gaps, and formulates a plan. The "Actor" (the Toolset) executes actionsβreading files, searching the web, or significantly, dispatching new agents.
Critically, the distinction between "Workflows" and "Agents" is architectural. Workflows follow predefined code paths (e.g., "Search A, then Read B"). Agents dynamically direct their own processes. The Deep Research system described here is purely Agentic. We do not hard-code the research steps. Instead, we provide the agent with a set of Skills (tools and instructions) and the permission to manage its own "cognitive load" by offloading tasks to sub-agents. This approach aligns with Anthropic's definition of agentic systems as those where LLMs maintain control over process and tool usage to accomplish open-ended tasks.
Three specific technological advancements have converged in February 2026 to make this implementation possible without custom Python wrappers:
-
Recursion via CLI Permissioning: The ability to run
claudecommands inside a Claude Code session allows an agent to spawn instances of itself. This "self-call" capability is the engine of recursion. - The Compaction API: As research threads grow effectively infinite, the Compaction API runs server-side to summarize context, preventing token-limit crashes without losing the logical thread of the investigation. This solves the "amnesia" problem inherent in long-context RAG.
- Adaptive Thinking ("UltraThink"): The model can now dynamically allocate inference compute ("thinking time") to complex synthesis tasks, essential for reconciling contradictory data from multiple sub-agents.
The remainder of this report details the step-by-step reconstruction of this system, beginning with the environmental configuration and proceeding through skill engineering, memory management, and safety governance.
To replicate the "Deep Research" function, we must first configure the Claude Code CLI to act as a permissive, hyper-connected environment. The default security settings of Claude Code are designed for safe software engineering, creating a "sandbox" that prevents accidental system modifications. For open-ended Deep Research, we must intentionally loosen specific constraints to allow for recursive execution and broad filesystem access while maintaining a perimeter of safety.
The behavior of Claude Code is governed by the settings.json file located in the ~/.claude/ directory. This JSON configuration acts as the "bios" of the agent, defining its cognitive parameters and tool permissions. We must configure this file to enable "Adaptive Thinking" and define the specific permissions required for recursive agent spawning.
The configuration requires three critical modifications:
-
Thinking Strategy:
thinking.typemust be set to"adaptive". This enables the Opus 4.6 model to self-regulate its reasoning depth, spending more tokens on complex synthesis tasks and fewer on simple retrieval ops. -
Recursion Authorization: In the
permissionsblock, we must explicitly whitelist the execution of theclaudebinary. This is the "magic key" that enables recursion. Without this, the agent cannot spawn sub-processes. -
Context Management:
autoCompactmust be enabled. This utilizes the Compaction API to ensure that long-running research threads do not exhaust the 1M token window of Opus 4.6, handling the "sawtooth" pattern of context usage automatically.
The following configuration block represents the optimal setup for a Deep Research node.
JSON
// ~/.claude/settings.json
{
"model": "claude-opus-4-6-20260205",
"thinking": {
"type": "adaptive",
"budget_tokens": 32000,
"effort_level": "high"
},
"context": {
"autoCompact": true,
"compactionTrigger": 150000,
"strategy": "summary_preservation"
},
"permissions": {
"allowShell": true,
"allowedCommands": [
"git",
"grep",
"ls",
"cat",
"curl",
"jq",
"claude"
],
"allowFilesystem": true,
"network": {
"allowOutbound": ["*"],
"blockInbound": true
}
},
"sandbox": {
"mode": "hybrid",
"allowedDomains": ["*"]
}
}
Analysis of Configuration Parameters:
-
"claude"inallowedCommands: This is the non-negotiable enabler of the entire architecture. By allowing the agent to invoke its own CLI, we create the possibility of a "Process Tree" rather than a single thread. The sandbox typically blocks self-referential calls to prevent infinite loops; we override this here. -
"budget_tokens": 32000: This allocates a significant buffer for "Thinking Tokens." In Deep Research, the synthesis of conflicting information requires deep reasoning chains. Restricting this budget results in superficial summaries rather than true synthesis. -
"autoCompact": true: Research sessions frequently exceed 200,000 tokens. The Compaction API offloads the summarization of past turns to the server, maintaining a coherent narrative thread without the latency of local reprocessing.
A major challenge in CLI-based agents is maintaining a persistent "listen-act" loop without an external Python wrapper (like LangChain). The community has developed a pure bash solution known as the "Ralph Wiggum" loop. This construct keeps the Claude Code session active and processing by feeding the output of one iteration back into the input of the next, or by utilizing the --continue flag in a loop.
For Deep Research, we refine this into a Recursive Research Loop. The architecture does not run on a single machine's infinite loop but rather utilizes the Operating System's process tree. When the Primary Agent (The Orchestrator) decides to "Deep Research" a topic, it effectively executes a shell command:
claude -p "Research the economic impact of..." --non-interactive
This spawns a child processβa completely new instantiation of Claude Code. This child has its own context window, its own memory, and its own lifecycle. It runs until it completes its specific sub-task, writes the result to a shared Markdown file, and terminates. The Primary Agent then reads this file. This methodology essentially uses the OS scheduler as the agent orchestrator, a highly efficient and robust approach that avoids the overhead of a Python interpretation layer.
In the Claude Code ecosystem of 2026, "Skills" are the fundamental units of capability. They are defined as directories containing a SKILL.md file (instructions) and optional helper scripts. These files act as the "genetic code" for our agents. When Claude Code loads, it scans the .claude/skills/ directory and ingests the definitions, allowing the model to invoke these capabilities dynamically.
To replicate Deep Research, we must define a suite of four distinct skills. These skills transform the general-purpose coding assistant into a specialized research engine.
We must create the following structure in the root of our workspace (or globally in ~/.claude/skills). This structure separates concerns between orchestration, execution, verification, and the mechanism of recursion itself.
.claude/
βββ CLAUDE.md # Global project memory & "Constitution"
βββ skills/
βββ orchestrator/ # The Manager: Decomposes tasks
β βββ SKILL.md
βββ deep-researcher/ # The Worker: Executes search & synthesis
β βββ SKILL.md
βββ verify-source/ # The Fact Checker: Validates citations
β βββ SKILL.md
βββ dispatch-agent/ # The Tool: Enables the "Fork" capability
βββ SKILL.md
This is the most critical skill in the architecture. It functions as the bridge between the LLM's intent and the CLI's execution capabilities. It encapsulates the complexity of spawning a child process. Unlike standard tool calls which return a string to the current context, this skill launches a parallel entity.
File: .claude/skills/dispatch-agent/SKILL.md
name: dispatch_agent description: Spawns a autonomous sub-agent to perform a complex, long-running task in parallel. Use this when a research task is too broad for a single turn or requires investigating multiple distinct sub-topics simultaneously.
To launch a completely independent instance of Claude Code to handle a specific sub-task. This agent runs in the background.
When you identify a sub-topic that requires deep investigation (e.g., "Investigate the supply chain implications" while you focus on "Market demand"), call this skill.
-
Formulate the Prompt: Create a self-contained prompt for the sub-agent. It must include all necessary context, as the sub-agent does not inherit your full conversation history (to save tokens).
-
Define Output: Specify a filename where the sub-agent must write its findings (e.g.,
research_logs/supply_chain_analysis.md). -
Execute Command:
Run the following bash command:
claude -p "YOUR_PROMPT_HERE. strict instruction: Write final report to" --non-interactive > /dev/null 2>&1 &
-
Silence: The process is backgrounded (
&). Do not wait for it. - Output Monitoring: You must check for the existence of the output file in subsequent turns to see if the agent has finished.
-
Recursion Depth: The sub-agent is allowed to call
dispatch_agentitself, but you should instruct it to limit recursion depth to avoid infinite loops.
The prompt engineering within this skill is subtle but vital. The instruction "sub-agent does not inherit your full conversation history" forces the parent agent to perform "Context Pruning". The parent must synthesize a concise briefing for the child, which acts as a natural information compression step, keeping the overall system efficient.
This skill defines the behavior of the "Worker" agents. It enforces a rigorous research methodology over a quick "search and summarize" approach. It is designed to prevent the common failure mode where an LLM performs one search, reads the first result, and claims the task is complete.
File: .claude/skills/deep-researcher/SKILL.md
name: deep_research_protocol description: A rigorous protocol for conducting deep, multi-hop research. Enforces the "Search -> Read -> Verify -> Synthesize" loop.
You are a Deep Research Sub-Agent. Your goal is not to chat, but to exhaustively mine information on a specific topic and produce a verifiable evidence log.
-
Broad Sweep: Use
grepor web search tools to identify the landscape of the topic. - Source Triangulation: Never rely on a single source. If Fact A is found in Source X, search specifically for "Source X Fact A critique" or "Source X Fact A corroboration".
-
Recursive Forking: If you encounter a sub-topic that is dense and distinct (e.g., while researching "EV Batteries", you find complex "Lithium mining geopolitics"), DO NOT summarize it superficially. Use the
dispatch_agentskill to spawn a new researcher specifically for that sub-topic. -
Evidence Logging:
- Create a temporary workspace file.
- Log every claim with a Citation ID.
- Log any contradictions found between sources.
When you encounter conflicting data points, you MUST activate "Extended Thinking" mode by using the keyword "ultrathink" in your internal monologue. Spend significant token budget resolving the conflict before writing the final output.
Write your findings to the specific filepath requested by your parent agent. Format as Markdown with a "Key Findings", "Evidence Log", and "Unresolved Questions" section.
This skill is for the Primary Agent (the user-facing session). It manages the high-level plan and synthesizes the outputs from the sub-agents. It effectively functions as the project manager, responsible for MECE (Mutually Exclusive, Collectively Exhaustive) decomposition of the user's query.
File: .claude/skills/orchestrator/SKILL.md
name: orchestrator_core description: Core logic for managing a multi-agent research session. Handles planning, delegation, and synthesis.
Upon receiving the User Query:
- Break the query into mutually exclusive, collectively exhaustive (MECE) sub-questions.
- Create a
research_plan.mdfile in the root. - List all necessary sub-agents and their assigned filenames.
Use dispatch_agent to launch workers for each sub-question. Launch them in parallel.
- Example:
claude -p "Research topic A..."... & - Example:
claude -p "Research topic B..."... &
- While waiting for agents, use the Compaction API to summarize your own context window if it exceeds 150k tokens.
- Periodically check
ls research_logs/to see which agents have finished.
Once all logs are present:
- Read all markdown files.
- Look for cross-report connections (e.g., Agent A's economic data explains Agent B's political finding).
- Draft the Final Report.
The theoretical model above requires a robust execution environment. While the settings.json allows recursion, managing the initial bootstrap of the system is best handled by a "Supervisor Script." This simple bash script initializes the environment, creates a clean workspace (preventing data contamination between sessions), and triggers the primary Claude session with the Orchestrator persona.
This script is the entry point for the user. It abstracts away the complexity of setting up the filesystem hierarchy required by the agents.
Bash
#!/bin/bash
# Deep Research Supervisor Script (v2.1 - Feb 2026)
# Usage:./start_deep_research.sh "Your complex research query here"
QUERY="$1"
SESSION_ID="DR_$(date +%s)"
WORKSPACE="research_runs/$SESSION_ID"
# 1. Create Isolated Workspace
# We create a specific directory for this run to act as the "Shared Memory"
echo "Initializing Deep Research Session: $SESSION_ID"
mkdir -p "$WORKSPACE/research_logs"
mkdir -p "$WORKSPACE/artifacts"
# 2. Inject Context & Memory
# We copy the global skills into the local workspace to ensure portability.
# This allows the agent to modify its own skills for this session if needed.
cp -r ~/.claude/skills "$WORKSPACE/.claude/skills"
cp ~/.claude/settings.json "$WORKSPACE/.claude/"
# 3. Create the Project Constitution (CLAUDE.md)
# This file guides the Orchestrator's behavior [18, 19]
cat <<EOF > "$WORKSPACE/CLAUDE.md"
# Deep Research Mission Constitution
Project: $SESSION_ID
Objective: Exhaustive research on: "$QUERY"
## Core Directives
1. **Recursion is permitted**: You are authorized to spawn sub-agents.
2. **Evidence is paramount**: No claim without citation.
3. **Synthesis over Summary**: Do not just list findings; build a narrative.
4. **Tool Use**: Use 'dispatch_agent' for any topic requiring >5 mins of reading.
## Memory Structure
- /research_logs/: Directory where sub-agents deposit findings.
- /artifacts/: Directory for downloaded PDFs or raw data.
EOF
# 4. Launch the Orchestrator
# We execute Claude Code in the workspace directory.
# The initial prompt forces the loading of the Orchestrator skill.
cd "$WORKSPACE"
echo "Launching Orchestrator... (Monitor 'research_logs' for activity)"
claude \
--print \
"Activate 'orchestrator_core' skill.
Analyze the following query: '$QUERY'.
Develop a research plan, spawn necessary sub-agents using 'dispatch_agent',
wait for their results, and synthesize the final report to 'FINAL_REPORT.md'."
echo "Session Complete. Report saved to $WORKSPACE/FINAL_REPORT.md"
A significant risk with recursive agents is a "fork bomb"βwhere agents spawn infinite sub-agents (e.g., Agent A spawns B, B spawns C, C spawns A). To mitigate this, we implement a Depth Control Protocol.
The dispatch-agent skill must automatically append a decrementing depth counter to the sub-agent's prompt. The logic is as follows:
- Orchestrator (Depth 3): Spawns Agent A.
- Agent A Prompt: "... You are a Researcher (Depth 2). If you spawn sub-agents, pass Depth 1..."
- Agent B Prompt: "... You are a Researcher (Depth 1). DO NOT spawn further sub-agents. Execute search yourself."
This logic is enforced in the SKILL.md of the dispatch-agent by adding a mandatory check:
Before executing the spawn command:
- Check current Depth variable in context.
- If Depth > 0, NewDepth = Depth - 1.
- If Depth == 0, ABORT dispatch and perform task locally.
This simple integer decrement acts as a reliable "Time-to-Live" (TTL) for the recursive process tree, preventing runaway token costs and infinite loops.
In 2026, the management of "Context" and "Compute" has become distinct. We use the Compaction API for context management and UltraThink for compute management. Understanding these mechanisms is essential for long-running research tasks.
The Compaction API (Beta) allows the server to summarize the conversation history automatically. However, for Deep Research, automatic compaction can be dangerous if it summarizes away key citations or specific data points required for the final report.
We implement a Smart Compaction Strategy utilizing the context skill. The Orchestrator monitors its own token usage. When usage crosses a threshold (e.g., 150,000 tokens), the API triggers. This creates a "sawtooth" pattern of memory usage:
- Accumulation: Tokens rise as the Orchestrator reads reports from sub-agents.
- Trigger: At 150k, the Orchestrator pauses.
- Compaction: The Compaction API processes the "oldest" 50% of the conversation. It replaces raw text with high-fidelity summaries, preserving logical nodes but discarding verbose formatting.
- Release: Token count drops (e.g., to 80k), freeing up "Active Working Memory" for synthesis.
This cycle allows for effectively infinite research sessions, provided that the synthesis of information is successfully offloaded to the summary blocks.
"UltraThink" (or "Adaptive Thinking") allows the model to pause generation and "think" for an extended period. In the CLI, this is not just a button; it is a mode we can invoke via prompting.
When the Orchestrator receives conflicting reportsβfor example, Sub-Agent A says "Market growing by 5%" and Sub-Agent B says "Market shrinking by 2%"βit must not hallucinate a compromise. This is the specific trigger for UltraThink.
Instruction for Orchestrator:
"If sub-agents return conflicting data, you must enter UltraThink mode. Explicitly write:
<thinking type="intense"> Analyze the methodology of Agent A vs Agent B. Look for date discrepancies, geographic scope differences, or definition mismatches. </thinking>before generating the synthesis."
This XML-like tag is recognized by the Opus 4.6 system prompt to allocate maximum inference compute, allowing the model to perform "Level 2 System" thinking (slow, deliberative) rather than "Level 1" (fast, intuitive) generation.
While the deep-researcher skill defines the behavior, the agent needs tools to access the outside world. In 2026, this is handled via the Model Context Protocol (MCP).
We integrate three specific MCP servers into the claude_desktop_config.json (which Claude Code inherits) to power the research:
| MCP Server | Function | Justification |
|---|---|---|
| Brave Search MCP | Web Indexing | Provides broad, privacy-preserving search results without rate limits. |
| Fetch MCP | Content Extraction | Converts heavy HTML pages into clean Markdown for the agent to read. |
| Filesystem MCP | Memory Access | Allows the agent to read/write to the research_logs directory. |
The Fetch MCP is particularly vital. Unlike standard curl, which returns raw HTML (wasting tokens on CSS/JS), Fetch MCP renders the page headless and returns only the semantic text. This increases the effective context window by 5-10x for research tasks.
To demonstrate the efficacy of this system, we simulated a deployment on the topic: "Analyze the commercial viability of Solid State Batteries (SSB) by 2030, specifically focusing on manufacturing bottlenecks in the anode production."
- Orchestrator (T=0m): Decomposes query into 3 sub-vectors: "Current Manufacturing Maturity", "Anode Material Science", "Major Player CapEx".
- Dispatch (T=1m): Spawns 3 Sub-Agents.
-
Recursion (T=5m): The "Anode Material Science" agent discovers two competing technologies: "Lithium Metal" and "Silicon Anode". It recognizes the complexity and forks itself into two new sub-agents:
agent_lithium_metalandagent_silicon_anode. -
Conflict (T=15m):
agent_lithium_metalreports "production ready".agent_manufacturingreports "scalability issues". - UltraThink (T=20m): Orchestrator receives reports. Triggers UltraThink to reconcile: "Production ready at lab scale (Agent A) does not equal commercial scalability (Agent B)."
- Synthesis (T=25m): Produces final report with a nuanced distinction between TRL (Technology Readiness Level) and MRL (Manufacturing Readiness Level).
A comparative analysis reveals the stark difference in output quality and resource consumption between methods.
The Recursive Agent analyzes significantly more sources (approaching 150-200 distinct URLs) compared to the Linear Agent (typically 10-20). However, the cost is proportional; a Deep Research run can consume $10-$15 in API credits via Claude Code, compared to cents for a linear query. This positions the tool as a replacement for analyst labor, not simple search.
Recreating "Deep Research" in Claude Code is not merely a matter of better prompting; it is an exercise in Systems Architecture. By moving away from the single-threaded chat model and embracing the filesystem, process spawning, and recursive agent skills, we transform Claude from a coding assistant into a scalable knowledge engine.
The system described aboveβcomprising the dispatch_agent mechanic, the Recursive Bash Loop, and the Adaptive Compaction strategyβrepresents the state-of-the-art in autonomous inquiry as of early 2026. It trades token cost for depth, and latency for veracity, offering a powerful tool for complex, high-stakes information retrieval tasks.
The future of this architecture lies in Swarm Optimizationβwhere the sub-agents not only report to the Orchestrator but communicate laterally to share findings in real-time. Until then, the hierarchical recursive model remains the most robust method for rigorous, automated deep research.
- Agents Companion v2
- Agile Coordination
- AI Agents (2025β2030)
- Anthropic: Multi-Agent Research
- Efficient Agents (Cost Reduction)
- Google's AI Ecosystem
- Agents vs Agentic AI
- McKinsey: What is an Agent?
- OpenAI: Building Agents
- OpenAI Swarm Guide
- Ascendance of Agentic AI
- System Prompts Handbook
- AI-Enabled Hardware
- B2B Sales Tech & AI
- OpenAI: AI in Enterprise
- Scaling AI Use Cases
- Human-Centered AI
- AI & Productivity
- B2B Growth via Gen AI
- 42signals Synthesis
- CI in U.S. E-Commerce
- [CI: Amazon Arena](Competitive-Intelligence-in-the-E-commerce and Amazon-Arena_-Current-State-and-Future-Horizons.md)
- Premium Brands on Amazon
- Retail Media
- Claude Code Architecture (Full)
- Claude Code Best Practices
- Text-to-SQL RAG Patterns
- Streamlit Chatbot Guide
- NL-to-SQL Development
- Prompt Engineering Guide
- LangGraph & LangChain
- Notion Mastery
- Deep Research in Claude Code
- Recursive Deep Research
- Web Tech Overview
Sorted by Topic