An open-source implementation of a low-latency agent for code localization.
- Repository: https://github.com/All-Hands-AI/agentic-code-search-oss
- Slack: #agentic-code-search-oss (All-Hands-AI workspace)
LLM-based coding agents are bottlenecked by context retrieval. They are often slow and inefficient at finding the correct files and code snippets to edit in a large repository. This project builds a small, fast, specialized agent to solve the code localization problem.
The primary goal is to minimize the latency of code localization. The secondary goal is to maintain high precision.
Success will be measured by:
- Latency: Time to identify target code locations.
- Precision: Percentage of identified locations that are correct.
- Recall: Percentage of all correct locations that were identified.
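As a concrete illustration of how these metrics could be computed (a minimal sketch, assuming localization targets are represented as `(file, line)` pairs; the project's actual granularity and evaluation harness may differ):

```python
import time

def localization_metrics(predicted: set, gold: set) -> dict:
    """Precision/recall over predicted vs. gold (file, line) locations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return {"precision": precision, "recall": recall}

# Latency is simply wall-clock time around the agent's localization run.
start = time.perf_counter()
predicted = {("src/app.py", 42), ("src/utils.py", 7)}  # stand-in for the agent's output
latency_s = time.perf_counter() - start

gold = {("src/app.py", 42), ("src/app.py", 43)}
print(localization_metrics(predicted, gold), f"latency: {latency_s:.3f}s")
```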
The approach is to train a small language model using Reinforcement Learning (RL) on a standardized benchmark.
- Benchmark Environment: SWE-Gym will be used for training and evaluation, as it provides realistic software engineering tasks with executable environments.
- Reward Signal: The evaluation logic from the Agentless project will be used as the "verifiable reward" mechanism. The agent is rewarded for correctly identifying the files and lines that require edits.
- RL Framework: The agent will be trained using an RL framework. SkyRL and AReaL are the primary candidates.
- Model: A small, efficient language model (e.g., Qwen3-0.6B) will be fine-tuned for the localization task to ensure low inference latency.
- Tooling Strategy: The agent will use a set of tools to navigate the codebase (see the sketch after this list). The focus is on:
  - Diverse Tool Calls: Implementing and evaluating tools beyond grep, such as Abstract Syntax Tree (AST) parsers for structural code analysis.
  - Parallel Tool Calling: Architecting the agent to execute multiple search queries simultaneously to reduce the number of sequential steps.
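As a rough sketch of both tooling ideas (the tool names, the use of Python's built-in `ast` module, and the thread-pool dispatcher are illustrative assumptions, not the project's settled design):

```python
import ast
import pathlib
import re
from concurrent.futures import ThreadPoolExecutor

def ast_find_definitions(repo: str, symbol: str) -> list:
    """AST-based tool: find function/class definitions named `symbol`."""
    hits = []
    for path in pathlib.Path(repo).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)) \
                    and node.name == symbol:
                hits.append((str(path), node.lineno))
    return hits

def grep_tool(repo: str, pattern: str) -> list:
    """Baseline grep-style tool: regex match over source lines."""
    regex = re.compile(pattern)
    hits = []
    for path in pathlib.Path(repo).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), 1):
            if regex.search(line):
                hits.append((str(path), lineno))
    return hits

def run_tools_in_parallel(calls):
    """Execute several independent tool calls concurrently instead of sequentially."""
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        futures = [pool.submit(fn, *args) for fn, *args in calls]
        return [f.result() for f in futures]

# One agent step fanning out two searches at once:
results = run_tools_in_parallel([
    (ast_find_definitions, ".", "localize"),
    (grep_tool, ".", r"def .*locali[sz]e"),
])
```

A dispatcher along these lines is what lets a single agent step issue several independent searches, which is where most of the reduction in sequential steps (and hence latency) would come from.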
The project is broken down into the following workstreams:
- Workstream 1: Evaluation & RL Environment
  - Task: Set up the core training environment by integrating the Agentless validator with SWE-Gym. This will provide the foundation for an RL training loop (an illustrative reward sketch follows this list).
- Workstream 2: Tooling
  - Task: Research, implement, and evaluate different tool calls (e.g., AST-based search, advanced regex, semantic search).
  - Task: Design and implement an architecture that supports parallel execution of these tools.
- Workstream 3: Reinforcement Learning
  - Task: Implement and run training loops using a selected RL framework (e.g., SkyRL, AReaL).
  - Task: Experiment with reward shaping and policy optimization to improve agent performance.
- Future Considerations:
  - Investigating question-answering tasks using datasets like CodeSearchNet.
  - Analyzing successful agent trajectories to improve learning.
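The actual verifiable reward will come from the Agentless evaluation logic run inside SWE-Gym, but the shape of the signal, and of reward shaping on top of it, might look roughly like the following sketch. The function names, the file-level granularity, and the step penalty are assumptions for illustration only:

```python
def localization_reward(predicted_files: set, gold_files: set) -> float:
    """Sparse verifiable reward: 1.0 only if every file touched by the gold patch was found."""
    return 1.0 if gold_files and gold_files <= predicted_files else 0.0

def shaped_reward(predicted_files: set, gold_files: set,
                  num_steps: int, step_penalty: float = 0.02) -> float:
    """Reward shaping: keep the verifiable signal but discourage long sequential trajectories."""
    return localization_reward(predicted_files, gold_files) - step_penalty * num_steps

# Example trajectory: the gold file is found within 3 agent steps.
print(shaped_reward({"src/app.py", "src/utils.py"}, {"src/app.py"}, num_steps=3))  # 0.94
```

A per-step penalty like this is one simple way to tie the training objective back to the primary latency goal; whether it actually helps is what Workstream 3's reward-shaping experiments are meant to establish.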
This is a community-driven project.
- Join the #agentic-code-search-oss channel on the All-Hands-AI Slack.
- Check the GitHub Issues for open tasks.
- Attend the weekly meetings to sync on progress (details in the Slack channel).
- Primary Inspiration: Cognition AI's SWE-grep Blog Post
- Core Components:
- Relevant Research & Projects:
- Datasets:
- Training Parallel Tool Calling:
- SWE-Grep: Forcing parallel tool calling during training (8 tools in parallel per step; a minimal illustration follows this list)
- LLMCompiler: Using a "compiler" idea to orchestrate parallel tool calling during training; it may be overkill for pure search tasks.
- Divide-Then-Aggregate: Another similar training method for parallel tool calling.
- KAT: Some good practices for parallel tool calling.
- Overall, this space is relatively unexplored.
- Finally, parallel tool calling is related to the idea of multi-agent frameworks:
- M1-Parallel: runs multiple multi-agent teams in parallel
- ToolFlow: multiple agents to synthesize the training data
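For intuition on SWE-Grep's forced-parallelism idea (purely illustrative; none of the projects above necessarily implement it this way, and the JSON tool-call format is an assumption), a shaping term could reward turns that batch a target number of tool calls:

```python
import json

TARGET_FANOUT = 8  # SWE-Grep reportedly targets 8 parallel tool calls per step

def fanout_bonus(assistant_turn: str, weight: float = 0.1) -> float:
    """Illustrative shaping term: reward turns that batch TARGET_FANOUT tool calls.

    Assumes the turn is a JSON list of tool-call objects; real chat templates differ.
    """
    try:
        calls = json.loads(assistant_turn)
    except json.JSONDecodeError:
        return 0.0
    fanout = len(calls) if isinstance(calls, list) else 0
    return weight * max(0.0, 1.0 - abs(fanout - TARGET_FANOUT) / TARGET_FANOUT)

turn = json.dumps([{"tool": "grep", "args": {"pattern": "foo"}}] * 8)
print(fanout_bonus(turn))  # full bonus of 0.1 when exactly 8 calls are batched
```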