This repository contains the implementation and research artifacts for our final-year project, "Solve Issues in Large Code Repositories".
The project addresses limitations in current automated debugging and patch generation methods by introducing a hybrid approach that combines:
- Iterative reasoning, to mimic real-world developer behavior.
- Graph-based retrieval, to reduce the search space and improve precision.
- Retrieval-Augmented Generation (RAG) leveraging Stack Overflow for enhanced context.
- Multi-LLM-based patch generation and refinement, aiming for higher SWE-bench performance at lower computational cost.
The primary aim is to enhance the efficiency and accuracy of automated software engineering solutions, as evaluated with the SWE-bench framework. The specific objectives are to:
- Develop an iterative reasoning system for issue resolution.
- Create a graph-based representation of code repositories for accurate file retrieval.
- Integrate Stack Overflow knowledge using RAG to improve contextual understanding.
- Combine multiple LLMs (e.g., Claude, GPT-4, DeepSeek R1) for diverse patch generation.
- Learn from incorrect patches using iterative refinement and reasoning models.
- Achieve retrieval accuracy >82% on SWE-bench tasks.
- Graph-Based Repository Modeling: Built using NetworkX and visualized with Gephi, representing inter-file relationships like imports and function calls.
- Retrieval-Augmented Generation (RAG): Enhanced contextual understanding using Stack Overflow data stored in ChromaDB, queried semantically via Sentence-BERT, and processed with LlamaIndex.
- Iterative Reasoning & Multi-LLM Patch Generation: Employing reasoning models like DeepSeek R1 and multiple LLMs to generate, compare, and refine patches.
- Artificial Stack Trace Generation: For difficult cases where standard retrieval fails, simulate execution paths using graph traversal to identify probable buggy files.
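
To make the graph model and the artificial stack trace idea above more concrete, here is a minimal sketch, not the project's actual implementation. It assumes the graph only tracks module-level imports (function-call edges are omitted), uses plain dictionaries instead of NetworkX so that it runs with the standard library alone, and treats a breadth-first walk from a module named in the issue as a stand-in for a simulated execution path. The names `build_import_graph`, `rank_candidates`, and `demo_sources` are hypothetical.

```python
import ast
from collections import deque


def build_import_graph(sources):
    """Map each module name to the set of project modules it imports.

    `sources` is a hypothetical {module_name: source_code} mapping.
    """
    graph = {name: set() for name in sources}
    for name, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets = [node.module]
            else:
                continue
            # Keep only edges that point at modules inside the repository.
            graph[name].update(t for t in targets if t in sources)
    return graph


def rank_candidates(graph, entry, max_depth=3):
    """Breadth-first walk from `entry`, returning (module, depth) pairs.

    Modules closer to the entry point are listed first, which is one
    simple (assumed) way to order probable buggy files.
    """
    seen = {entry: 0}
    queue = deque([entry])
    while queue:
        current = queue.popleft()
        if seen[current] == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in sorted(graph[current]):
            if neighbour not in seen:
                seen[neighbour] = seen[current] + 1
                queue.append(neighbour)
    return sorted(seen.items(), key=lambda item: (item[1], item[0]))


# Tiny in-memory "repository" used purely for illustration.
demo_sources = {
    "app": "import services\nimport os\n",
    "services": "from models import User\nimport utils\n",
    "models": "import utils\n",
    "utils": "import json\n",
}
demo_graph = build_import_graph(demo_sources)
demo_ranking = rank_candidates(demo_graph, "app")
```

The same adjacency mapping could be loaded into a NetworkX directed graph for analysis or exported for Gephi. The depth limit is what prunes the search space in this sketch, and its value here is arbitrary.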
Technology | Purpose |
---|---|
Python | Primary development language |
NetworkX | Graph construction |
Gephi | Graph visualization |
ChromaDB | Vector database for semantic retrieval |
OpenAI embeddings | Embedding generation |
LangChain | RAG integration and vector search |
BeautifulSoup | Web scraping (Stack Overflow) |
StackAPI | API access to Stack Overflow data |
GPT-4, Claude | LLMs for retrieval and patch generation |
DeepSeek R1 | Reasoning and decision-making |
- Graph-based repository model construction.
- SWE-bench dataset preprocessing.
- Retrieval techniques benchmarked:
  - LLM-based
  - Embedding-based
  - LLM + RAG
- Artificial stack traces generated when direct retrieval fails.
- Stack Overflow context integration using vector search.
- Evaluation metrics:
  - Retrieval Accuracy (target: >82%)
  - Patch Validity (unit test pass/fail)
  - Cost-efficiency (LLM token usage and execution time)
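
The first two evaluation metrics above could be computed along the following lines. This is a sketch under stated assumptions rather than the project's evaluation code: retrieval is counted as correct for an issue when any gold (actually modified) file appears in the top `k` retrieved files, and a patch is counted as valid only when every associated unit test passes. The function names, the `k` parameter, and the demo data are all hypothetical.

```python
def retrieval_accuracy(predictions, gold, k=5):
    """Fraction of issues whose top-k predicted files include a gold file."""
    if not gold:
        return 0.0
    hits = 0
    for issue_id, gold_files in gold.items():
        top_k = predictions.get(issue_id, [])[:k]
        if any(path in gold_files for path in top_k):
            hits += 1
    return hits / len(gold)


def patch_validity(test_results):
    """Fraction of patches for which every unit test passed."""
    if not test_results:
        return 0.0
    valid = sum(
        1 for outcomes in test_results.values() if outcomes and all(outcomes)
    )
    return valid / len(test_results)


# Made-up data, for illustration only.
demo_gold = {
    "issue-1": {"pkg/core.py"},
    "issue-2": {"pkg/io.py", "pkg/utils.py"},
    "issue-3": {"pkg/cli.py"},
    "issue-4": {"pkg/db.py"},
}
demo_predictions = {
    "issue-1": ["pkg/core.py", "pkg/io.py"],
    "issue-2": ["pkg/cli.py", "pkg/utils.py"],
    "issue-3": ["pkg/core.py"],
    # issue-4 has no prediction and therefore counts as a miss.
}
demo_accuracy = retrieval_accuracy(demo_predictions, demo_gold, k=2)
demo_validity = patch_validity(
    {"issue-1": [True, True], "issue-2": [True, False]}
)
```

Under this definition the >82% target would correspond to `retrieval_accuracy(...) > 0.82`. A stricter variant could require all gold files to be retrieved, so the chosen definition should be stated alongside any reported number.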
Expected Outcomes:
- Improved retrieval accuracy compared to baseline agentless models.
- More accurate and contextually relevant patch generation.
- Reduced computation cost due to graph-pruned search space.
- Iterative learning model able to refine patches across runs.
- SWE Bench
- Agentless
- SWE Agent
- AutoCodeRover
- SWE Search
- OpenHands
- RepoHyper
- RepoGraph
- StackRAG
- DeepSeek R1
- Evolving Deeper LLM Thinking
- Achsuthan T. – E/19/007 – e19007@eng.pdn.ac.lk
- Eshan Jayasundara – E/19/163 – e19163@eng.pdn.ac.lk
- Lahiru Menikdiwela – E/19/236 – e19236@eng.pdn.ac.lk
- Supervisors:
  - Prof. Roshan G. Ragel
  - Dr. Damayanthi Herath
This repository is for academic and non-commercial research use only. Licensing options to be determined based on publication and university policy.