Solve Issues in Large Code Repositories

A Novel Approach to SWE-bench Optimization

Introduction

This repository contains the implementation and research artifacts for our final year project, "Solve Issues in Large Code Repositories".

The project addresses limitations in current automated debugging and patch generation methods by introducing a hybrid approach that combines:

Iterative reasoning, to mimic real-world developer behavior.
Graph-based retrieval, to reduce the search space and improve precision.
Retrieval-Augmented Generation (RAG) over Stack Overflow content, to supply additional context.
Multi-LLM patch generation and refinement, to raise SWE-bench performance at a lower computational cost.

Objectives

General Objective

Improve the efficiency and accuracy of automated software engineering systems, as evaluated on the SWE-bench benchmark.

Specific Objectives

Develop an iterative reasoning system for issue resolution.
Create a graph-based representation of code repositories for accurate file retrieval.
Integrate Stack Overflow knowledge using RAG to improve contextual understanding.
Combine multiple LLMs (e.g., Claude, GPT-4, DeepSeek R1) for diverse patch generation.
Learn from incorrect patches using iterative refinement and reasoning models.
Achieve retrieval accuracy above 82% on SWE-bench tasks.
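
As a rough illustration of the multi-LLM and iterative-refinement objectives, the sketch below runs several patch generators in turn and feeds failure messages back into later attempts. It is not the project's implementation: `refine_patch`, the stub models, and the validator are hypothetical names introduced here, and real LLM calls and test runs are replaced by plain Python functions.

```python
from typing import Callable, List, Optional, Tuple

# A "model" is any callable taking (issue, feedback) and returning a patch string.
Model = Callable[[str, List[str]], str]
# A validator returns (passed, message), for example from running unit tests.
Validator = Callable[[str], Tuple[bool, str]]


def refine_patch(issue: str, models: List[Model], validate: Validator,
                 max_rounds: int = 3) -> Optional[str]:
    """Ask each model for a patch per round, feeding failure messages back in."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        for model in models:
            patch = model(issue, feedback)
            passed, message = validate(patch)
            if passed:
                return patch
            # Keep the failure so later attempts can take it into account.
            feedback.append(message)
    return None


# Stub models and validator, purely illustrative (assumptions, not real LLMs).
def model_a(issue: str, feedback: List[str]) -> str:
    return "return a - b"


def model_b(issue: str, feedback: List[str]) -> str:
    # Changes its answer once any failure message is available.
    return "return a + b" if feedback else "return a * b"


def validate(patch: str) -> Tuple[bool, str]:
    ok = patch == "return a + b"
    return ok, ("" if ok else f"tests failed for: {patch}")


result = refine_patch("add() returns the wrong sum", [model_a, model_b], validate)
print(result)
```

Passing the accumulated failure messages to every model is one simple way to share what earlier incorrect patches revealed. A real system would likely summarize or filter that feedback to control token cost.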

Methodology

Graph-Based Repository Modeling: Built using NetworkX and visualized with Gephi, representing inter-file relationships such as imports and function calls.
Retrieval-Augmented Generation (RAG): Enhanced contextual understanding using Stack Overflow data stored in ChromaDB, queried semantically via Sentence-BERT, and processed with LlamaIndex.
Iterative Reasoning & Multi-LLM Patch Generation: Employing reasoning models like DeepSeek R1 and multiple LLMs to generate, compare, and refine patches.
Artificial Stack Trace Generation: For difficult cases where standard retrieval fails, simulate execution paths using graph traversal to identify probable buggy files.
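
The graph modeling and traversal steps above can be pictured with a small sketch. To stay dependency-free it uses a plain dictionary of sets instead of NetworkX (a `networkx.DiGraph` would be the natural substitute), and it models imports only, not function calls. The function names, the `repo` dictionary, and the idea of ranking files by breadth-first distance from a starting file are assumptions made for illustration, not the project's actual algorithm.

```python
import ast
from collections import deque


def build_import_graph(sources):
    """Map each module name to the set of in-repo modules it imports.

    `sources` is a hypothetical {module_name: source_code} dict standing in
    for files read from a repository.
    """
    graph = {name: set() for name in sources}
    for name, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets = [node.module]
            else:
                continue
            # Keep only edges that point at files inside the repository.
            graph[name].update(t for t in targets if t in sources)
    return graph


def rank_candidates(graph, start, max_depth=2):
    """Breadth-first walk from `start`; returns (module, distance) pairs, nearest first."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == max_depth:
            continue  # Do not expand beyond the depth limit.
        for neighbour in sorted(graph.get(current, ())):
            if neighbour not in seen:
                seen[neighbour] = seen[current] + 1
                queue.append(neighbour)
    return sorted(seen.items(), key=lambda item: (item[1], item[0]))


# Toy repository: module name -> source text (illustrative only).
repo = {
    "app": "import services\nimport utils\n",
    "services": "from db import connect\n",
    "db": "import os\n",
    "utils": "",
}
graph = build_import_graph(repo)
trace = rank_candidates(graph, "app")
print(trace)
```

The ordered list plays the role of a simulated call path: files closer to the starting point are inspected first, and `max_depth` bounds how much of the repository is considered.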

Technologies Used

Technology	Purpose
Python	Primary development language
NetworkX	Graph construction
Gephi	Graph visualization
ChromaDB	Vector database for semantic retrieval
OpenAI embeddings	Embedding generation
LangChain	RAG integration and vector search
BeautifulSoup	Web scraping (Stack Overflow)
StackAPI	API access to Stack Overflow data
GPT-4, Claude	LLMs for retrieval and patch generation
DeepSeek R1	Reasoning and decision-making
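
To make the vector-search role in the table concrete, here is a toy semantic retrieval sketch. It deliberately swaps in word-count vectors and cosine similarity for the learned embeddings and the ChromaDB store the project lists, so it shows only the shape of the lookup. The `embed`, `cosine`, and `top_k` helpers and the sample posts are invented for this example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, documents, k=2):
    """Return up to k documents with a non-zero similarity to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


# Invented snippets standing in for stored Stack Overflow posts.
posts = [
    "KeyError when accessing a missing dictionary key in Python",
    "How to center a div with CSS flexbox",
    "Use dict.get to avoid KeyError on a missing key",
]
hits = top_k("KeyError missing key", posts)
print(hits)
```

With real embeddings the ranking would reflect meaning and not just shared words, but the retrieve-then-rank flow stays the same.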

Experiment Setup

Graph-based repository model construction.
SWE-bench dataset preprocessing.
Retrieval techniques benchmarked:
- LLM-based
- Embedding-based
- LLM + RAG
Artificial stack traces generated when direct retrieval fails.
Stack Overflow context integration using vector search.
Evaluation metrics:
- Retrieval Accuracy (target: >82%)
- Patch Validity (unit test pass/fail)
- Cost-efficiency (LLM token usage and execution time)
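
The three evaluation metrics could be computed along the following lines. The exact definitions are not given in this README, so the ones below are assumptions: retrieval accuracy is taken as the share of tasks whose top-k retrieved files include at least one ground-truth file, patch validity as the share of tasks whose patch passes its tests, and cost as tokens spent per resolved task. Task IDs and numbers are made up.

```python
def retrieval_accuracy(predictions, gold, k=3):
    """Share of tasks whose top-k retrieved files contain at least one gold file."""
    if not gold:
        return 0.0
    hits = sum(
        1 for task_id, gold_files in gold.items()
        if set(predictions.get(task_id, [])[:k]) & set(gold_files)
    )
    return hits / len(gold)


def patch_validity(test_results):
    """Share of tasks whose patch passed its unit tests."""
    if not test_results:
        return 0.0
    return sum(1 for passed in test_results.values() if passed) / len(test_results)


def tokens_per_resolved(token_usage, test_results):
    """Total tokens spent divided by the number of resolved tasks."""
    resolved = sum(1 for passed in test_results.values() if passed)
    total = sum(token_usage.values())
    return total / resolved if resolved else float("inf")


# Made-up data for four hypothetical tasks.
predictions = {
    "task-1": ["a.py", "b.py"],
    "task-2": ["c.py"],
    "task-3": ["x.py", "y.py"],
    "task-4": [],
}
gold = {"task-1": ["b.py"], "task-2": ["d.py"], "task-3": ["x.py"], "task-4": ["z.py"]}
results = {"task-1": True, "task-2": False, "task-3": True, "task-4": False}
tokens = {"task-1": 1200, "task-2": 800, "task-3": 1000, "task-4": 1000}

accuracy = retrieval_accuracy(predictions, gold)
validity = patch_validity(results)
cost = tokens_per_resolved(tokens, results)
meets_target = accuracy > 0.82  # Compare against the stated retrieval target.
print(accuracy, validity, cost, meets_target)
```

Execution time could be tracked the same way as token usage by recording a per-task duration and dividing by the number of resolved tasks.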

Results and Analysis (To be updated after implementation)

Expected Outcomes:

Improved retrieval accuracy compared to baseline agentless models.
More accurate and contextually relevant patch generation.
Reduced computation cost due to graph-pruned search space.
An iterative learning model that refines patches across runs.

References

Contributors

Achsuthan T. – E/19/007 – e19007@eng.pdn.ac.lk
Eshan Jayasundara – E/19/163 – e19163@eng.pdn.ac.lk
Lahiru Menikdiwela – E/19/236 – e19236@eng.pdn.ac.lk
Supervisors:
- Prof. Roshan G. Ragel
- Dr. Damayanthi Herath

License

This repository is for academic and non-commercial research use only. Licensing options to be determined based on publication and university policy.

cepdnaclk/e19-4yp-Solve-Issues-In-Large-Code-Repositories
