Several approaches to fault localization use Information Retrieval (IR) methods to find the source files most likely to contain a bug. One well-known technique is BugLocator, which ranks candidate source files using both direct and indirect links between a bug report and the files. Because it is well known, BugLocator is a relevant IR benchmark for comparison against Latent Semantic Indexing (LSI). We analyzed the performance of these methods by comparing their evaluation metrics. The BugLocator approach was split into two methods (methods 1 and 2), so that the simplified version provides a baseline for both the full BugLocator implementation (method 2) and LSI (method 3).
All methods were trained and tested on the bug reports and source files of open source Java projects. Python was used to pre-process the data and to create, train, and test the models.
Overall there are three methods that were implemented and evaluated:
- Method 1: Simplified BugLocator
- Method 2: Full BugLocator
- Method 3: Latent Semantic Indexing (LSI) with Singular Value Decomposition (SVD)
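As a rough illustration of the idea behind method 3, the sketch below ranks documents against a query in a reduced latent space obtained from a truncated SVD. It is not the notebook's implementation: the function name `lsi_rank`, the toy term counts, and the choice of `k` are all assumptions made for this example.

```python
import numpy as np


def lsi_rank(doc_term, query_vec, k=2):
    """Rank documents against a query in a k-dimensional latent space.

    Parameters
    ----------
    doc_term : numpy.ndarray
        Matrix of shape (n_docs, n_terms) holding term weights.
    query_vec : numpy.ndarray
        Vector of shape (n_terms,) holding the query's term weights.
    k : int
        Number of latent dimensions to keep (hypothetical default).

    Returns
    -------
    ranking : numpy.ndarray
        Document indices ordered from most to least similar.
    sims : numpy.ndarray
        Cosine similarity of each document to the query.
    """
    # Thin SVD: doc_term = U @ diag(s) @ Vt
    u, s, vt = np.linalg.svd(doc_term, full_matrices=False)
    k = min(k, len(s))
    docs_k = u[:, :k] * s[:k]        # document coordinates in the latent space
    query_k = query_vec @ vt[:k].T   # fold the query into the same space
    # Cosine similarity, guarding against zero-length vectors
    denom = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(query_k)
    sims = np.divide(docs_k @ query_k, denom,
                     out=np.zeros_like(denom), where=denom > 0)
    return np.argsort(-sims), sims


# Toy data (illustrative only). Columns: "null", "pointer", "parser", "token"
doc_term = np.array([
    [2.0, 1.0, 0.0, 0.0],   # file 0
    [0.0, 0.0, 2.0, 1.0],   # file 1
    [1.0, 1.0, 0.0, 0.0],   # file 2
])
query = np.array([1.0, 1.0, 0.0, 0.0])  # bug report mentioning "null pointer"

ranking, sims = lsi_rank(doc_term, query, k=2)
print(ranking, sims)
```

In this toy case, file 1 shares no terms with the query and ends up last in the ranking. In practice the matrix would hold weighted term frequencies (for example TF-IDF) rather than raw counts.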
The pre-processing code up to the Markdown heading "More Pre-processing (Team 7)" in the Jupyter notebook was provided by a course instructor.
Overall, method 2 performed best according to its Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP) scores. Visualizations of these results are shown in the screenshots section of this README.
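For readers unfamiliar with these metrics, here is a minimal sketch of how MRR and MAP can be computed from ranked file lists. The function names and the sample file names are hypothetical, and the notebook may compute the metrics differently; in particular, this version divides average precision by the total number of relevant files, which is one common convention.

```python
def reciprocal_rank(ranked, relevant):
    """Return 1 / rank of the first relevant item, or 0.0 if none is found."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def average_precision(ranked, relevant):
    """Average the precision at each position where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant) if relevant else 0.0


def mean_metric(metric, rankings, relevants):
    """Average a per-query metric over all queries (bug reports)."""
    scores = [metric(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores) if scores else 0.0


# Two illustrative bug reports with their ranked files and true fix locations
rankings = [
    ["A.java", "B.java", "C.java"],
    ["B.java", "C.java", "A.java"],
]
relevants = [
    {"B.java"},
    {"A.java", "B.java"},
]

mrr = mean_metric(reciprocal_rank, rankings, relevants)
map_score = mean_metric(average_precision, rankings, relevants)
print(f"MRR={mrr:.3f} MAP={map_score:.3f}")  # MRR=0.750 MAP=0.667
```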
- Pre-processes bug reports (queries) and source files (query results) to train machine learning algorithms.
- Ranks source files (query results) related to a bug report (query) to find the location of bugs related to the bug report.
- NumPy-style documentation for maintainability and clarity.
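The pre-processing feature above can be pictured with a small sketch like the one below, written with a NumPy-style docstring. It is only an assumption about what such a step might look like: the `preprocess` function, the tiny `STOP_WORDS` set, and the camelCase splitting rule are hypothetical and not taken from the notebook.

```python
import re

# Tiny illustrative stop-word set; a real pipeline would use a fuller list
STOP_WORDS = {"the", "a", "an", "is", "in", "of", "to", "and", "when"}


def preprocess(text):
    """Tokenize text into lowercase terms, splitting camelCase identifiers.

    Parameters
    ----------
    text : str
        Raw bug report or source file content.

    Returns
    -------
    list of str
        Lowercase alphabetic tokens with stop words removed.
    """
    # Insert a space at lower-to-upper boundaries: "parseToken" -> "parse Token"
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    # Keep alphabetic runs only, dropping punctuation and digits
    tokens = re.findall(r"[A-Za-z]+", spaced)
    return [t.lower() for t in tokens if t.lower() not in STOP_WORDS]


tokens = preprocess("NullPointerException in parseToken when the input is empty")
print(tokens)
# ['null', 'pointer', 'exception', 'parse', 'token', 'input', 'empty']
```

Splitting identifiers this way lets a bug report that mentions "token" match a source file that defines `parseToken`, which is why this kind of step is common in IR-based bug localization.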
To prepare a dataset for the application to process, follow the "Getting Started" instructions here.
You must use Python 3 to run our notebook once the data has been processed as described in the "Getting Started" section.
To run the application, first install JupyterLab, then open a new console and enter:
jupyter lab
This opens a JupyterLab tab in your default browser, where you can run the application.
Nicolas Mora | Connor Britton | Philip Rea | Joseph Park |