This repository contains the source code of our technique, and of the related experiments, for identifying similar test cases written in natural language. The technique first clusters test steps that are semantically similar and then uses those clusters to identify similar test cases.
To cluster similar test steps, we performed several experiments with the following text embedding techniques, text similarity metrics, and clustering algorithms:
Text embedding techniques
Text similarity metrics
- Word Mover’s Distance (WMD)
- Cosine score
Clustering algorithms
- Hierarchical Agglomerative Clustering
- K-means
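To make the clustering idea concrete, the following is a minimal, dependency-free sketch, not the code from the notebooks. It assumes a toy bag-of-words embedding (the `embed` function is hypothetical and stands in for the real text embedding techniques), uses the cosine score as the similarity metric, and merges clusters with average-linkage agglomerative clustering until the best remaining pair falls below an assumed `threshold` of 0.5.

```python
import math
from collections import Counter
from itertools import combinations


def embed(step):
    """Toy embedding (an assumption for illustration): bag-of-words counts."""
    return Counter(step.lower().split())


def cosine(a, b):
    """Cosine score between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerative(steps, threshold=0.5):
    """Average-linkage agglomerative clustering over cosine scores.

    Starts with one cluster per step and repeatedly merges the most similar
    pair of clusters; stops when that pair scores below `threshold`.
    Returns clusters as lists of indices into `steps`.
    """
    vectors = [embed(s) for s in steps]
    clusters = [[i] for i in range(len(steps))]

    def linkage(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        pairs = combinations(range(len(clusters)), 2)
        x, y = max(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[x], clusters[y]) < threshold:
            break
        merged = clusters.pop(y)  # y > x, so index x is unaffected
        clusters[x].extend(merged)
    return clusters


steps = [
    "Click the login button",
    "Press the login button",
    "Open the settings page",
    "Navigate to the settings page",
]
print(agglomerative(steps))
```

In practice, scikit-learn (listed in the dependencies below) provides `AgglomerativeClustering` and `KMeans`, which would replace the hand-written loop above.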
To find similar test cases, we used the identified clusters of similar test steps to build and evaluate four different techniques.
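As an illustration of how step clusters can feed into test case similarity, the sketch below represents each test case by the cluster IDs of its steps and compares test cases with Jaccard overlap. This is an assumption made for illustration only and is not necessarily one of the four techniques evaluated in this repository; the test case names, cluster IDs, and the 0.5 threshold are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two collections of cluster IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(cases, threshold=0.5):
    """Return (name1, name2, score) for every pair scoring at least `threshold`."""
    result = []
    for (n1, c1), (n2, c2) in combinations(sorted(cases.items()), 2):
        score = jaccard(c1, c2)
        if score >= threshold:
            result.append((n1, n2, score))
    return result


# Hypothetical data: each test case maps to the cluster IDs of its steps.
test_cases = {
    "TC1": [0, 1, 2],
    "TC2": [0, 1, 3],
    "TC3": [4, 5],
}
print(similar_pairs(test_cases))
```

Here TC1 and TC2 share two of their four distinct step clusters, so they are reported as similar, while TC3 shares none with either.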
The following directories contain the source code of all the approaches that were part of our experiments:
- test-step-clustering: contains the notebooks with the source code for our test step clustering experiments.
- test-case-similarity: contains the notebooks with the source code for our test case similarity experiments.
- evaluations: contains the notebooks with the source code to evaluate all the approaches for test step clustering and the techniques for test case similarity.
The following dependencies are required to run the notebooks on your local machine:
- Python 3.7
- pip install numpy
- pip install pandas
- pip install matplotlib
- pip install scikit-learn
- pip install gensim
- pip install nltk
- pip install torch
- pip install transformers
- pip install sentence-transformers
- pip install tensorflow
- pip install tensorflow-hub
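Before opening the notebooks, it can help to confirm that the packages above are importable. The following is a small optional sketch, not part of the repository; the helper name `missing_modules` is hypothetical. Note that some import names differ from the pip package names (for example, `scikit-learn` is imported as `sklearn`).

```python
import importlib.util

# Import names corresponding to the pip packages listed above.
REQUIRED_MODULES = [
    "numpy",
    "pandas",
    "matplotlib",
    "sklearn",                # installed as scikit-learn
    "gensim",
    "nltk",
    "torch",
    "transformers",
    "sentence_transformers",  # installed as sentence-transformers
    "tensorflow",
    "tensorflow_hub",         # installed as tensorflow-hub
]


def missing_modules(names):
    """Return the top-level module names that cannot be found, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```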