This repository contains the source code and methodology developed for the thesis: "A Software Framework to Investigate End-to-End Web Testing Adoption Dynamics".
The project implements a systematic workflow to investigate the motivations behind adoption and migration of web GUI testing frameworks (Selenium, Cypress, Puppeteer, Playwright) in open-source projects.
This software was designed to enrich the E2EGit dataset with qualitative insights into why projects adopt or migrate between web testing frameworks. Starting from a set of manually identified adoption and migration events (provided as input Excel files), the framework automates the retrieval of historical context to reconstruct the decision-making process.
The methodology, described in the thesis, follows a mixed-methods approach:
- Automated Context Extraction: Retrieving specific commit sequences and issue discussions surrounding the known transition dates.
- Noise Reduction: Applying a specialized keyword-based taxonomy to filter relevant content.
- Manual Classification: Classifying the motivations through a custom "Human-in-the-loop" GUI.
- Taxonomic Analysis: Categorizing the rationale (e.g., Developer Experience & Usability, Performance & Efficiency) to characterize the transition landscape.
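The first two steps above (context extraction around a known transition date, then keyword-based noise reduction) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the `Candidate` type, the 30-day window, and the keywords in `TAXONOMY` are all assumptions made for the example. Only the two category names come from this README.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical keyword taxonomy. The category names appear in this README;
# the keywords themselves are illustrative placeholders.
TAXONOMY = {
    "Developer Experience & Usability": ["easier", "flaky", "debug"],
    "Performance & Efficiency": ["faster", "slow", "speed"],
}


@dataclass
class Candidate:
    ref: str    # commit hash or issue number
    when: date  # commit or issue date
    text: str   # commit message or issue body


def in_window(candidate, transition, days=30):
    """True if the candidate lies within +/- `days` of the transition date."""
    return abs((candidate.when - transition).days) <= days


def matched_categories(text):
    """Return the sorted categories whose keywords occur in the text.

    Uses plain case-insensitive substring matching for simplicity.
    """
    lowered = text.lower()
    return sorted(
        category
        for category, keywords in TAXONOMY.items()
        if any(keyword in lowered for keyword in keywords)
    )


def filter_candidates(candidates, transition, days=30):
    """Keep candidates near the transition that match at least one category."""
    kept = []
    for candidate in candidates:
        if not in_window(candidate, transition, days):
            continue
        categories = matched_categories(candidate.text)
        if categories:
            kept.append((candidate.ref, categories))
    return kept
```

Candidates that survive this filter would then be passed to the manual classification step. The keyword match only narrows the review set; it does not assign the final label.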
The codebase is organized into modular components reflecting the research phases:
| Module | Description |
|---|---|
| 01_data_mining | Retrieval & Filtering: Processes the input Excel files to fetch and filter commits/issues from GitHub/Git. |
| 02_manual_labeling | Validation Interface: A Flask Web GUI for the manual review of the filtered candidates. |
| 03_db_integration | Data Enrichment: Integrates the classified qualitative data back into the E2EGit dataset schema. |
| 04_statistical_analysis | Characterization: Generates the empirical statistics and visualizations presented in the results. |
| core | Shared Logic: Database models, configuration, and utility functions. |
| resources | Data Storage: Contains the Input Excel Files (events), intermediate JSONs, and the SQLite database. |
IMPORTANT: This project is structured as a Python package.
All commands must be executed from the ROOT directory using the -m flag.
- ✅ Correct: python -m module_name.cli ...
- ❌ Incorrect: cd module_name && python cli.py
For more details, refer to each module's help command:
python -m module_name.cli --help

python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

Create a .env file in resources/ (use resources/.env.example as a template) to configure the DB path and GitHub Token.
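Before running the pipeline, it can help to check that the .env file is complete. The sketch below uses only the standard library and is not how the project itself loads its configuration, which may rely on a dedicated library. The key names `DB_PATH` and `GITHUB_TOKEN` are assumptions; the actual names are those listed in resources/.env.example.

```python
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def load_env(path="resources/.env"):
    """Read and parse the .env file (path relative to the repository root)."""
    return parse_env(Path(path).read_text(encoding="utf-8"))


def missing_keys(settings, required=("DB_PATH", "GITHUB_TOKEN")):
    """Return the required keys that are absent or empty (key names are assumed)."""
    return [key for key in required if not settings.get(key)]
```

Calling `missing_keys(load_env())` from the repository root returns an empty list when every assumed key has a value.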
To replicate the study, ensure the input files (creation-adoption-gui.xlsx, migration_analysis.xlsx) are present in the resources/ folder.
The system reads the input Excel files containing the transition events.
python -m 01_data_mining.cli --task retrieve --target commit --type adoption
python -m 01_data_mining.cli --task retrieve --target commit --type migration

python -m 01_data_mining.cli --task retrieve --target issue

If you need to download issues only for a specific repository, add the --repo owner/repo argument:
python -m 01_data_mining.cli --task retrieve --target issue --repo owner/repo

python -m 01_data_mining.cli --task filter --target commit --type adoption
python -m 01_data_mining.cli --task filter --target commit --type migration
python -m 01_data_mining.cli --task filter --target issue --type adoption
python -m 01_data_mining.cli --task filter --target issue --type migration

python -m 02_manual_labeling.cli --task convert --target commit --type adoption
python -m 02_manual_labeling.cli --task convert --target commit --type migration
python -m 02_manual_labeling.cli --task convert --target issue --type adoption
python -m 02_manual_labeling.cli --task convert --target issue --type migration

Load the manual labeling interface to classify the filtered candidates.
python -m 02_manual_labeling.cli --task server --target commit --type adoption
python -m 02_manual_labeling.cli --task server --target commit --type migration
python -m 02_manual_labeling.cli --task server --target issue --type adoption
python -m 02_manual_labeling.cli --task server --target issue --type migration

After manual review, define the taxonomic labels in the Excel files and create a new Excel file that links each transition event to its motivations and the associated issue number or commit hash. The files used for the thesis are available in the resources folder.
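The non-interactive retrieve, filter, and convert commands listed earlier follow a regular pattern, so they can be generated rather than typed one by one. The helper below is a hypothetical convenience script and is not part of the repository. It only reuses the module names and flags shown in this README, and it must be run from the repository root so that the -m invocations resolve.

```python
import itertools
import subprocess
import sys

EVENT_TYPES = ("adoption", "migration")
TARGETS = ("commit", "issue")


def build_commands(python=sys.executable):
    """Build the argument lists for the retrieve, filter, and convert steps."""
    commands = []
    # Commit retrieval is run once per event type.
    for event_type in EVENT_TYPES:
        commands.append([python, "-m", "01_data_mining.cli", "--task", "retrieve",
                         "--target", "commit", "--type", event_type])
    # Issue retrieval is shown without --type in this README.
    commands.append([python, "-m", "01_data_mining.cli", "--task", "retrieve",
                     "--target", "issue"])
    # Filtering and conversion cover every target/type combination.
    steps = (("01_data_mining.cli", "filter"), ("02_manual_labeling.cli", "convert"))
    for module, task in steps:
        for target, event_type in itertools.product(TARGETS, EVENT_TYPES):
            commands.append([python, "-m", module, "--task", task,
                             "--target", target, "--type", event_type])
    return commands


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all(build_commands())
```

The labeling server is left out on purpose, since it is interactive and is started separately for each target and event type.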
Integrate the classified data back into the E2EGit dataset SQLite database.
python -m 03_db_integration.cli

Generate the statistics and visualizations for the final analysis.
python -m 04_statistical_analysis.cli

If you use this tool, please cite:
@mastersthesis{dimartino2025software,
author = {Giuseppe Di Martino},
title = {A Software Framework to Investigate End-to-End Web Testing Adoption Dynamics},
school = {University of Naples Federico II},
year = {2025},
type = {Bachelor's Thesis},
note = {B.Sc. Thesis in Computer Science}
}