The AndroTest24 Study is the first comprehensive statistical study of existing Android GUI testing metrics. It involves extensive experiments (3-hour, 10-repetition runs on 42 diverse apps) with 8 representative state-of-the-art testing approaches that cover diverse categories and typical testing methodologies. It examines the statistical significance, correlation, and variation of the testing metrics when applying them for comparative evaluation.
For more details about our study, please refer to our ASE 2024 paper "Navigating Mobile Testing Evaluation: A Comprehensive Statistical Analysis of Android GUI Testing Metrics".
This repository provides the corresponding artifacts, including:
① The AndroTest24 App Benchmark, consisting of 42 active open-source apps obtained by integrating more than ten previous open-source benchmarks.
② The Study Data, organized by our Research Questions.
③ The SATE (Statistical Android Testing Evaluation) Framework, which promotes effective statistical evaluation of mobile testing.
A zip file containing all 42 APK files of the AndroTest24 App Benchmark can be obtained from Google Drive.
The 42 apps and their sources are listed below:
(Note: Some app links may break over time, e.g., when a project has been discontinued.)
The original statistics tables of our study are provided under `/Study_Data`. They are organized according to our Research Questions and have been renamed for better readability.
Monkey
- Tool: https://developer.android.com/studio/test/other-testing-tools/monkey
- Parameters:
--ignore-crashes --ignore-timeouts --ignore-security-exceptions -v --throttle 200
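For reference, below is a minimal sketch (ours, not part of the artifact) of how Monkey might be launched with these parameters via `adb`; the package name and event count are hypothetical placeholders, not values from the study.

```python
# Minimal sketch (assumption, not from the artifact): launching Monkey
# with the parameters listed above.
import subprocess

package = "com.example.app"  # hypothetical app under test
event_count = "1000000"      # hypothetical; large enough to fill a long run

subprocess.run(
    ["adb", "shell", "monkey", "-p", package,
     "--ignore-crashes", "--ignore-timeouts", "--ignore-security-exceptions",
     "-v", "--throttle", "200", event_count],
    check=True,
)
```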
Stoat
- Paper: [ESEC/FSE’17] Guided, stochastic model-based GUI testing of Android apps
- Tool: https://github.com/tingsu/Stoat
APE
- Paper: [ICSE’19] Practical GUI Testing of Android Applications via Model Abstraction and Refinement
- Tool: https://github.com/tianxiaogu/ape
ComboDroid
- Paper: [ICSE’20] ComboDroid: Generating High-Quality Test Inputs for Android Apps via Use Case Combinations
- Tool: https://github.com/skull591/ComboDroid-Artifact
(4.1) Supervised-Learning-Based
Humanoid
- Paper: [ASE’19] Humanoid: A Deep Learning-based Approach to Automated Black-box Android App Testing
- Tool: https://github.com/yzygitzh/Humanoid
(4.2) Tabular-RL-Based
Q-testing
- Paper: [ISSTA’20] Reinforcement Learning Based Curiosity-Driven Testing of Android Applications
- Tool: https://github.com/anlalalu/Q-testing
(4.3) Deep-RL-Based
ARES
- Paper: [TOSEM’22] Deep Reinforcement Learning for Black-box Testing of Android Apps
- Tool: https://github.com/H2SO4T/ARES
DQT
- Paper: [ICSE’24] Deeply Reinforcing Android GUI Testing with Deep Reinforcement Learning
- Tool: https://github.com/Yuanhong-Lan/DQT
- Hardware: Google Pixel 2
- Resolution: 1080*1920
- Android Version: Android 9.0 (API Level 28)
- Google Service: Google APIs
- RAM: 4GB
- VM Heap: 2GB
- Internal Storage: 8GB
- SD Card: 1GB
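As a convenience, the sketch below shows one way to create a matching AVD with the standard Android SDK command-line tools; the AVD name is hypothetical, and the `config.ini` keys mentioned in the comments are assumptions rather than values shipped with the artifact.

```python
# Minimal sketch (assumption, not from the artifact): creating an emulator
# matching the configuration above, assuming the Android SDK command-line
# tools are installed and on PATH.
import subprocess

subprocess.run(
    ["avdmanager", "create", "avd",
     "-n", "AndroTest24_Pixel2",                        # hypothetical AVD name
     "-k", "system-images;android-28;google_apis;x86",  # Android 9.0 (API 28), Google APIs
     "-d", "pixel_2"],                                  # Pixel 2 profile, 1080x1920
    check=True,
)

# RAM, VM heap, internal storage, and SD card size are then adjusted in the
# AVD's config.ini, e.g. hw.ramSize=4096, vm.heapSize=2048,
# disk.dataPartition.size=8G, sdcard.size=1G (key names assumed).
```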
The Statistical Android Testing Evaluation (SATE) Framework is proposed to empower and enhance future mobile testing evaluations. It is unique in offering out-of-the-box rigorous statistical analysis for testing metrics, integrated with efficient data management for large-scale, multi-type testing data.
The SATE framework is provided under `/SATE`.
SATE
├── android_testing_utils/log    Logging helper.
├── constant                     Constants loaded from config.yaml.
├── evaluation                   The main part of SATE.
│   ├── data_manager             Data management and scheduling.
│   └── result_analyzer          Main analysis.
│       ├── analysis             Statistical analysis methods.
│       ├── excel                Data files.
│       ├── study_analyzer       Analyzers tailored for our study.
│       └── utils                Practical utilities.
└── runtime_collection           Dependent data structures and test configs.
- Python: Tested on Python 3.7. It is recommended to build the Python project and environment under `/SATE/` to avoid import problems.
- Requirements: `pip install -r /SATE/requirements.txt`
- Sample raw data is provided under `/SATE/evaluation/result_analyzer/excel/`.
- The uncommented code in the main blocks of our study analyzers under `/SATE/evaluation/result_analyzer/study_analyzer/` can be run directly.
- Since there are some dependencies between the data, it is recommended to run the analyzers in the following order (see the sketch after this list):
- granularities_analyzer.py
- metrics_relation_analyzer.py
- randomness_analyzer.py
- convergence_analyzer.py
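For convenience, here is a minimal sketch (ours, not part of the repository) that runs the four study analyzers in the dependency order listed above, assuming it is executed from the `/SATE/` project root inside the prepared Python environment.

```python
# Minimal sketch (not from the artifact): run the four study analyzers in
# the dependency order listed above, from the /SATE/ project root.
import subprocess

ANALYZER_DIR = "evaluation/result_analyzer/study_analyzer"
SCRIPTS = [
    "granularities_analyzer.py",
    "metrics_relation_analyzer.py",
    "randomness_analyzer.py",
    "convergence_analyzer.py",
]

for script in SCRIPTS:
    # check=True aborts the sequence if any analyzer fails.
    subprocess.run(["python", f"{ANALYZER_DIR}/{script}"], check=True)
```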
- Note: A test shell script (`/SATE/test.sh`), which relies on the `python` command, is also provided for a quick run of the above process. Please run it inside the prepared Python environment.