This repository is a companion page for the following publication:
Kevin Maggi, Roberto Verdecchia, Leonardo Scommegna, and Enrico Vicario. 2024. CLAIM: a Lightweight Approach to Identify Microservices in Dockerized Environments. In 28th International Conference on Evaluation and Assessment in Software Engineering (EASE '24), June 18–21, 2024, Salerno, Italy.
It contains all the material required to replicate the study: the implementation of the CLAIM tool, the scripts for defining the datasets, the scripts for running the experiment, the raw data, and the final results.
The scientific article describing design, execution, and main results of this study is available here.
If this study helps your research, please consider citing it as follows. Thanks!
@inproceedings{maggi2024claim,
author = {Maggi, Kevin and Verdecchia, Roberto and Scommegna, Leonardo and Vicario, Enrico},
title = {{CLAIM: a Lightweight Approach to Identify Microservices in Dockerized Environments}},
booktitle = {Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering},
year = {2024},
publisher = {Association for Computing Machinery},
isbn = {9798400717017},
doi = {10.1145/3661167.3661206},
url = {https://dl.acm.org/doi/10.1145/3661167.3661206},
pages = {357-362},
numpages = {6},
location = {Salerno, Italy},
series = {EASE '24},
keywords = {Docker, Microservices, Repository Mining, Static Analysis}
}
A brief overview of CLAIM.
The main method is claim(repository name, repository base directory), which returns the set of identified microservices by calling:
- choose_dc(repository base directory), which chooses the right compose file;
- dc_collect_services(compose file, repository base directory), which extracts the list of Docker services;
- process_services(list of Docker services, repository base directory), which processes the selected compose file to extract the images, builds, and containers;
- determine_microservices(repository user, repository name, repository base directory, candidate microservices), which determines the microservices.
These are the configuration parameters used by CLAIM:
- filenames for a compose file to be selected:
  *docker-compose*.yml, *docker-compose*.yaml, *compose*.yml, *compose*.yaml
- folders considered "neutral" for a compose file to be accepted:
  docker*, *compose, swarm, src, services, dev*, test*, staging, deploy*, integration, release, prod*, iac, saas, devops, setup*, script*, complete, etc
- folder weights for compose file sorting (lexicographically with respect to path):
  same order as above, decreasing weight
- affixes considered "neutral" for a compose file to be accepted:
  services, base, dev*, build*, stack, prod*, stable, deploy*, test*
- affixes considered undesired, for a compose file to be discarded:
  infra*, override
- affix weights for a compose file to be chosen:
  same order as above, decreasing weight
- filenames for a Dockerfile to be selected:
  *Dockerfile*
- extensions for resulting files to be discarded (i.e. false-positive Dockerfiles):
  .sh, .ps1, .nanowin, .txt
- folders for a Dockerfile to be discarded:
  vendor, external, example, demo
- extensions considered "configuration" when copied into the filesystem of containers:
  .sh, .xml, .txt, .yaml, .yml, .sql, .conf, .config, .cnf, .cfg, .cf, .crt, .key
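As an illustration of how some of the parameters above could be applied, here is a minimal sketch based on glob matching from the Python standard library. The pattern lists are copied from the configuration. The helper names, the way affixes are split out of a filename (on ".", "-" and "_"), and the exact-match test on folder names are assumptions made for this example and may differ from what claim.py actually does.

```python
import re
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Values copied from the configuration list above.
COMPOSE_PATTERNS = ["*docker-compose*.yml", "*docker-compose*.yaml",
                    "*compose*.yml", "*compose*.yaml"]
UNDESIRED_AFFIXES = ["infra*", "override"]
DOCKERFILE_PATTERN = "*Dockerfile*"
DOCKERFILE_BAD_EXTENSIONS = {".sh", ".ps1", ".nanowin", ".txt"}
DOCKERFILE_BAD_FOLDERS = {"vendor", "external", "example", "demo"}
CONFIG_EXTENSIONS = {".sh", ".xml", ".txt", ".yaml", ".yml", ".sql", ".conf",
                     ".config", ".cnf", ".cfg", ".cf", ".crt", ".key"}


def is_compose_candidate(path: str) -> bool:
    """True if the file name matches one of the compose filename patterns."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pat) for pat in COMPOSE_PATTERNS)


def compose_affixes(path: str) -> list[str]:
    """Hypothetical affix extraction: tokens other than 'docker'/'compose'."""
    stem = PurePosixPath(path).stem  # file name without the last extension
    tokens = re.split(r"[.\-_]", stem)
    return [t for t in tokens if t and t not in {"docker", "compose"}]


def has_undesired_affix(path: str) -> bool:
    """True if any affix matches an undesired pattern (infra*, override)."""
    return any(fnmatchcase(affix, pat)
               for affix in compose_affixes(path)
               for pat in UNDESIRED_AFFIXES)


def is_dockerfile_candidate(path: str) -> bool:
    """Name matches *Dockerfile*, extension and parent folders are acceptable."""
    p = PurePosixPath(path)
    if not fnmatchcase(p.name, DOCKERFILE_PATTERN):
        return False
    if p.suffix.lower() in DOCKERFILE_BAD_EXTENSIONS:
        return False
    # Assumption: a folder is discarded only on an exact name match.
    return not any(part in DOCKERFILE_BAD_FOLDERS for part in p.parts[:-1])


def is_config_file(path: str) -> bool:
    """True if the extension is one of the 'configuration' extensions."""
    return PurePosixPath(path).suffix.lower() in CONFIG_EXTENSIONS
```

For instance, under these assumptions deploy/docker-compose.prod.yml would be accepted as a compose candidate, docker-compose.override.yml would be flagged for its undesired affix, and vendor/lib/Dockerfile would be discarded because of its folder.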
Brief documentation on how to use the replication material.
- Python 3.10
- Clone the repo into the directory you want (we refer to it as {CLONE_DIR}):
  git clone * {CLONE_DIR}
- Install all the required Python packages:
  pip install -r src/requirements.txt
- Set the GitHub token in config.py
- Run the compose file detection with CLAIM:
  python -m src.A_dc_choice {"dataset_real"/"dataset_ground_truth"}
- Run the compose file detection with Baresi et al.:
  python -m src.A_dc_choice_Baresi {"dataset_real"/"dataset_ground_truth"}
- Run the microservices identification with CLAIM:
  python -m src.A_ms_detection {"dataset_real"/"dataset_ground_truth"}
- Run the microservices identification with Baresi et al.:
  python -m src.A_ms_detection_Baresi {"dataset_real"/"dataset_ground_truth"}
- Run the profiling:
  python -m scalene --memory --cpu --- -m src.B_profilation {repository} {"claim"/"baresi"}
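To run the four experiment steps in sequence, the command lines can be generated programmatically. The sketch below only builds and prints the commands. It assumes that the experiment scripts are launched as modules with python -m from {CLONE_DIR}, and that the dataset name is passed as a bare string. The helper name experiment_commands is hypothetical and is not part of the repository.

```python
import sys

# Module names taken from the steps above, in the order they are listed.
EXPERIMENT_MODULES = [
    "src.A_dc_choice",
    "src.A_dc_choice_Baresi",
    "src.A_ms_detection",
    "src.A_ms_detection_Baresi",
]
DATASETS = ("dataset_real", "dataset_ground_truth")


def experiment_commands(dataset: str, python: str = sys.executable) -> list[list[str]]:
    """Build one command line per experiment step for the given dataset."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}, expected one of {DATASETS}")
    # Assumption: each script is started as a module from the repository root.
    return [[python, "-m", module, dataset] for module in EXPERIMENT_MODULES]


if __name__ == "__main__":
    for cmd in experiment_commands("dataset_ground_truth"):
        print(" ".join(cmd))
```

Each printed command could then be passed to subprocess.run(cmd, check=True, cwd=CLONE_DIR) to execute it, provided the GitHub token has already been set in config.py.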
This is the root directory of the repository. The directory is structured as follows:
CLAIM_rep-pkg
.
|
|--- src/                        Source code used in the paper
|    |
|    |--- Baresi/                Scripts from Baresi et al.
|    |
|    |--- dataset_creation/      Scripts for the steps of creating the dataset with manually created ground truth
|    |
|    |--- claim.py               Implementation of CLAIM
|    |
|    |--- config.py              Configurations
|    |
|    |--- 0_*                    Datasets
|    |
|    |--- A_*                    Experiment
|    |
|    |--- B_*                    Profiling (time and memory)
|    |
|    |--- C_*                    Result plot
|
|--- data/                       Data used in the paper
|    |
|    |--- dataset/               Dataset data
|    |
|    |--- results/               Experiment results
|    |
|    |--- analysis/              Data input for plotting and plots
The source code is licensed under the MIT license, which you can find in the LICENSE file.
All graphical/text assets are licensed under the Creative Commons Attribution 4.0 (CC BY 4.0) license.