A modular suite for benchmarking all stages of Machine Learning pipelines. To find bottlenecks in such pipelines and compare different ML tools, this framework can calculate and visualize several metrics in the data preparation, model training, model validation and inference stages.
Clone the current repository with the following command:
git clone git@github.com:hpides/End-to-end-ML-System-Benchmark.git
To use the package in a Python project, add it to your requirements.txt file as a path reference to your local clone. Include the following line in your requirements file, replacing the placeholder with your local path:
-e <PATH_TO_REPOSITORY>/umlaut/
Or, through pip:
pip install -e <PATH_TO_REPOSITORY>/umlaut/
Alternatively, you can run UMLAUT on its own or UMLAUT together with Daphne in a Docker container. The container definitions are in the containers/ directory of the repository.
Only Umlaut
Run
sudo docker build -t umlaut containers/only_umlaut
to build a container and start it by running
bash containers/only_umlaut/start.sh
This container installs only this repository.
Umlaut + Daphne
Run
sudo docker build -t umlaut_cpu containers/umlaut_daphne
to build a container and start it by running
bash containers/umlaut_daphne/start.sh
This container builds the latest Daphne version from source, which might take a while. Alternatively, you can uncomment the corresponding lines in the Dockerfile to download a Daphne binary instead.
Umlaut + Daphne + CUDA
Run
sudo docker build -t umlaut_cuda containers/umlaut_daphne_cuda
to build a container and start it by running
bash containers/umlaut_daphne_cuda/start.sh
This container builds the dnn-ops branch of Daphne.
Upon installation, UMLAUT can be imported in any Python pipeline. The complete example pipeline can be found in ./pipelines/github_example/main.py. To import UMLAUT, use the following import statement in the Python script.
import umlaut
To initialize a benchmark, create an instance of the Benchmark class. It takes a required string parameter, db_file, and an optional string parameter, description. The metrics to record are collected in a dictionary whose values are later passed, together with the benchmark, to the BenchmarkSupervisor decorator.
import time
import numpy as np
from umlaut import Benchmark, BenchmarkSupervisor, MemoryMetric, CPUMetric
bm = Benchmark('sample_db_file.db', description="Database for the Github sample measurements")
bloat_metrics = {
"memory": MemoryMetric('bloat memory', interval=0.1),
"cpu": CPUMetric('bloat cpu', interval=0.1)
}
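MemoryMetric and CPUMetric take an interval argument (0.1 seconds above). A reasonable mental model for such a metric is a background sampler that polls a value every interval seconds while the benchmarked code runs. The sketch below illustrates that idea with the standard library only; the IntervalSampler class and the traced_bytes probe are hypothetical and are not UMLAUT's implementation, and the probe reports Python heap allocations from tracemalloc rather than total process memory.

```python
import threading
import time
import tracemalloc


class IntervalSampler:
    """Hypothetical sampler: calls `probe` every `interval` seconds in a background thread."""

    def __init__(self, probe, interval=0.1):
        self.probe = probe
        self.interval = interval
        self.samples = []  # list of (timestamp, value) tuples
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self.samples.append((time.time(), self.probe()))
            # Sleep up to `interval` seconds; wakes early once stop is signalled.
            self._stop.wait(self.interval)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        return False  # do not swallow exceptions from the measured code


def traced_bytes():
    # Current size of memory blocks traced by tracemalloc (Python allocations only).
    current, _peak = tracemalloc.get_traced_memory()
    return current


if __name__ == "__main__":
    tracemalloc.start()
    with IntervalSampler(traced_bytes, interval=0.1) as sampler:
        data = [list(range(10_000)) for _ in range(20)]  # allocate some memory
        time.sleep(0.5)
    tracemalloc.stop()
    print(f"{len(sampler.samples)} samples, max {max(v for _, v in sampler.samples)} bytes")
```

A smaller interval gives finer-grained measurements at the cost of more sampling overhead, which is the same trade-off the interval parameter above exposes.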
To benchmark a function, attach the BenchmarkSupervisor decorator, passing it the metrics and the Benchmark instance. After the decorated function has finished, close the benchmark with bm.close().
@BenchmarkSupervisor(bloat_metrics.values(), bm)
def bloat():
    a = []
    for i in range(1, 2):
        a.append(np.random.randn(*([10] * i)))
        time.sleep(5)
    print(a)

def main():
    bloat()
    bm.close()

if __name__ == "__main__":
    main()
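BenchmarkSupervisor wraps the decorated function so that measurements are taken while it runs and stored in the benchmark's database. The standard-library sketch below shows that general decorator pattern with a single wall-clock time measurement written to SQLite. The supervise decorator, the measurements table and its columns are hypothetical and do not reflect UMLAUT's internal API or schema.

```python
import functools
import sqlite3
import time


def supervise(conn, description):
    """Hypothetical decorator: records each call's wall-clock time in a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurements (description TEXT, seconds REAL)"
    )

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the elapsed time even if the function raised.
                elapsed = time.perf_counter() - start
                conn.execute(
                    "INSERT INTO measurements VALUES (?, ?)", (description, elapsed)
                )
                conn.commit()

        return wrapper

    return decorator


conn = sqlite3.connect(":memory:")


@supervise(conn, "bloat")
def bloat():
    time.sleep(0.2)
    return "done"


if __name__ == "__main__":
    print(bloat())
    print(conn.execute("SELECT description, seconds FROM measurements").fetchall())
```

Using functools.wraps keeps the wrapped function's name and docstring intact, which matters when a tool labels measurements by method name.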
You can run your own pipeline by providing the command to execute and the path to your Python script. Command-line flags select which kinds of measurements to collect.
python pipelines/custom_pipeline/run_script.py --cmd "your command" -folder "path/to/your/script" -g -gm -gt -gp -t -c -m
Measurements are accessed through UMLAUT's CLI tool. It can be invoked from a bash terminal with the following command.
umlaut-cli <db_file>
To browse the measurements stored in the sample_db_file.db database, pass its file name to the command:
cd pipelines/github_example
umlaut-cli sample_db_file.db
For detailed descriptions of all available arguments and flags, call umlaut-cli with --help:
umlaut-cli --help
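umlaut-cli is the intended way to read measurements. If you only want to check what a database file contains, and assuming the .db file is an SQLite database (the extension suggests it, but this README does not state it), Python's built-in sqlite3 module can list its tables without relying on any particular UMLAUT schema. The list_tables helper below is hypothetical.

```python
import sqlite3
import sys
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in an SQLite database file, sorted by name."""
    # Read-only URI mode avoids accidentally creating an empty database file.
    uri = f"{Path(db_path).resolve().as_uri()}?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python list_tables.py sample_db_file.db
    for table in list_tables(sys.argv[1]):
        print(table)
```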
UMLAUT collects measurements of the following metrics:
- Time spent
- Memory usage
- GPU Memory usage
- GPU utilization
- GPU power consumption
- Loss (single run and multiple runs)
- Influence of batch size and number of epochs
- Influence of learning rate
- Time to Accuracy (single run and multiple runs)
- Power usage
- Multiclass Confusion Matrix
- Standard metrics such as accuracy, F1, TP/TN, etc.
- Latency
- Throughput
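Several of the metrics above (multiclass confusion matrix, accuracy, F1, TP/TN) are derived from the same counts. As a reference for what these numbers mean, the sketch below computes them in plain Python from lists of true and predicted labels. It illustrates the standard definitions and is not UMLAUT's code; the function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """matrix[i][j] = number of samples with true label labels[i] predicted as labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_stats(matrix):
    """Return (TP, FP, FN) for each class, read off the confusion matrix."""
    n = len(matrix)
    stats = []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, actually another class
        fn = sum(matrix[k]) - tp  # actually k, predicted as another class
        stats.append((tp, fp, fn))
    return stats


def accuracy(matrix):
    total = sum(map(sum, matrix))
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total if total else 0.0


def macro_f1(matrix):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for tp, fp, fn in per_class_stats(matrix):
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    y_true = ["cat", "dog", "dog", "bird", "cat", "dog"]
    y_pred = ["cat", "dog", "cat", "bird", "cat", "bird"]
    m = confusion_matrix(y_true, y_pred, labels=["bird", "cat", "dog"])
    print(m)  # [[1, 0, 0], [0, 2, 0], [1, 1, 1]]
    print(accuracy(m), macro_f1(m))
```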
Through the CLI tool, the measurements for each of the metrics can be visualized. For each pipeline, users can generate plots for one or more metrics.
Measurements for the same metric for multiple pipelines can be shown on a single plot. Examples of using the CLI toolkit for visualization are shown below.
To reproduce the following plots, use ./pipelines/github_example/hello_world.db.
umlaut-cli hello_world.db -p plotly
Run umlaut-cli with plotly as the plotting backend.
Select a UUID using space and the arrow keys.
Select one or more metrics using space and the arrow keys.
Select one or more descriptions using space and the arrow keys. The description of a measurement is usually the name of the benchmarked method.
umlaut-cli hello_world.db -p plotly
Run umlaut-cli again with plotly as the plotting backend. This time, select multiple UUIDs using space and the arrow keys.
The pipelines folder contains the following example pipelines with UMLAUT integrated:
- So2Sat Earth Observation [description] [umlaut pipeline]
- Backblaze Hard Drive Anomaly Prediction [description] [umlaut pipeline]
- Stock Market Prediction [description] [umlaut pipeline]
- MNIST Digit Recognition [description] [umlaut pipeline]
- Meta Benchmarking Pipeline for initial testing:
By running the provided .sh files, a set of operations (sleeping, sorting, matrix multiplication) can be run to test UMLAUT on your own system. Furthermore, the provided Python file can be run for customized testing with the following arguments:
-t / --time to activate runtime measurements
-m / --memory to activate memory measurements
-mf / --memoryfreq to specify the interval for memory measurements
-c / --cpu to activate CPU measurements
-cf / --cpufreq to specify the interval for CPU measurements
-o / --order to specify which operations to run ("sleep", "sort", "mult", "vw", in any order and as often as desired)
-r / --repeat to specify how often the set of operations should be repeated
-g / --gpu to activate GPU utilization measurement
-gm / --gpumemory to activate GPU memory measurement
-gt / --gputime to activate GPU time measurement. For code executed on a GPU, the measured GPU time may differ slightly from the CPU time.
-gp / --gpupower to activate GPU power consumption measurement
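To make the flag list concrete, here is a hypothetical argparse parser that accepts the same options, plus a loop that runs the selected operations in the requested order and repeats them. The defaults, the workload sizes, the pure-Python matrix multiplication, and skipping the "vw" operation are assumptions made for illustration; the sketch always records wall-clock time per operation and only parses the other measurement flags without acting on them. The actual script in the repository may differ.

```python
import argparse
import random
import time


def build_parser():
    """Hypothetical parser mirroring the meta benchmark flags listed above."""
    p = argparse.ArgumentParser(description="Meta benchmark (illustrative sketch)")
    p.add_argument("-t", "--time", action="store_true", help="runtime measurements")
    p.add_argument("-m", "--memory", action="store_true", help="memory measurements")
    p.add_argument("-mf", "--memoryfreq", type=float, default=0.1, help="memory interval (s)")
    p.add_argument("-c", "--cpu", action="store_true", help="CPU measurements")
    p.add_argument("-cf", "--cpufreq", type=float, default=0.1, help="CPU interval (s)")
    p.add_argument("-o", "--order", nargs="+", default=["sleep", "sort", "mult"],
                   choices=["sleep", "sort", "mult", "vw"], help="operations, in order")
    p.add_argument("-r", "--repeat", type=int, default=1, help="repetitions of the set")
    p.add_argument("-g", "--gpu", action="store_true", help="GPU utilization")
    p.add_argument("-gm", "--gpumemory", action="store_true", help="GPU memory")
    p.add_argument("-gt", "--gputime", action="store_true", help="GPU time")
    p.add_argument("-gp", "--gpupower", action="store_true", help="GPU power consumption")
    return p


def op_sleep():
    time.sleep(0.1)


def op_sort(n=100_000):
    data = [random.random() for _ in range(n)]
    data.sort()


def op_mult(n=60):
    # Naive n x n matrix product in pure Python (stand-in for a real workload).
    a = [[random.random() for _ in range(n)] for _ in range(n)]
    b = [[random.random() for _ in range(n)] for _ in range(n)]
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


OPERATIONS = {"sleep": op_sleep, "sort": op_sort, "mult": op_mult}


def run(args):
    """Run the operations in args.order, args.repeat times; return (name, seconds) pairs."""
    timings = []
    for _ in range(args.repeat):
        for name in args.order:
            op = OPERATIONS.get(name)
            if op is None:  # "vw" is not implemented in this sketch
                continue
            start = time.perf_counter()
            op()
            timings.append((name, time.perf_counter() - start))
    return timings


if __name__ == "__main__":
    for name, seconds in run(build_parser().parse_args()):
        print(f"{name}: {seconds:.3f} s")
```

For example, `-o sort sleep -r 2` runs sort, sleep, sort, sleep; because --order takes one or more values, it consumes arguments until the next flag.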
UMLAUT itself should have a memory overhead of about 130 MB, a CPU usage of 10-20% when idle, and close to no time overhead. When sorting, both the mean and the maximum memory usage should fall within 1000-1100 MB. When multiplying matrices, the mean CPU usage should be about 90%.