pytorch-benchmarks
is the benchmark library of the eml branch of the EDA group at Politecnico di Torino.
The library is entirely written in PyTorch and targets the training of DNN models on edge-relevant use cases.
The latest release includes the following benchmarks:
- Image Classification on the CIFAR10 dataset.
- Keyword Spotting on the Google Speech Commands v2 dataset.
- Visual Wake Words on the MSCOCO dataset.
- Anomaly Detection on the ToyADMOS dataset.
- Heart Rate Detection on the PPG-DALIA dataset.
- Tiny ImageNet image classification on the dataset of the same name.
- Gesture Recognition on the NinaProDB6 dataset.
- Image Classification with Vision Transformers (ViT) on the CIFAR10 and Tiny-ImageNet datasets.
- InfraRed Person Counting on the LINAIGE dataset.
N.B., the first four tasks are our in-house implementation of the MLPerf Tiny benchmark suite (originally implemented in the tf-keras framework).
To install the latest release:
$ git clone https://github.com/eml-eda/pytorch-benchmarks
$ cd pytorch-benchmarks
$ python setup.py install
Each benchmark is a stand-alone Python module based on three Python files, namely data.py, model.py and train.py, plus an __init__.py that exposes them.
data.py must implement all the functions needed to gather the data, optionally pre-process them, and finally ship them to the user, both as PyTorch Datasets and as PyTorch Dataloaders.
The two mandatory and standard functions that need to be implemented are:
- get_data, which returns a tuple of PyTorch Datasets. Depending on the task, the number of returned datasets varies from 2 (train and test) to 3 (train, validation and test). Conversely, the function arguments depend on the specific task.
- build_dataloaders, which returns a tuple of PyTorch Dataloaders. In general, it takes as inputs the datasets returned by get_data and constants such as the batch size and the number of workers. The number of elements of the returned tuple depends on the number of provided datasets.
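For instance, a typical data pipeline looks like the following sketch. The exact arguments of get_data and build_dataloaders vary across benchmarks, so they are omitted here; the image_classification sub-module name is taken from the example referenced later in this README.

```python
from pytorch_benchmarks import image_classification as icl

# Task-specific arguments (e.g., the dataset download directory) are omitted
# here for brevity; check each benchmark's data.py for the exact signature.
datasets = icl.get_data()

# Optional keyword arguments such as the batch size and the number of
# workers are also task-specific and omitted in this sketch.
dataloaders = icl.build_dataloaders(datasets)

# Assuming the benchmark returns three splits (train, validation, test):
train_dl, val_dl, test_dl = dataloaders
```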
model.py must implement at least one model for the specific benchmark.
The mandatory and standard function that needs to be implemented is:
- get_reference_model, which always takes as first argument model_name, a string associated with a specific PyTorch model. Optionally, it can take a model_config argument, i.e., a Python dictionary of additional configurations for the model. It returns the requested PyTorch model. If the provided model_name is not supported, an error is raised.
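As a sketch, a model can then be instantiated as follows. The 'resnet_8' string and the configuration key below are illustrative placeholders; the supported model names are defined in each benchmark's model.py.

```python
# 'resnet_8' and the configuration key below are illustrative placeholders.
model = icl.get_reference_model('resnet_8')

# Optionally, pass a dictionary with additional model configurations:
model = icl.get_reference_model('resnet_8', model_config={'num_classes': 10})

# Passing an unsupported model_name raises an error.
```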
train.py must implement the minimum set of information required to build a training loop.
In particular, the mandatory and standard functions that need to be implemented are:
- get_default_optimizer, which takes as input the PyTorch model returned by get_reference_model and returns the default optimizer for the task.
- get_default_criterion, which takes no inputs and returns the default loss function for the task.
- train_one_epoch, which implements one epoch of training and validation for the benchmark. For the validation part it directly calls the evaluate function. It takes as input an integer specifying the current epoch, the model to be trained, the criterion, the optimizer, the train and val dataloaders and finally the device to be used for the training. It returns a dictionary of tracked metrics.
- evaluate, which implements an evaluation step of the model. This step can be either validation or test, depending on the specific dataloader provided as input. It takes as input the model, the criterion, the dataloader and the device. It returns a dictionary of tracked metrics.
Optionally, the benchmark may define and implement the get_default_scheduler function, which takes as input the optimizer and returns the specified learning-rate scheduler.
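Combining the functions above, a minimal training flow may look like the following sketch, reusing the model and dataloaders from the previous snippets. The number of epochs and the use of the optional scheduler are illustrative choices.

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

criterion = icl.get_default_criterion()
optimizer = icl.get_default_optimizer(model)
# Only if the benchmark implements the optional scheduler:
# scheduler = icl.get_default_scheduler(optimizer)

N_EPOCHS = 50  # illustrative value
for epoch in range(N_EPOCHS):
    metrics = icl.train_one_epoch(
        epoch, model, criterion, optimizer, train_dl, val_dl, device)
    # scheduler.step()

test_metrics = icl.evaluate(model, criterion, test_dl, device)
print(test_metrics)  # dictionary of tracked metrics
```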
The body of the benchmark's __init__.py must import all the standard functions described in data.py, model.py and train.py.
This file is mandatory to identify the parent directory as a Python package and to expose the developed functions to the user.
To gain more insight into how this file is structured and how to develop a new one, please consult one of the __init__.py files already included in the library, e.g., image_classification/__init__.py.
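In practice, for a new benchmark the __init__.py can simply re-export the standard functions, along the lines of this sketch (my_new_task is a placeholder name):

```python
# pytorch_benchmarks/my_new_task/__init__.py  (my_new_task is a placeholder)
from .data import get_data, build_dataloaders
from .model import get_reference_model
from .train import (
    get_default_optimizer,
    get_default_criterion,
    train_one_epoch,
    evaluate,
)
```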
Finally, for each benchmark an end-to-end example script is provided in the examples
directory:
- Image Classification Example
- Keyword Spotting Example
- Visual Wake Words Example
- Anomaly Detection Example
- Heart Rate Detection Example
- Tiny ImageNet Example
- Gesture Recognition Example
- Image Classification ViT CIFAR10 Example and Image Classification ViT Tiny-ImageNet Example
Each example shows how to use the different functions to build a neat and simple DNN training flow.
If you want to contribute to pytorch-benchmarks with your code, please follow these steps:
- Create a new directory within ./pytorch_benchmarks, giving a meaningful name to the task.
- Follow the format described in API Details.
- Include an end-to-end example script.
- Update this README with the relevant pointers to your new task.
- If you are not a maintainer of the repository, please create a pull-request.
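As a reference, the resulting layout for a new task would look like this sketch (my_new_task and the example script name are placeholders):

```
pytorch_benchmarks/
└── my_new_task/             # placeholder task name
    ├── __init__.py          # re-exports the standard functions
    ├── data.py              # get_data, build_dataloaders
    ├── model.py             # get_reference_model
    └── train.py             # get_default_optimizer, get_default_criterion,
                             #   train_one_epoch, evaluate
examples/
└── example_my_new_task.py   # end-to-end example script (placeholder name)
```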