
Pytorch Benchmarks

pytorch-benchmarks is the benchmark library of the eml branch of the EDA group within Politecnico di Torino.

The library is entirely written in PyTorch and addresses the training of DNN models on edge-relevant use cases.

In its latest release, the library includes the following benchmarks:

  1. Image Classification on the CIFAR10 dataset.
  2. Keyword Spotting on the Google Speech Commands v2 dataset.
  3. Visual Wake Words on the MSCOCO dataset.
  4. Anomaly Detection on the ToyADMOS dataset.
  5. Heart Rate Detection on the PPG-DALIA dataset.
  6. Image Classification on the Tiny ImageNet dataset.
  7. Gesture Recognition on the NinaProDB6 dataset.
  8. Image Classification - ViT for Vision Transformers on the CIFAR10 and Tiny-ImageNet datasets.
  9. InfraRed Person Counting on the LINAIGE dataset.

N.B., tasks 1 to 4 are our in-house implementation of the MLPerf Tiny benchmark suite (originally implemented in the tf-keras framework).

Installation

To install the latest release:

$ git clone https://github.com/eml-eda/pytorch-benchmarks
$ cd pytorch-benchmarks
$ python setup.py install
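
After installation, each benchmark should be importable as a sub-module of the pytorch_benchmarks package; e.g., as a quick sanity check:

$ python -c "from pytorch_benchmarks import image_classification"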

API Details

Each benchmark is a stand-alone Python module based on four Python files, namely:

  1. data.py
  2. model.py
  3. train.py
  4. __init__.py

data.py

This module must implement all the functions needed to gather the data, optionally pre-process it and finally ship it to the user, both in the form of PyTorch Datasets and PyTorch Dataloaders.

The two mandatory and standard functions that need to be implemented are:

  • get_data, which returns a tuple of PyTorch Datasets. Depending on the task, the number of returned datasets may vary from 2 (train and test) to 3 (train, validation and test). Conversely, the function arguments depend on the specific task.
  • build_dataloaders, which returns a tuple of PyTorch Dataloaders. In general, it takes as inputs the datasets returned by get_data and constants such as the batch size and the number of workers. The number of elements of the returned tuple depends on the number of provided datasets.
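
For instance, a minimal usage sketch for the image_classification benchmark could look as follows (the absence of mandatory arguments for get_data, the 3-way split and the batch_size/num_workers keywords are assumptions; check the benchmark's data.py for the exact signatures):

from pytorch_benchmarks import image_classification as icl

# Gather the datasets; the arguments and the number of returned datasets
# (2 or 3) depend on the specific task. Here we assume no mandatory
# arguments and a (train, validation, test) split.
train_set, val_set, test_set = icl.get_data()

# Build one dataloader per dataset; batch_size and num_workers are assumed
# keyword arguments, check the benchmark's data.py for the exact signature.
train_dl, val_dl, test_dl = icl.build_dataloaders(
    (train_set, val_set, test_set), batch_size=32, num_workers=2)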

model.py

This module must implement at least one model for the specific benchmark.

The mandatory and standard function that needs to be implemented is:

  • get_reference_model, which always takes as first argument the model_name, i.e., a string associated with a specific PyTorch model. Optionally, the function can take as argument model_config, i.e., a Python dictionary of additional configurations for the model. It returns the requested PyTorch model.

If the provided model_name is not supported, an error is raised.
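
For instance (the model_name 'resnet_8' and the model_config key below are purely illustrative):

from pytorch_benchmarks import image_classification as icl

# 'resnet_8' is an illustrative model_name; the supported names are defined
# in each benchmark's model.py, and an unsupported name raises an error.
model = icl.get_reference_model('resnet_8')

# Optionally, pass a dictionary of additional configurations
# (the key shown here is hypothetical).
model = icl.get_reference_model('resnet_8', model_config={'num_classes': 10})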

train.py

This module must implement the minimum set of functions required to build a training loop.

In particular, the mandatory and standard functions that need to be implemented are:

  • get_default_optimizer, which takes as input the PyTorch model returned by get_reference_model and returns the default optimizer for the task.
  • get_default_criterion, which takes no inputs and returns the default loss function for the task.
  • train_one_epoch, which implements one epoch of training and validation for the benchmark. For the validation part it directly calls the evaluate function. It takes as input an integer specifying the current epoch, the model to be trained, the criterion, the optimizer, the train and val dataloaders and finally the device to be used for the training. It returns a dictionary of tracked metrics.
  • evaluate, which implements an evaluation step of the model. This step can be either validation or test, depending on the specific dataloader provided as input. It takes as input the model, the criterion, the dataloader and the device. It returns a dictionary of tracked metrics.

Optionally, the benchmark may define and implement the get_default_scheduler function, which takes as input the optimizer and returns the specified learning-rate scheduler.
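
Putting everything together, a training loop built on top of these functions could look like the following sketch (the benchmark module, model name, batch size and number of epochs are assumptions, not library defaults):

import torch
from pytorch_benchmarks import image_classification as icl

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Data and model (see the previous sections); names and arguments are illustrative.
train_set, val_set, test_set = icl.get_data()
train_dl, val_dl, test_dl = icl.build_dataloaders(
    (train_set, val_set, test_set), batch_size=32, num_workers=2)
model = icl.get_reference_model('resnet_8').to(device)

criterion = icl.get_default_criterion()
optimizer = icl.get_default_optimizer(model)
# scheduler = icl.get_default_scheduler(optimizer)  # only if the benchmark defines it

N_EPOCHS = 10  # assumed value, not a library default
for epoch in range(N_EPOCHS):
    # One epoch of training plus validation; returns a dictionary of tracked metrics.
    metrics = icl.train_one_epoch(epoch, model, criterion, optimizer,
                                  train_dl, val_dl, device)
    print(f"Epoch {epoch}: {metrics}")
    # scheduler.step()

# Final evaluation on the test dataloader.
test_metrics = icl.evaluate(model, criterion, test_dl, device)
print(f"Test: {test_metrics}")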

__init__.py

The body of this file must import all the standard functions described in data.py, model.py and train.py. This file is mandatory to identify the parent directory as a Python package and to expose the developed functions to the user.

To gain more insight into how this file is structured and how to develop your own, please consult one of the __init__.py files already included in the library, e.g., image_classification/__init__.py.
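
As a rough template (the actual files in the library may expose additional helpers):

# pytorch_benchmarks/<task_name>/__init__.py -- rough template
from .data import get_data, build_dataloaders
from .model import get_reference_model
from .train import (
    get_default_optimizer,
    get_default_criterion,
    train_one_epoch,
    evaluate,
)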

Example Scripts

Finally, for each benchmark an end-to-end example script is provided in the examples directory:

  1. Image Classification Example
  2. Keyword Spotting Example
  3. Visual Wake Words Example
  4. Anomaly Detection Example
  5. Heart Rate Detection Example
  6. Tiny ImageNet Example
  7. Gesture Recognition Example
  8. Image Classification ViT CIFAR10 Example and Image Classification ViT Tiny-ImageNet Example

Each example shows how to use the different functions to build a neat and simple DNN training loop.

Contribution guidelines

If you want to contribute to pytorch-benchmarks with your code, please follow these steps:

  1. Create a new directory within ./pytorch_benchmarks giving a meaningful name to the task.
  2. Follow the format described in API Details.
  3. Include an end-to-end example script.
  4. Update this README with the relevant pointers to your new task.
  5. If you are not a maintainer of the repository, please create a pull-request.
