Modularizing while Training: A New Paradigm for Modularizing DNN Models (ICSE'24)

Abstract

Deep neural network (DNN) models have become increasingly crucial components in intelligent software systems. However, training a DNN model is typically expensive in terms of both time and money. To address this issue, researchers have recently focused on reusing existing DNN models, borrowing the idea of code reuse in software engineering. However, reusing an entire model could incur extra overhead or inherit weaknesses from undesired functionalities. Hence, existing work proposes to decompose an already trained model into modules, i.e., modularizing-after-training, and enable module reuse. Since trained models are not built for modularization, modularizing-after-training incurs huge overhead and model accuracy loss. In this paper, we propose a novel approach that incorporates modularization into the model training process, i.e., modularizing-while-training (MwT). We train a model to be structurally modular through two loss functions that optimize intra-module cohesion and inter-module coupling. In this work, we have implemented the proposed approach for modularizing Convolutional Neural Network (CNN) models. The evaluation results on representative models demonstrate that MwT outperforms the state-of-the-art approach. Specifically, the accuracy loss caused by MwT is only 1.13%, which is 1.76% less than that of the latter. The kernel retention rate of the modules generated by MwT is only 14.58%, a reduction of 74.31% over the state-of-the-art approach. Furthermore, the total time cost required for training and modularizing is only 108 minutes, half that of the latter.
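
As a rough illustration of the objective (not the exact implementation in src/modular_trainer.py), the sketch below combines the task loss with cohesion and coupling terms computed from per-sample kernel masks. The alpha and beta weights correspond to the --alpha and --beta flags used later in this README; the cosine-similarity formulation and the function names are assumptions for illustration only.

# Illustrative sketch; the exact losses live in src/modular_trainer.py.
import torch
import torch.nn.functional as F

def modular_losses(masks, labels):
    # masks: (batch, n_kernels) relevance scores produced by the mask generators
    m = F.normalize(masks, dim=1)
    sim = m @ m.t()                                    # pairwise cosine similarity
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # same-class sample pairs
    not_self = ~torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    cohesion_loss = 1 - sim[same & not_self].mean()    # same-class masks should overlap
    coupling_loss = sim[~same].mean()                  # different-class masks should not
    return cohesion_loss, coupling_loss

def total_loss(logits, masks, labels, alpha=0.5, beta=1.5):
    # task loss plus the two modularity terms, weighted like the --alpha/--beta flags below
    cohesion_loss, coupling_loss = modular_losses(masks, labels)
    return F.cross_entropy(logits, labels) + alpha * cohesion_loss + beta * coupling_loss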

Requirements

  • fvcore 0.1.5.post20221221
  • numpy 1.23.1
  • python 3.9.12
  • pytorch 1.12.0
  • tensorboard 2.10.1
  • torchvision 0.13.0
  • tqdm 4.64.0
  • GPU with CUDA support is also needed

Structure of the directories

  |--- README.md                        :  the user guidance
  |--- data/                            :  the experimental data
  |--- src/                             :  the source code of our work
       |--- configs.py                  :  setting the path
       |--- modular_trainer.py          :  training modular CNN models
       |--- modularizer.py              :  modularizing trained modular CNN models and then reusing modules on sub-tasks
       |--- standard_trainer.py         :  training CNN models using the standard training method 
       |--- ...
       |--- models/                    
            |--- utils_v2.py            :  the implementation of mask generator 
            |--- vgg.py                 :  the standard vgg16 model
            |--- vgg_masked.py          :  the modular vgg16 model, i.e., the standard vgg16 model with mask generators
            |--- ...
       |--- modules_arch/
            |--- vgg_module_v2.py       :  the vgg16 module which retains only relevant kernels and removes mask generators.
            |--- ...
       |--- exp_cnnsplitter_reusing/
            |--- reuse_modules.py       :  reusing modules published by CNNSplitter on sub-tasks
            |--- calculate_cohesion.py  :  calculating the cohesion of modules
            |--- ...                    :  published by CNNSplitter
       |--- ...

Replication of experimental results

The following sections describe how to reproduce the experimental results in our paper.

Downloading experimental data

  1. We provide the resulting models trained by standard training and the modular models trained by modular training.
    One can download data/ from here and then move it to MwT/.
    The datasets will be downloaded automatically by PyTorch when running our project.
  2. Modify self.root_dir in src/configs.py.
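
For reference, step 2 amounts to an edit along the following lines. This is only an illustrative excerpt; everything in it except self.root_dir is an assumption about how src/configs.py is laid out.

# src/configs.py (illustrative excerpt; attribute names besides root_dir are assumptions)
class Configs:
    def __init__(self):
        # absolute path of the MwT/ directory that now contains the downloaded data/
        self.root_dir = '/absolute/path/to/MwT'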

Modular training, modularizing, and module reuse

  1. Training a modular VGG16 model.
python modular_trainer.py --model vgg16 --dataset cifar10 --lr_model 0.05 --alpha 0.5 --beta 1.5 --batch_size 128
  2. Modularizing the modular VGG16 model and reusing the resulting modules on a sub-task containing "class 0" and "class 1".
python modularizer.py --model vgg16 --dataset cifar10 --lr_model 0.05 --alpha 0.5 --beta 1.5 --batch_size 128 --target_classes 0 1

Standard training

  1. Training a VGG16 model.
python standard_trainer.py --model vgg16 --dataset cifar10 --lr_model 0.05 --batch_size 128

Reusing modules from CNNSplitter

  1. Downloading the published modules at CNNSplitter's project webpage.
  2. Modifying root_dir in src/exp_cnnsplitter_reusing/global_configure.py
  3. Modifying dataset_dir in src/exp_cnnsplitter_reusing/reuse_modules.py
  4. Reusing SimCNN-CIFAR10's modules on a sub-task containing "class 0" and "class 1"
python reuse_modules.py --model simcnn --dataset cifar10 --target_classes 0 1
  5. Calculating the cohesion of modules
python calculate_cohesion.py --model simcnn --dataset cifar10
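
For intuition, cohesion here measures how consistently a module's samples rely on the same kernels. The snippet below is a loose, assumed illustration of such a metric (average pairwise overlap of activated-kernel sets) and is not necessarily the exact formula implemented in calculate_cohesion.py.

# Loose illustration of a cohesion-style metric; assumed, not the repository's exact formula.
import itertools

def cohesion(sample_kernel_sets):
    # sample_kernel_sets: one set of activated kernel indices per sample of the module's class
    pairs = list(itertools.combinations(sample_kernel_sets, 2))
    if not pairs:
        return 1.0
    # average pairwise overlap: higher means the samples rely on a more consistent kernel set
    return sum(len(a & b) / (len(a | b) or 1) for a, b in pairs) / len(pairs)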

Supplementary experimental results

Discussion of the effect of the threshold on modularizing the modular ResNet18-CIFAR10 model.

The value of the threshold directly affects the results of modularization and module reuse. As shown in the figure below, as the threshold increases from 0.1 to 0.9, the kernel retention rate of the modules gradually decreases from 37.36% to 24.74%. A larger threshold makes each module tend to retain only the convolutional kernels required by all samples of the corresponding category, leading to an increase in cohesion from 0.8572 to 0.9437 and a decrease in coupling from 0.3594 to 0.2412.

[Figure: kernel retention rate, cohesion, and coupling of the ResNet18-CIFAR10 modules as the threshold increases from 0.1 to 0.9]
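
As a minimal sketch of the thresholding step described above (the real logic lives in src/modularizer.py; the mask layout and the function below are assumptions), a module keeps a kernel only if enough of the class's samples rely on it, so a larger threshold yields smaller modules:

# Illustrative only: how a relevance threshold could decide which kernels a module keeps.
import torch

def select_kernels(class_masks, threshold):
    # class_masks: (n_samples, n_kernels) mask values recorded for one class's samples
    usage = (class_masks > 0).float().mean(dim=0)  # fraction of samples that use each kernel
    keep = usage >= threshold                      # a larger threshold keeps fewer kernels
    kernel_retention_rate = keep.float().mean().item()
    return keep, kernel_retention_rate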

Discussion of the effect of the threshold on reusing the ResNet18-CIFAR10 modules.

Regarding the effect on module reuse, the figure below presents the performance of the modules in terms of kernel retention rate (KRR) and accuracy on the 3-class classification sub-task. As the threshold increases, the KRR of the modules decreases from 72.57% to 50.51%. Nonetheless, the decrease in KRR has a negligible impact on the accuracy of the modules, which only drops from 97.77% to 97.23%. The experimental results also demonstrate that our default settings are appropriate.

[Figure: kernel retention rate and accuracy of the reused ResNet18-CIFAR10 modules on the 3-class sub-task as the threshold increases]
