Print Trainable as a column #54

Open
joonas-yoon opened this issue May 29, 2022 · 7 comments

joonas-yoon commented May 29, 2022

🚀 Feature

Add a new column to the summary, Trainable, indicating whether gradients are computed for each layer's parameters.

We can easily read this off a model's parameters:

# Print whether each parameter tracks gradients
for p in model.parameters():
    print(p.requires_grad)

In short, the expected output is:

_________________________________________________________________________________________________________
Layer                        Type                  Output Shape              Param #          Trainable
=========================================================================================================
vgg                          VGG                   (-1, 1000)                0                
├─features                   Sequential            (-1, 512, 7, 7)           0                
|    └─0                     Conv2d                (-1, 64, 224, 224)        1,792            True
|    └─1                     ReLU                  (-1, 64, 224, 224)        0                -
|    └─2                     Conv2d                (-1, 64, 224, 224)        36,928           True
|    └─3                     ReLU                  (-1, 64, 224, 224)        0                -
|    └─4                     MaxPool2d             (-1, 64, 112, 112)        0                
|    └─5                     Conv2d                (-1, 128, 112, 112)       73,856           True
|    └─6                     ReLU                  (-1, 128, 112, 112)       0                -
...
├─classifier                 Sequential            (-1, 1000)                0                
|    └─0                     Linear                (-1, 4096)                102,764,544      False
|    └─1                     ReLU                  (-1, 4096)                0                -
|    └─2                     Dropout               (-1, 4096)                0                -
|    └─3                     Linear                (-1, 4096)                16,781,312       False
|    └─4                     ReLU                  (-1, 4096)                0                -
|    └─5                     Dropout               (-1, 4096)                0                -
|    └─6                     Linear                (-1, 1000)                4,097,000        False
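
The column value could be derived from each layer's own parameters, along these lines (a hypothetical sketch, not an existing torchscan API; it assumes all of a layer's parameters share the same requires_grad flag):

# Only look at the module's direct (non-recursive) parameters
flags = [p.requires_grad for p in module.parameters(recurse=False)]
trainable = "-" if not flags else str(all(flags))  # "-" for e.g. ReLU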

Motivation & pitch

I have been trying transfer learning with a DenseNet model, and printed the summary:

import torchvision
from torch import nn
from torchscan import summary

model = torchvision.models.densenet201(pretrained=True)
# Replace the classifier head and freeze its parameters
model.classifier = nn.Sequential(
    nn.Linear(1920, 10)
)
for p in model.classifier.parameters():
    p.requires_grad = False
summary(model, (3, 224, 224))

but there is no information about which layers are trainable. This is the tail of the result:

|    |    |    └─conv2       Conv2d                (-1, 32, 7, 7)            36,864         
|    └─norm5                 BatchNorm2d           (-1, 1920, 7, 7)          7,681          
├─classifier                 Sequential            (-1, 10)                  0              
|    └─0                     Linear                (-1, 10)                  19,210         
==========================================================================================
Trainable params: 18,092,928
Non-trainable params: 19,210
Total params: 18,112,138

Alternatives

No response

Additional context

I'll wait for your response; I'd like to hear what you think about this.

frgfm commented May 29, 2022

Hi @joonas-yoon 👋

This is an interesting feature idea! Here is some feedback:

  • there can be multiple parameters in one layer, each with its own requires_grad flag
  • for torchscan to estimate backprop RAM consumption, the parameters need to require grad

What do you think?

joonas-yoon commented:

Could you give me an example of multiple parameters in one layer? I'm not familiar with that case, but it sounds interesting.

joonas-yoon commented:

And for the second point, RAM consumption, how about saving all of the requires_grad states and restoring them afterwards?

Obviously, that would take more time and cost some performance. Any ideas?
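
For reference, a minimal sketch of that save-and-restore idea (illustrative, not existing torchscan code):

# Snapshot each parameter's requires_grad flag
saved = {n: p.requires_grad for n, p in model.named_parameters()}
for p in model.parameters():
    p.requires_grad_(True)  # enable grads so backprop RAM can be measured
# ... run the RAM estimation here ...
for n, p in model.named_parameters():
    p.requires_grad_(saved[n])  # restore the original state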

frgfm commented May 31, 2022

Hey there 👋

Well, for multiple parameters: almost all layers have them 😅

from torch import nn

# Create a fully connected layer
layer = nn.Linear(4, 8)
# Don't track grad on the weights
layer.weight.requires_grad_(False)

# But the bias still requires grad
for n, p in layer.named_parameters():
    print(n, p.requires_grad)

which yields:

weight False
bias True

For the second part, I had the same thing in mind; I agree 👍

joonas-yoon commented May 31, 2022

Oh, I see. Then, only for layers whose parameters differ, how about this?

_______________________________________________________________________________________________________________
Layer                        Type                  Output Shape              Param #          Trainable
===============================================================================================================
vgg                          VGG                   (-1, 1000)                0                
...
|    └─3                     Linear                (-1, 4096)                16,781,312       False
|    └─4                     ReLU                  (-1, 4096)                0                -
|    └─5                     Dropout               (-1, 4096)                0                -
|    └─6                     Linear                (-1, 1000)                4,097,000        weight: False
|                                                                                             bias: True

It doesn't matter that it spans multiple lines. A single line like weight: False, bias: True would also be fine, but that prints quite a long string 🤔
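
For what it's worth, the single-line variant could be built roughly like this (hypothetical helper, not torchscan API):

def trainable_str(module):
    # One "name: flag" pair per direct parameter,
    # e.g. "weight: False, bias: True"
    return ", ".join(
        f"{n}: {p.requires_grad}"
        for n, p in module.named_parameters(recurse=False)
    )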

frgfm commented May 31, 2022

Well, that will become hairy; I honestly don't want to spread it over multiple lines.
The only suggestion I can see is:

  • write True if any parameter is trainable
  • write False otherwise
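
In code, that rule could read like this (hypothetical helper; the "-" for parameter-less layers follows the mock-ups above):

def trainable_flag(module):
    params = list(module.parameters(recurse=False))
    if not params:
        return "-"  # layers like ReLU or Dropout have nothing to train
    # True if any parameter is trainable, False only when all are frozen
    return str(any(p.requires_grad for p in params))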

joonas-yoon commented:

Good, I totally agree with you.

One thing I'd like to suggest: this behavior should be noted in the documentation, for example: "False; contains a mix of trainable and non-trainable parameters".
