Open
Description
Hello, thank you for your great work.
In your code and your paper, you mention that the spatial method (summing over width and height separately) gives better results and preserves plasticity. However, summing along the width yields 1.0 in every entry, since softmax is applied to the attention map. Thus, the difference between two attention maps pooled along the width would always be zero.
Therefore, I'm curious why you are using the spatial method.
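To illustrate the concern, here is a minimal NumPy sketch. It assumes the softmax is taken over the last (width) axis of an attention map of shape (heads, height, width). The shapes, variable names, and the max-absolute-difference measure are hypothetical and not taken from the repository.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical attention maps from two models, shape (heads, height, width).
# Assumption: softmax normalizes over the last (width) axis.
old_attn = softmax(rng.normal(size=(4, 8, 8)), axis=-1)
new_attn = softmax(rng.normal(size=(4, 8, 8)), axis=-1)

# Pooling over width: every entry is 1.0, so the two pooled maps are identical.
old_w = old_attn.sum(axis=-1)
new_w = new_attn.sum(axis=-1)
width_diff = np.abs(old_w - new_w).max()

# Pooling over height: entries are not normalized, so the difference is informative.
old_h = old_attn.sum(axis=-2)
new_h = new_attn.sum(axis=-2)
height_diff = np.abs(old_h - new_h).max()

print("width-pooled max difference: ", width_diff)
print("height-pooled max difference:", height_diff)
```

Under this assumption, the width-pooled difference is zero up to floating-point error, while the height-pooled difference is not.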
It seems that your code uses the "height" configuration, so I think you are aware of this. Am I misunderstanding your paper?
Thank you!