I have a couple of clarifying questions about cellular attention networks (CANs) and the code for the attention mechanism in the conv and message-passing files.
I may be mistaken, but it seems that no normalization is applied to the attention coefficients in the current implementation (the reference paper uses a softmax for this purpose). Should we leave the current attention mechanism as is, or is it worth rewriting the code to implement the normalization?
Regarding the tensor diagram for CANs: for neighborhood aggregation, the tensor diagram here calls for applying a non-linearity to each within-neighborhood aggregation and then performing the inter-neighborhood aggregation, whereas the referenced CAN paper performs the inter-neighborhood aggregation first and then applies the non-linearity. Would it be OK to go with the formula given by the paper? This would also let our implementation reduce to the Hodge Laplacian layer in the referenced Rodenberry et al. paper when the option to use attention is set to false.
(1) I would suggest you normalize as done in the original paper.
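For reference, here is a minimal sketch of what softmax normalization over each cell's neighborhood could look like. It assumes one unnormalized attention score per nonzero entry of the neighborhood matrix and the index of the receiving cell for each entry; the function and argument names (`softmax_per_neighborhood`, `att_scores`, `target_index`) are illustrative, not the actual TopoModelX API.

```python
import torch


def softmax_per_neighborhood(
    att_scores: torch.Tensor, target_index: torch.Tensor, n_cells: int
) -> torch.Tensor:
    """Normalize attention scores so they sum to 1 over each cell's neighborhood."""
    # Subtract the per-target max before exponentiating, for numerical stability.
    max_per_target = torch.full((n_cells,), float("-inf")).scatter_reduce(
        0, target_index, att_scores, reduce="amax"
    )
    scores = torch.exp(att_scores - max_per_target[target_index])
    # Sum the exponentiated scores within each neighborhood, then divide.
    denom = torch.zeros(n_cells).scatter_add(0, target_index, scores)
    return scores / (denom[target_index] + 1e-16)


# Example: three entries targeting cell 0 and one targeting cell 1.
coeffs = softmax_per_neighborhood(
    torch.tensor([0.5, 1.0, -0.2, 0.3]), torch.tensor([0, 0, 0, 1]), n_cells=2
)
```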
(2) Same here: please go with what the original paper suggests. We will update the tensor diagram accordingly later. @mathildepapillon
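To make the difference between the two conventions explicit, here is a sketch of the two aggregation orders; the notation below is assumed for illustration (phi is the non-linearity, N_k(x) the k-th neighborhood of cell x, alpha_k the attention coefficients, W_k the learnable weights), not taken verbatim from either paper.

```latex
\begin{align}
  % Tensor diagram: non-linearity applied within each neighborhood, then summed.
  h_x^{(l+1)} &= \sum_{k} \phi\!\Big( \sum_{y \in \mathcal{N}_k(x)} \alpha_k(x, y)\, W_k\, h_y^{(l)} \Big) \\
  % CAN paper: neighborhoods summed first, single non-linearity outside,
  % which reduces to the Hodge Laplacian layer when alpha_k is replaced by
  % the plain neighborhood weights (attention turned off).
  h_x^{(l+1)} &= \phi\!\Big( \sum_{k} \sum_{y \in \mathcal{N}_k(x)} \alpha_k(x, y)\, W_k\, h_y^{(l)} \Big)
\end{align}
```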