When running the `sgemm_70.cu` program, I used the `print_latex` tool to inspect the structure of `mmaC`; the result is shown in the figure.
As far as I know, `UniversalFMA` uses CUDA cores for computation. However, during the General Matrix Multiplication (GEMM), something puzzling appears: a thread seems to be able to access values held in the registers of other threads. For example, thread T17 appears to read values from the registers of threads T1 and T16 to complete its computation.
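To make the observation concrete, here is a minimal host-side sketch of how I am reading the figure. It assumes the thread layout is 16x16x1 and column-major (thread index = m + 16 * n), and that the labels on the A and B panels come from the n = 0 column and the m = 0 row of threads respectively; I have not confirmed either assumption. The names `c_coord`, `a_label`, `b_label`, and `describe` are made up for illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstdio>

// Assumption: 16x16x1 thread layout, column-major, so tid = m + 16 * n.
constexpr int kThreadsM = 16;
constexpr int kThreadsN = 16;

struct Coord {
  int m;  // row of C
  int n;  // column of C
};

// Coordinate of the C element whose cell is labelled with thread `tid`.
constexpr Coord c_coord(int tid) {
  return Coord{tid % kThreadsM, tid / kThreadsM};
}

// Thread label printed next to row m of the A panel
// (assumed to be the thread at (m, n = 0)).
constexpr int a_label(int m) { return m; }

// Thread label printed above column n of the B panel
// (assumed to be the thread at (m = 0, n)).
constexpr int b_label(int n) { return kThreadsM * n; }

// Print which A and B labels line up with the C cell of thread `tid`.
void describe(int tid) {
  const Coord c = c_coord(tid);
  std::printf("T%d -> C(%d,%d): A row labelled T%d, B column labelled T%d\n",
              tid, c.m, c.n, a_label(c.m), b_label(c.n));
}
```

Under these assumptions, `describe(17)` places T17 at C(1,1), next to the A row labelled T1 and the B column labelled T16, which is exactly the pattern in the figure that makes it look as if T17 reads registers owned by T1 and T16.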
Could you explain the reason behind this behavior?