[QST] Question about UniversalFMA #2101

leven-comeon · 2025-02-12T13:33:45Z

What is your question?

When running the sgemm_70.cu program, I used the print_latex tool to inspect the structure of mmaC; the result is shown in the figure.

As far as I know, UniversalFMA uses CUDA Cores for computation. However, the diagram for the General Matrix Multiplication (GEMM) operation shows something puzzling: a thread appears to access values held in the registers of other threads. For example, thread T17 appears to read values from the registers of threads T1 and T16 to complete its computation.
Could you explain the reason behind this behavior?


thakkarV · 2025-02-12T14:09:29Z

The A and B TV mappings only show the first thread. In reality, multiple threads each read A and B.
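To make the point above concrete, here is a minimal host-only sketch of the thread-to-value mapping. It is not CuTe code and does not call any CUTLASS API. It assumes a 16x16x1 column-major thread layout with one C element per thread, which is consistent with the T17 / T1 / T16 example in the question; the names `thread_coord`, `a_label`, `b_label`, and `print_a_owners` are hypothetical helpers made up for illustration.

```cpp
// Host-only model of the thread-to-value mapping; no CuTe or CUDA required.
// Assumption: 16x16x1 column-major thread layout, one C element per thread.
#include <cstdio>

constexpr int kThreadsM = 16;  // threads along M (assumed)
constexpr int kThreadsN = 16;  // threads along N (assumed)

struct Coord { int m; int n; };

// Column-major thread layout: thread id = m + n * kThreadsM.
constexpr Coord thread_coord(int tid) {
  return {tid % kThreadsM, tid / kThreadsM};
}

// Label a diagram gives A row m if it draws only the n == 0 thread.
constexpr int a_label(int m) { return m; }

// Label a diagram gives B row n if it draws only the m == 0 thread.
constexpr int b_label(int n) { return n * kThreadsM; }

// T17 sits at (m=1, n=1). It needs A row 1 (labelled T1) and
// B row 1 (labelled T16), but it loads both values itself.
static_assert(thread_coord(17).m == 1 && thread_coord(17).n == 1, "T17 coord");
static_assert(a_label(thread_coord(17).m) == 1, "A row 1 is labelled T1");
static_assert(b_label(thread_coord(17).n) == 16, "B row 1 is labelled T16");

// Print every thread that holds its own copy of A row m.
inline void print_a_owners(int m) {
  std::printf("A row %d is held by:", m);
  for (int n = 0; n < kThreadsN; ++n) {
    std::printf(" T%d", m + n * kThreadsM);
  }
  std::printf("\n");
}
```

Under these assumptions, calling `print_a_owners(1)` from a `main` lists T1, T17, T33, and so on up to T241. Each of those threads keeps its own register copy of A row 1, so T17 never reads T1's registers; the diagram simply labels the row with the first of them.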

leven-comeon added the "? - Needs Triage" and "question" labels on Feb 12, 2025
