FusionLayerNormSharedMemoryBuffer_CUDA validation #3777

naoyam · 2025-01-28T22:51:51Z

https://github.com/NVIDIA/Fuser/blob/main/tests/cpp/test_gpu3.cpp#L7371

FusionLayerNormSharedMemoryBuffer_CUDA seems to be trying to check whether the inner persistent scheduler is used. However, calling computeHeuristics alone doesn't guarantee that FusionExecutorCache actually uses that scheduler; it seems to just compute the heuristic parameters on the assumption that the fusion passes the canSchedule checks. Indeed, running the test suggests that the fusion is segmented for some sizes.

@liqiangxl What is this part meant to check?


liqiangxl · 2025-01-29T14:15:56Z

That part checks whether registers or shared memory are used to store the persistent buffers.
You are right, some cases may be segmented when the hidden size is very large and exceeds the available shared memory.
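To make the three outcomes concrete (registers, shared memory, or segmentation), here is a minimal sketch of the decision described above. It is not nvFuser code: `BufferPlacement`, `choosePlacement`, and the capacity parameters are hypothetical names introduced for illustration, and the capacities are plain inputs rather than values queried from a device. The real scheduler presumably also accounts for overheads that this sketch ignores.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration only; none of these names are nvFuser APIs.
enum class BufferPlacement { Registers, SharedMemory, Segment };

// Decide where a persistent buffer of `hidden_size` elements could live.
// Both capacities are caller-supplied assumptions, not device queries.
inline BufferPlacement choosePlacement(
    int64_t hidden_size,
    int64_t bytes_per_element,
    int64_t register_capacity_bytes,
    int64_t shared_memory_capacity_bytes) {
  assert(hidden_size > 0 && bytes_per_element > 0);
  const int64_t buffer_bytes = hidden_size * bytes_per_element;
  if (buffer_bytes <= register_capacity_bytes) {
    return BufferPlacement::Registers;
  }
  if (buffer_bytes <= shared_memory_capacity_bytes) {
    return BufferPlacement::SharedMemory;
  }
  // Too large for either: the fusion would have to be split up.
  return BufferPlacement::Segment;
}
```

With this shape, a test that sweeps hidden sizes has to expect a different outcome once the buffer no longer fits in shared memory, instead of assuming a single scheduler for every size.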

naoyam · 2025-01-29T17:34:34Z

Please update the test so that its intended behavior is indeed verified. The result through FusionExecutorCache is what actually matters, and that's not validated.
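As a rough sketch of what validating the actual result could look like, the check below inspects what was executed rather than what a separately computed heuristic predicted. Everything here is hypothetical: `ExecutedSegment` and `ranAsSingleKernelWith` are stand-ins invented for illustration and do not correspond to any FusionExecutorCache interface, and the scheduler names are placeholder strings.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for what an executor reports after a run.
// This is not an nvFuser type.
struct ExecutedSegment {
  std::string scheduler;           // placeholder scheduler label
  bool uses_shared_memory_buffer;  // whether persistent data lived in smem
};

// True only if the run was not segmented and the single kernel was
// produced by the expected scheduler.
inline bool ranAsSingleKernelWith(
    const std::vector<ExecutedSegment>& segments,
    const std::string& expected_scheduler) {
  return segments.size() == 1 &&
      segments.front().scheduler == expected_scheduler;
}
```

The point of the design is that the assertion fails when the fusion is segmented, which a check based only on precomputed heuristic parameters would not notice.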

liqiangxl mentioned this issue Jan 29, 2025

fix error when calculating smem overhead #3790

Merged
