`FusionLayerNormSharedMemoryBuffer_CUDA` seems to be checking whether the inner persistent scheduler is used. However, calling `computeHeuristics` alone doesn't guarantee that `FusionExecutorCache` actually uses that scheduler; it appears to just compute the heuristic parameters on the assumption that the fusion passes the `canSchedule` checks. Indeed, running the test indicates that the fusion is segmented for some sizes.
That part is used to check whether registers or shared memory are used to store the persistent buffers.
You are right. Some cases may be segmented when the hidden size is very large and exceeds the available shared memory.
Please update the test so that its intended behavior is indeed verified. The result through FusionExecutorCache is what actually matters, and that's not validated.
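To illustrate the gap being described, here is a minimal toy sketch. It is not nvFuser code: every name (`toyComputeHeuristics`, `toyCanSchedule`, `toyRunThroughCache`) and both byte budgets are hypothetical stand-ins chosen for illustration. It only shows why inspecting computed heuristic parameters can pass while the actual execution path ends up segmented.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy model; none of these names are nvFuser APIs.
constexpr std::size_t kToyRegisterBudget = 64 * 1024;   // assumed budget, bytes
constexpr std::size_t kToySharedMemBudget = 96 * 1024;  // assumed budget, bytes

struct ToyParams {
  bool buffer_in_shared_memory;
};

struct ToyResult {
  bool segmented;
  bool buffer_in_shared_memory;
};

// Always yields parameters; it never asks whether scheduling is legal.
ToyParams toyComputeHeuristics(std::size_t buffer_bytes) {
  return ToyParams{buffer_bytes > kToyRegisterBudget};
}

// Rejects persistent buffers that do not fit in shared memory.
bool toyCanSchedule(std::size_t buffer_bytes) {
  return buffer_bytes <= kToySharedMemBudget;
}

// The outcome that matters: segment whenever the scheduler is rejected.
ToyResult toyRunThroughCache(std::size_t buffer_bytes) {
  if (!toyCanSchedule(buffer_bytes)) {
    return ToyResult{true, false};
  }
  return ToyResult{false,
                   toyComputeHeuristics(buffer_bytes).buffer_in_shared_memory};
}

// Example checks: a heuristics-only check passes for the large case,
// while the check on the actual result exposes the segmentation.
void toySelfCheck() {
  const std::size_t large = 128 * 1024;
  assert(toyComputeHeuristics(large).buffer_in_shared_memory);
  assert(toyRunThroughCache(large).segmented);
}
```

In this toy model, a test that only looks at `toyComputeHeuristics` would report shared-memory buffers for the 128 KiB case even though the run is segmented, which is why the check should be made on the result of the actual execution path.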
https://github.com/NVIDIA/Fuser/blob/main/tests/cpp/test_gpu3.cpp#L7371
@liqiangxl What is this part meant to check?