FusionLayerNormSharedMemoryBuffer_CUDA validation #3777

Open
naoyam opened this issue Jan 28, 2025 · 2 comments

Comments

@naoyam
Collaborator

naoyam commented Jan 28, 2025

https://github.com/NVIDIA/Fuser/blob/main/tests/cpp/test_gpu3.cpp#L7371

FusionLayerNormSharedMemoryBuffer_CUDA seems to be trying to check whether the inner persistent scheduler is used. However, calling computeHeuristics alone doesn't seem to guarantee that FusionExecutorCache actually uses that scheduler; it seems to just compute the heuristic parameters on the assumption that the fusion passes the canSchedule functions. Indeed, running the test indicates that the fusion is segmented for some sizes.

@liqiangxl What is this part meant to check?

@liqiangxl
Collaborator

That part checks whether registers or shared memory are used to store the persistent buffers.
You are right: some cases may be segmented when the hidden size is very large and exceeds the available shared memory.

@naoyam
Collaborator Author

naoyam commented Jan 29, 2025

Please update the test so that its intended behavior is actually verified. What matters is the result produced through FusionExecutorCache, and that is not currently validated.
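
To illustrate the gap discussed in this thread, here is a minimal toy model in plain C++. It is only a sketch: every name in it (`ToyDevice`, `toyCanSchedule`, `toyComputeHeuristics`, `toyRun`) is hypothetical and not an nvFuser API, and the byte budgets are made-up assumptions. It shows how a heuristics function that assumes the schedulability check has passed can report a shared-memory buffer even for a size that the actual run path would segment.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy model. None of these names are nvFuser APIs.
enum class BufferLocation { Registers, SharedMemory };

struct ToyDevice {
  std::size_t register_bytes;       // assumed register budget for persistent buffers
  std::size_t shared_memory_bytes;  // assumed available shared memory
};

struct ToyHeuristics {
  BufferLocation buffer_location;
};

// Plays the role of a canSchedule check: the buffer must fit somewhere on chip.
bool toyCanSchedule(std::size_t buffer_bytes, const ToyDevice& dev) {
  return buffer_bytes <= dev.shared_memory_bytes;
}

// Plays the role of a heuristics computation. It assumes the check above
// has already passed and never re-validates the size.
ToyHeuristics toyComputeHeuristics(std::size_t buffer_bytes, const ToyDevice& dev) {
  return {buffer_bytes <= dev.register_bytes ? BufferLocation::Registers
                                             : BufferLocation::SharedMemory};
}

struct ToyRunResult {
  bool segmented;                  // true if the single-kernel path was rejected
  BufferLocation buffer_location;  // meaningful only when segmented is false
};

// Plays the role of the end-to-end run path: check first, then schedule.
ToyRunResult toyRun(std::size_t buffer_bytes, const ToyDevice& dev) {
  if (!toyCanSchedule(buffer_bytes, dev)) {
    return {true, BufferLocation::Registers};  // placeholder location, unused
  }
  return {false, toyComputeHeuristics(buffer_bytes, dev).buffer_location};
}
```

In this model, a test that only inspects `toyComputeHeuristics` would pass for an oversized buffer, while a test that inspects `toyRun` would notice the segmentation. A real fix would need to query the actual runtime produced through FusionExecutorCache, which this sketch does not attempt to model.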
