CUDA Backend Acceleration #4
base: main
Conversation
Further test (dev branch):
This reverts commit eac211b.
Thank you for your hard work on this! We ran efficiency tests and, based on the results, we plan to merge your code once all the remaining ToDo items are completed. This should significantly boost the overall speed.
I appreciate your positive feedback. I believe this project is a foundational and extremely meaningful piece of work, and I feel very honored to have the opportunity to contribute. I'm currently quite busy with my Fall 2026 PhD applications, so my free time is limited. However, I will get to work on fixing the numerical issues as soon as possible. So far I've found that the numerical errors appear when
I hope you find a good PhD position, and I look forward to your wonderful CUDA optimization!
I recommend temporarily merging the current version.
The current branch may contain too many unnecessary commits.
splits→permute→view→mean to block mean CUDA kernel
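For readers unfamiliar with the change named in the commit title above, here is a minimal CPU sketch of the idea, not the PR's actual code. It assumes "block mean" means averaging non-overlapping tiles of a 2-D array whose sides are exact multiples of the tile size. The function names `block_mean_staged` and `block_mean_fused` are hypothetical. The staged version mimics a splits→permute→view→mean pipeline by materializing every tile first; the fused version lets each output cell accumulate its own tile, roughly what one thread of a dedicated kernel might do.

```python
# Hypothetical CPU reference; names and shapes are assumptions, not the PR's code.
# Both functions assume len(x) % bh == 0 and len(x[0]) % bw == 0.

def block_mean_staged(x, bh, bw):
    """Staged path: gather every tile into its own list, then average each one."""
    h, w = len(x), len(x[0])
    tiles = [
        [x[r][c] for r in range(i, i + bh) for c in range(j, j + bw)]
        for i in range(0, h, bh)
        for j in range(0, w, bw)
    ]
    means = [sum(t) / len(t) for t in tiles]
    cols = w // bw
    # Regroup the flat list of tile means into rows of the output grid.
    return [means[k:k + cols] for k in range(0, len(means), cols)]


def block_mean_fused(x, bh, bw):
    """Fused path: each output cell sums its tile in place, with no intermediate tiles."""
    h, w = len(x), len(x[0])
    out = [[0.0] * (w // bw) for _ in range(h // bh)]
    for oi in range(h // bh):
        for oj in range(w // bw):
            acc = 0.0
            for r in range(bh):
                for c in range(bw):
                    acc += x[oi * bh + r][oj * bw + c]
            out[oi][oj] = acc / (bh * bw)
    return out


if __name__ == "__main__":
    grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(block_mean_staged(grid, 2, 2))
    print(block_mean_fused(grid, 2, 2))
```

When checking a real GPU kernel against a reference like this, comparing with a small tolerance is usually safer than exact equality, because floating-point sums can differ slightly when the accumulation order changes.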