CUDA Backend Acceleration #4

Yiozolm · 2025-09-01T03:49:35Z

New Features
1. CUDA backend
2. Add a cache for FB/F2B
3. Replace repeat with broadcasting
4. Replace the splits→permute→view→mean sequence with a single block-mean CUDA kernel
5. Add a CUDA kernel for the STy s-fold upsampler
6. Replace C2C FFT with R2C/C2R (real FFT)
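
Item 6 relies on a property of the Fourier transform: for real-valued input the spectrum is conjugate-symmetric, so only n // 2 + 1 bins along the transformed axis need to be computed and stored. The PR does not show its FFT calls, so the following is only a minimal 1-D sketch in plain Python, with a naive DFT standing in for the real transform. All function names are hypothetical.

```python
import cmath


def dft_bin(signal, k):
    """Naive DFT: value of frequency bin k for a 1-D sequence."""
    n = len(signal)
    return sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
               for t, x in enumerate(signal))


def full_dft(signal):
    """C2C-style transform: all n complex bins."""
    return [dft_bin(signal, k) for k in range(len(signal))]


def real_dft(signal):
    """R2C-style transform: only the first n // 2 + 1 bins."""
    return [dft_bin(signal, k) for k in range(len(signal) // 2 + 1)]


def expand_half_spectrum(half, n):
    """Rebuild all n bins from the half spectrum of a real signal,
    using the symmetry X[n - k] == conj(X[k])."""
    full = list(half)
    for k in range(len(half), n):
        full.append(half[n - k].conjugate())
    return full


# Illustrative real-valued input (not data from this PR).
signal = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.5, 4.0]
half = real_dft(signal)
rebuilt = expand_half_spectrum(half, len(signal))
max_diff = max(abs(a - b) for a, b in zip(full_dft(signal), rebuilt))
print(len(half), "of", len(signal), "bins stored; max difference:", max_diff)
```

Storing roughly half the bins is where the memory and compute saving of R2C/C2R over C2C comes from.
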
Tests
1. Speed test for fwd/bwd
2. Precision comparison between PyTorch and CUDA
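
The comparison script itself is not shown in this thread. As a rough sketch of what a precision check of this kind might look like, the plain-Python helper below (a hypothetical compare_outputs, working on flat lists of floats rather than tensors) reports the maximum absolute and relative error and applies the criterion |candidate - reference| <= atol + rtol * |reference|, the same form that torch.allclose uses. The tolerance values are placeholder assumptions.

```python
def compare_outputs(reference, candidate, rtol=1e-5, atol=1e-8):
    """Compare two equally long sequences of floats.

    Returns (max_abs_err, max_rel_err, all_close). The relative error
    skips entries whose reference value is exactly zero.
    """
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")

    max_abs_err = 0.0
    max_rel_err = 0.0
    all_close = True
    for ref, cand in zip(reference, candidate):
        abs_err = abs(cand - ref)
        max_abs_err = max(max_abs_err, abs_err)
        if ref != 0.0:
            max_rel_err = max(max_rel_err, abs_err / abs(ref))
        # Element passes if it is within the combined tolerance.
        if abs_err > atol + rtol * abs(ref):
            all_close = False
    return max_abs_err, max_rel_err, all_close


# Illustrative values only: a reference output and a slightly perturbed one.
reference = [1.0, 2.0, 4.0]
candidate = [1.0, 2.0 + 1e-7, 4.0]
print(compare_outputs(reference, candidate))
```

Reporting the maximum errors alongside the pass/fail flag makes it easier to see how close a kernel variant is to the tolerance, not just whether it crossed it.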

Yiozolm · 2025-09-01T12:18:24Z

Further test (dev branch):

v7 may cause numerical issues during training; we recommend using it only during inference.
On devices with limited compute, such as the 2080 Ti, v1 appears to be the best choice.

This reverts commit eac211b.

csgeekhuang · 2025-09-11T13:04:36Z

Thank you for your hard work on this! We’ve conducted efficiency tests, and based on the results, we’re planning to merge your code once all the remaining ToDo items are completed—this should help significantly boost the overall speed.

Yiozolm · 2025-09-12T01:06:26Z

Appreciate your positive feedback. I believe this project is a foundational and extremely meaningful piece of work, and I feel very honored to have the opportunity to contribute.

I'm currently quite busy with my Fall 2026 PhD applications, so my free time is limited, but I will work on fixing the numerical issues as soon as possible. So far, the numerical errors appear when scale≥2, while all versions seem to be correct when scale=1. I will prioritize this to make sure the code is accurate.

csgeekhuang · 2025-09-12T11:54:40Z

I hope you find a good PhD position, and I look forward to your CUDA optimizations!

Yiozolm · 2025-09-14T08:53:17Z

I recommend merging the current version for now.
Any further optimization appears to introduce floating-point reordering errors, which I also find frustrating.
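
Floating-point addition is not associative, so a kernel that sums the same values in a different order (for example a tree-shaped parallel reduction instead of a sequential loop) can return a different result even when both are implemented correctly. A minimal plain-Python illustration follows. The values are deliberately extreme and chosen for the example, not taken from this PR, and the function names are hypothetical.

```python
import math


def sequential_sum(values):
    """Left-to-right accumulation, like a simple serial loop."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Tree-shaped accumulation, similar in spirit to a parallel reduction."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


# The classic three-term case: regrouping alone changes the last bit.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # prints False

# Large cancellations make the effect visible at full scale.
values = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(values))  # prints 1.0
print(pairwise_sum(values))    # prints 0.0
print(math.fsum(values))       # prints 2.0, the correctly rounded sum
```

Neither ordering is wrong in the IEEE 754 sense. They round at different intermediate steps, which is why an optimized kernel can drift from a reference implementation without containing a logic bug.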

Yiozolm · 2025-09-14T08:58:16Z

The current branch probably contains many unnecessary commits.
A squash merge might be the best option.

Yiozolm and others added 19 commits August 15, 2025 16:33

add naive cuda implement

2760962

add naive backward

bfa771c

Update CUDA kenrel

6cdc522

Update installation method

177131d

add cache

34311b2

fix bug in backend choice

e1c4d7d

add Inner cuda kernel

19f401b

Modify STy

12a86e6

add Speedtest results

025c867

Update project structure

54b9b99

Add TODO list

aae6bb2

Update test results saving

0ff2849

Delete test/results.csv

100223d

Update kernel & Add fwd test

207e214

Add Optimization Details

ab26c28

Add profiler scripts

2e46664

Format main branch

45f5d59

Update log

ddf147d

maintain the most stable original CUDA backend

9a71245

Yiozolm added 2 commits September 4, 2025 09:12

Fix v7 precision error

eac211b

Revert "Fix v7 precision error"

8de81bf

This reverts commit eac211b.

Yiozolm closed this Sep 4, 2025

Stable C++ only version

c7eb880

Yiozolm reopened this Sep 14, 2025

Yiozolm closed this Sep 14, 2025

Yiozolm reopened this Sep 14, 2025
