Add f32 N=16 radix-4 codelet for +18% performance improvement #59

Merged
EmNudge merged 1 commit into main from add-f32-n16-radix4-codelet on Jan 28, 2026

@EmNudge (Owner) commented Jan 28, 2026

Adds a radix-4 N=16 codelet for the f32 complex FFT module, improving performance by 18% and narrowing the gap with f64 from 20% to 5%.

  • Implements radix-4 algorithm (2 stages vs 4 for radix-2)
  • Uses hardcoded twiddle factors for inline computation
  • Adds shared $fft_dispatch for FFT/IFFT consistency
  • Documents Experiment 44 in optimization log
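
For readers unfamiliar with the stage count above, here is a minimal Python sketch of a two-stage radix-4 decomposition for N=16 with a hardcoded twiddle table. It is not the WAT code from this PR: the names `dft4`, `TW`, `fft16_radix4`, and `dft_naive` are hypothetical, the arithmetic is double precision rather than f32, and the actual codelet's memory layout and instruction ordering may differ. It only illustrates why 16 points need 2 radix-4 stages (16 = 4 × 4) instead of 4 radix-2 stages.

```python
import cmath

# Hardcoded constants: cos(pi/8), sin(pi/8), cos(pi/4).
C = 0.9238795325112867
S = 0.3826834323650898
R = 0.7071067811865476

# TW[r][q] = exp(-2*pi*i * r*q / 16), written out as literals so no
# trigonometry runs at transform time (illustrative table layout).
TW = [
    [1 + 0j, 1 + 0j,        1 + 0j,         1 + 0j],
    [1 + 0j, complex(C, -S), complex(R, -R),  complex(S, -C)],
    [1 + 0j, complex(R, -R), complex(0, -1),  complex(-R, -R)],
    [1 + 0j, complex(S, -C), complex(-R, -R), complex(-C, S)],
]


def dft4(a, b, c, d):
    """4-point DFT butterfly: only additions and a multiply by -i."""
    t0 = a + c
    t1 = a - c
    t2 = b + d
    t3 = (b - d) * -1j
    return (t0 + t2, t1 + t3, t0 - t2, t1 - t3)


def fft16_radix4(x):
    """Forward 16-point FFT in two radix-4 stages (decimation in time)."""
    assert len(x) == 16
    # Stage 1: four 4-point DFTs over the subsequences x[r], x[r+4], x[r+8], x[r+12].
    f = [dft4(*x[r::4]) for r in range(4)]  # f[r][q]
    # Stage 2: apply twiddles, then a 4-point DFT across r for each q.
    out = [0j] * 16
    for q in range(4):
        y = dft4(*(TW[r][q] * f[r][q] for r in range(4)))
        for p in range(4):
            out[q + 4 * p] = y[p]
    return out


def dft_naive(x):
    """O(N^2) reference DFT used only to check the sketch."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]
```

Comparing `fft16_radix4(x)` against `dft_naive(x)` on any 16-element complex input is a quick way to confirm the table and butterfly are consistent. In this sketch, twelve of the sixteen twiddle entries are trivial (1 or -i) or symmetric (equal real and imaginary magnitude), which is part of why inlining constants can pay off for a fixed small N.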

@EmNudge force-pushed the add-f32-n16-radix4-codelet branch from 127da9c to 912c412 on January 28, 2026 21:38
- Implement radix-4 algorithm (2 stages vs 4 for radix-2)
- Use hardcoded twiddle factors for inline computation
- Add shared $fft_dispatch for FFT/IFFT consistency
- Update README with N=16 benchmark results
- Document Experiment 44 in optimization log
@EmNudge force-pushed the add-f32-n16-radix4-codelet branch from 912c412 to ee1c8d7 on January 28, 2026 21:39
@github-actions

Playground Preview

Deployed to Cloudflare Pages:
https://ce7d7fbe.wat-fft.pages.dev (ee1c8d7)

@EmNudge merged commit 32f543d into main on Jan 28, 2026
5 checks passed
@EmNudge deleted the add-f32-n16-radix4-codelet branch on January 28, 2026 21:44