rocFFT 1.0.12 for ROCm 4.3.0
Changed
- Re-split device code into single-precision, double-precision, and miscellaneous kernels.
Fixed
- Fixed potential crashes in double-precision planar->planar transpose.
Added
- Added new kernel generator for select lengths. New kernels have
improved performance.
- Added public rocfft_execution_info_set_load_callback and
rocfft_execution_info_set_store_callback API functions to allow
executing extra logic when loading/storing data from/to global
memory during a transform.
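
The load/store callback idea can be pictured with a small host-only C++ sketch. It does not call rocFFT and runs on the CPU. The callback shapes (`load_cb_t`, `store_cb_t`), the helper `run_pass`, and both example callbacks are hypothetical names chosen for illustration; the real callbacks are device functions whose exact signature should be taken from the rocFFT documentation.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical callback shapes for this sketch only (NOT the rocFFT API):
// a load callback returns the value to use for element `offset`,
// a store callback decides how `value` is written to element `offset`.
using load_cb_t  = cplx (*)(const cplx* data, std::size_t offset, void* user);
using store_cb_t = void (*)(cplx* data, std::size_t offset, cplx value, void* user);

// Example load callback: scale each input element as it is read.
cplx scale_on_load(const cplx* data, std::size_t offset, void* user)
{
    const double scale = *static_cast<const double*>(user);
    return data[offset] * scale;
}

// Example store callback: conjugate each element as it is written.
void conj_on_store(cplx* data, std::size_t offset, cplx value, void* /*user*/)
{
    data[offset] = std::conj(value);
}

// Stand-in for a transform pass: every read of the input and every write
// of the output goes through the user-supplied callbacks. A real transform
// would do its FFT arithmetic between the load and the store.
std::vector<cplx> run_pass(const std::vector<cplx>& in,
                           load_cb_t load, void* load_user,
                           store_cb_t store, void* store_user)
{
    std::vector<cplx> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        const cplx v = load(in.data(), i, load_user);
        store(out.data(), i, v, store_user);
    }
    return out;
}
```

Calling `run_pass(in, scale_on_load, &scale, conj_on_store, nullptr)` applies the scaling and the conjugation without a separate pass over the data, which is the point of attaching extra logic to the loads and stores of a transform. In rocFFT itself, callbacks are registered through the two API functions named above rather than passed to a helper like this.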
Removed
- Removed R2C pair schemes and kernels.
Optimizations
- Optimized 2D/3D R2C 100 and 1D Z2Z 2500.
- Reduced the number of kernels for 2D/3D sizes where the higher dimension is 64, 128, or 256.
Fixed
- Fixed potential crashes in 3D transforms with unusual strides, for
SBCC-optimized sizes.