take fast path if c2c transform does not need padding or trimming by chillenb · Pull Request #283 · IntelPython/mkl_fft

chillenb · 2026-03-03T02:36:53Z

Thanks for creating and maintaining this package!

If you try to get MKL C-API performance out of this package, you will probably discover that fftn is very sensitive to the input arguments. Here is an example:

In [1]: import numpy as np
   ...: import mkl_fft
   ...: N = 200
   ...: A = np.random.random((1,N,N,N)).astype(np.complex128)
 
In [2]: %timeit mkl_fft.fftn(A, s=A.shape[1:], axes=(1,2,3))
164 ms ± 187 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [3]: %timeit mkl_fft.fftn(A, axes=(1,2,3))
6.56 ms ± 304 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [4]: %timeit mkl_fft.interfaces.numpy_fft.fftn(A, axes=(1,2,3))
165 ms ± 31.3 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

This is because mkl_fft.fftn always takes a slow path (_iter_fftnd) whenever s is not None. Furthermore, the NumPy and SciPy interfaces do not pass s=None through unchanged, so they are also forced onto this path.
This pull request lets fftn detect when the s argument is equivalent to s=None (that is, when s matches the input's shape along axes, so a c2c transform needs no padding or trimming), in which case it uses the faster function _iter_complementary.
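One way to express the "equivalent to s=None" condition is sketched below. This is not the PR's actual code: the helper name `s_is_noop` is hypothetical, and it assumes NumPy's convention that, when `axes` is omitted, `s` applies to the last `len(s)` axes.

```python
from typing import Optional, Sequence, Tuple


def s_is_noop(
    shape: Tuple[int, ...],
    s: Optional[Sequence[int]],
    axes: Optional[Sequence[int]] = None,
) -> bool:
    """Return True if `s` requests exactly the input's sizes along the transformed axes.

    Hypothetical helper for illustration; not mkl_fft's internal API.
    """
    if s is None:
        return True  # nothing to pad or trim
    ndim = len(shape)
    if axes is None:
        # Assumed NumPy convention: with s given and axes omitted, use the last len(s) axes.
        axes = range(ndim - len(s), ndim)
    axes = [a % ndim for a in axes]  # normalize negative axis indices
    if len(axes) != len(s):
        raise ValueError("s and axes must have the same length")
    # Any mismatch means padding (n > size) or trimming (n < size) is required.
    return all(int(n) == shape[a] for n, a in zip(s, axes))
```

For the benchmark above, `s_is_noop(A.shape, A.shape[1:], (1, 2, 3))` is True, so a dispatcher built on such a check could route `s=A.shape[1:]` to the same path as `s=None`.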

After these code changes, performance aligns better with expectations:

In [1]: import numpy as np
   ...: import mkl_fft
   ...: N = 200
   ...: A = np.random.random((1,N,N,N)).astype(np.complex128)

In [2]: %timeit mkl_fft.interfaces.numpy_fft.fftn(A, axes=(1,2,3))
8.28 ms ± 551 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [3]: %timeit mkl_fft.fftn(A, s=A.shape[1:], axes=(1,2,3))
The slowest run took 4.49 times longer than the fastest. This could mean that an intermediate result is being cached.
9.92 ms ± 7.49 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [4]: %timeit mkl_fft.fftn(A, axes=(1,2,3))
6.49 ms ± 60.7 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Test system: dual-socket Xeon Platinum 8268 server.
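To reproduce the comparison outside IPython, a minimal timing script might look like the following. It is a sketch: `bench` is a hypothetical helper, it defaults to `numpy.fft.fftn` so it runs without MKL, and its default `n=64` is smaller than the `N = 200` used above.

```python
import timeit

import numpy as np


def bench(fftn, n=64, repeat=5, number=10):
    """Best-of-`repeat` seconds per call of `fftn` with s=None and with s set to the full shape."""
    rng = np.random.default_rng(0)
    # Same layout as the IPython session: a leading length-1 axis, transform over axes 1..3.
    a = rng.random((1, n, n, n)).astype(np.complex128)
    axes = (1, 2, 3)
    s_full = a.shape[1:]

    def best(stmt):
        # Taking the minimum over repeats filters out interference from other processes.
        return min(timeit.repeat(stmt, repeat=repeat, number=number)) / number

    t_none = best(lambda: fftn(a, axes=axes))
    t_full = best(lambda: fftn(a, s=s_full, axes=axes))
    return t_none, t_full


if __name__ == "__main__":
    t_none, t_full = bench(np.fft.fftn)
    print(f"s=None: {t_none * 1e3:.2f} ms   s=full shape: {t_full * 1e3:.2f} ms")
```

To time mkl_fft itself, one would pass `mkl_fft.fftn` (or `mkl_fft.interfaces.numpy_fft.fftn`) with `n=200`; absolute numbers depend on the machine.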

intel-python-devops · 2026-03-03T02:40:07Z

Can one of the admins verify this patch?

chillenb · 2026-03-03T03:21:29Z

Oops, I didn't realize that the CI would have to be approved again after fixing the whitespace. Sorry!

The tests previously approved did run and pass, though.

ndgrigorian · 2026-03-03T04:52:39Z


It's no problem, thanks for this contribution to the project. :)

I'm not sure whether any of our tests currently cover this case and compare against, e.g., NumPy, so adding a test would be good too.
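A regression test along these lines could compare `s=None`, an explicit full-shape `s`, and NumPy's result. The sketch below is hypothetical (it is not the `test_s_none_vs_s_full` test added in this PR); it defaults to `numpy.fft.fftn` so it runs anywhere, and the `fftn` argument would be swapped for `mkl_fft.fftn` or `mkl_fft.interfaces.numpy_fft.fftn`.

```python
import numpy as np


def check_s_none_matches_full_s(fftn=np.fft.fftn, shape=(1, 8, 6, 5), axes=(1, 2, 3),
                                rtol=1e-10, atol=1e-10):
    """Raise AssertionError if s=None and an equivalent explicit s disagree with NumPy."""
    rng = np.random.default_rng(1234)
    x = rng.random(shape) + 1j * rng.random(shape)  # complex128 input for a c2c transform
    s_full = tuple(x.shape[ax] for ax in axes)       # explicit s that needs no padding/trimming

    expected = np.fft.fftn(x, axes=axes)             # NumPy reference result
    got_none = fftn(x, axes=axes)
    got_full = fftn(x, s=s_full, axes=axes)

    np.testing.assert_allclose(got_none, expected, rtol=rtol, atol=atol)
    np.testing.assert_allclose(got_full, expected, rtol=rtol, atol=atol)
    np.testing.assert_allclose(got_full, got_none, rtol=rtol, atol=atol)
```

Note that this only checks correctness; confirming that the fast path (_iter_complementary) is actually taken would need a separate check.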

antonwolfy

Could you please also populate the changelog?

mkl_fft/tests/test_fftnd.py

mkl_fft/_fft_utils.py

chillenb · 2026-03-06T17:03:12Z

I just rebased onto the master branch to get rid of the merge conflict, so the conversation looks a bit garbled; apologies.

chillenb requested review from antonwolfy, jharlow-intel, ndgrigorian and xaleryb as code owners March 3, 2026 02:36

antonwolfy added this to the 2.2.0 release milestone Mar 6, 2026

antonwolfy reviewed Mar 6, 2026


mkl_fft/tests/test_fftnd.py (resolved)

mkl_fft/_fft_utils.py (outdated, resolved)

mkl_fft/_fft_utils.py (outdated, resolved)

chillenb added a commit to chillenb/mkl_fft that referenced this pull request Mar 6, 2026

add IntelPythongh-283 to changelog

d6da5fb

chillenb added 5 commits March 6, 2026 11:48

take fast path if c2c transform does not need padding or trimming

68ae214

Satisfy linter

1c6c6f8

add test for s=None vs equivalent s

c56754e

make sure test_s_none_vs_s_full actually uses iter_complementary

2b05835

rename s_equiv_to_none and consolidate shape-checking logic

98b1964

chillenb added a commit to chillenb/mkl_fft that referenced this pull request Mar 6, 2026

add IntelPythongh-283 to changelog

3e72b97

chillenb force-pushed the faster branch from 038764b to 3e72b97 Compare March 6, 2026 16:51

add this to changelog

4a66ad5

chillenb force-pushed the faster branch from 3e72b97 to 4a66ad5 Compare March 6, 2026 16:53

chillenb requested a review from antonwolfy March 6, 2026 16:54


chillenb wants to merge 6 commits into IntelPython:master from chillenb:faster
