Better small size kernels #75

@Shnatsel

RustFFT can do a size-32 FFT entirely in registers on AVX2 via a dedicated compute kernel. Its size-64 FFT does not fit entirely in registers, but is still highly optimized.

Meanwhile, we use the generic structure with many successive radix-2 passes over the input, so we have to do lots of loads and stores between the actual math.

I don't want to have a dedicated handwritten kernel for each small size. What I'd like to do is replace the current dedicated code for small-size passes with one or two kernels (size 32, size 64, or both) that larger FFTs can be reduced to, with generic size-independent passes on top. This means we can't simply crib the RustFFT kernels: they do mixed-radix radix-4-2 with integrated bit reversal, whereas we want radix-2 (likely with the optimizations from #74) and general bit reversal, so that the kernel can be reused as a component of processing larger sizes.
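
To make the proposed structure concrete, here is a minimal scalar sketch. It is not this project's code and not RustFFT's. It assumes a decimation-in-frequency radix-2 layout over split real/imaginary buffers, and all names in it (`KERNEL`, `dif_pass`, `small_kernel`, `bit_reverse`, `fft_sketch`) are hypothetical. Twiddles are recomputed on the fly for brevity, and there is no SIMD. The point is only the shape: generic passes run until the sub-problems reach the kernel size, then one fixed-size kernel finishes each chunk on local arrays, and a general bit reversal runs once at the end.

```rust
use std::f64::consts::PI;

/// Hypothetical kernel size; 64 would work the same way.
const KERNEL: usize = 32;

/// One generic radix-2 DIF pass: butterflies at distance `half`
/// inside every block of `2 * half` elements.
fn dif_pass(re: &mut [f64], im: &mut [f64], half: usize) {
    for start in (0..re.len()).step_by(2 * half) {
        for k in 0..half {
            let (i, j) = (start + k, start + k + half);
            // Twiddle exp(-2*pi*i*k / (2*half)), recomputed here for brevity.
            let angle = -PI * k as f64 / half as f64;
            let (w_re, w_im) = (angle.cos(), angle.sin());
            let (a_re, a_im, b_re, b_im) = (re[i], im[i], re[j], im[j]);
            re[i] = a_re + b_re;
            im[i] = a_im + b_im;
            let (d_re, d_im) = (a_re - b_re, a_im - b_im);
            re[j] = d_re * w_re - d_im * w_im;
            im[j] = d_re * w_im + d_im * w_re;
        }
    }
}

/// Fixed-size kernel: load one chunk once, run all remaining passes on
/// local arrays, store once. Output stays in bit-reversed order so the
/// kernel can be reused as the tail of a larger FFT.
fn small_kernel(re: &mut [f64], im: &mut [f64]) {
    let mut r = [0.0f64; KERNEL];
    let mut i = [0.0f64; KERNEL];
    r.copy_from_slice(re);
    i.copy_from_slice(im);
    let mut half = KERNEL / 2;
    while half >= 1 {
        dif_pass(&mut r, &mut i, half);
        half /= 2;
    }
    re.copy_from_slice(&r);
    im.copy_from_slice(&i);
}

/// General bit-reversal permutation for any power-of-two length >= 2.
fn bit_reverse(re: &mut [f64], im: &mut [f64]) {
    let n = re.len();
    let shift = usize::BITS - n.trailing_zeros();
    for i in 0..n {
        let j = i.reverse_bits() >> shift;
        if i < j {
            re.swap(i, j);
            im.swap(i, j);
        }
    }
}

/// Forward FFT for power-of-two lengths >= KERNEL.
fn fft_sketch(re: &mut [f64], im: &mut [f64]) {
    let n = re.len();
    assert!(n == im.len() && n.is_power_of_two() && n >= KERNEL);
    // Generic size-independent passes until sub-problems have size KERNEL.
    let mut half = n / 2;
    while half >= KERNEL {
        dif_pass(re, im, half);
        half /= 2;
    }
    // The fixed-size kernel finishes each chunk.
    for (r, i) in re.chunks_exact_mut(KERNEL).zip(im.chunks_exact_mut(KERNEL)) {
        small_kernel(r, i);
    }
    bit_reverse(re, im);
}
```

In this sketch the kernel deliberately does no bit reversal of its own, which is what lets the same function serve every larger power-of-two size. Whether the compiler actually keeps the local arrays in registers depends on element type, target features, and how the passes are unrolled, so a real kernel would need to be written and measured with that in mind.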
