Repack to half-length fft_small convolution for tiny moduli in _nmod_poly_mul and mullow #2478

fredrik-johansson · 2025-11-06T15:56:05Z

When the unreduced coefficients of an _nmod_poly product are small enough to fit in 50 bits, fft_small multiplies using a single FFT prime.

We can do even better when the moduli are really small by packing a linear polynomial into each coefficient. If the coefficients of the product (over $\mathbb{Z}$) are smaller than $M$, we can repack $a_0 + a_1 x + \ldots$ as $(a_0 + a_1 M) + (a_2 + a_3 M) x + \ldots$. The coefficients of the product of such polynomials will be quadratic polynomials in $M$, from which we can read off the coefficients of the original product.
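To spell out the reading-off step, here is a short derivation. The names $A$, $B$, $u_k$, $v_k$, $w_k$ are notation introduced for this note and do not come from the PR. Write $A = \sum_i (a_{2i} + a_{2i+1} M) x^i$ and $B = \sum_j (b_{2j} + b_{2j+1} M) x^j$ for the repacked inputs, and $c = ab$ for the original product over $\mathbb{Z}$. Then

$$[x^k]\, AB = u_k + v_k M + w_k M^2,$$

$$u_k = \sum_{i+j=k} a_{2i} b_{2j}, \qquad v_k = \sum_{i+j=k} \left(a_{2i} b_{2j+1} + a_{2i+1} b_{2j}\right), \qquad w_k = \sum_{i+j=k} a_{2i+1} b_{2j+1},$$

$$c_{2k} = u_k + w_{k-1}, \qquad c_{2k+1} = v_k \qquad (w_{-1} = 0).$$

Each of $u_k$, $v_k$, $w_k$ is a partial sum of a single coefficient of $c$ (namely $c_{2k}$, $c_{2k+1}$, $c_{2k+2}$), so all three are smaller than $M$ and can be read off as the base-$M$ digits of $[x^k]\, AB$.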

It is clear that we can take $M = 2^{16}$ since $M^3$ is smaller than the 50-bit FFT primes used by fft_small. With a more careful analysis we can show that it is usually possible to work with $M = 2^{17}$.
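As an illustration of the repacking with $M = 2^{16}$, here is a minimal sketch. It is not the code in this PR: the names `pack_pairs` and `mul_packed` are hypothetical, a schoolbook loop stands in for the fft_small convolution, and it assumes inputs already reduced mod $m$ with $\min(\text{alen}, \text{blen}) \cdot (m-1)^2 < M$, so that every packed product coefficient stays below $M^3 = 2^{48}$.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PACK_BITS 16
#define PACK_M ((uint64_t) 1 << PACK_BITS)

/* out[i] = a[2i] + a[2i+1] * M; out receives (len + 1) / 2 entries. */
static void pack_pairs(uint64_t * out, const uint32_t * a, size_t len)
{
    for (size_t i = 0; 2 * i < len; i++)
    {
        uint64_t lo = a[2 * i];
        uint64_t hi = (2 * i + 1 < len) ? a[2 * i + 1] : 0;
        out[i] = lo + hi * PACK_M;
    }
}

/* Hypothetical helper: c = a * b mod m, writing alen + blen - 1 coefficients.
   Assumes reduced inputs and min(alen, blen) * (m - 1)^2 < M.
   Returns 0 on success, -1 if allocation fails. */
static int mul_packed(uint32_t * c, const uint32_t * a, size_t alen,
                      const uint32_t * b, size_t blen, uint32_t m)
{
    assert(alen > 0 && blen > 0 && m > 0);

    size_t pa = (alen + 1) / 2, pb = (blen + 1) / 2, pc = pa + pb - 1;
    size_t clen = alen + blen - 1;
    uint64_t * A = calloc(pa + pb + pc, sizeof(uint64_t));
    if (A == NULL)
        return -1;
    uint64_t * B = A + pa, * C = B + pb;

    pack_pairs(A, a, alen);
    pack_pairs(B, b, blen);

    /* Half-length convolution (schoolbook stand-in for the FFT). */
    for (size_t i = 0; i < pa; i++)
        for (size_t j = 0; j < pb; j++)
            C[i + j] += A[i] * B[j];

    /* C[k] = u + v*M + w*M^2; c[2k] = u + (w of C[k-1]), c[2k+1] = v. */
    uint64_t carry = 0;
    for (size_t k = 0; 2 * k < clen; k++)
    {
        uint64_t t = (k < pc) ? C[k] : 0;
        c[2 * k] = (uint32_t) (((t & (PACK_M - 1)) + carry) % m);
        if (2 * k + 1 < clen)
            c[2 * k + 1] = (uint32_t) (((t >> PACK_BITS) & (PACK_M - 1)) % m);
        carry = t >> (2 * PACK_BITS);
    }

    free(A);
    return 0;
}
```

The convolution runs over `pa + pb - 1` coefficients, roughly half of `alen + blen - 1`, which is where the up to 2x saving comes from once the schoolbook loop is replaced by an FFT.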

Where this trick is applicable, currently for moduli $1 \le m \le 23$, it gives up to a 2x speedup by halving the convolution length. Plots demonstrating the speedup for _nmod_poly_mul and _nmod_poly_mullow are attached below.

Input lengths where this trick is applicable are roughly up to 250000 for $m = 2$, 77000 for $m = 3$, 21000 for $m = 5$, 9800 for $m = 7$, 3600 for $m = 11$, 1400 for $m = 17$ and 700 for $m = 23$.

This is definitely a hack: an optimized 32-bit FFT/NTT (or maybe something Schönhage-Strassen-like with many coefficients bit-packed in each word) should perform even better, and would allow much larger moduli and longer products. But this hack is easy to implement with the tools we have in FLINT right now, so we may as well use it.

Example improvement on a nontrivial benchmark problem: constructing GF($5^{3125}$) previously took 5.25 seconds, takes 3.97 seconds with this PR (1.32x speedup).

BTW, a new (to FLINT) trick used in the unpacking code is the 32-bit precomped remainder algorithm by Lemire, Kaser & Kurz which could be useful elsewhere in the nmod modules. This is even faster than the code generated by GCC for remainder by a compile-time constant.

albinahlback · 2025-11-06T16:24:08Z

src/nmod_poly/mullow_fft_small.c

+static const short fft_mul_tab[] = {1326, 1326, 1095, 802, 674, 537, 330, 306, 290,
+274, 200, 192, 182, 173, 163, 99, 97, 93, 90, 82, 80, 438, 414, 324, 393,
+298, 298, 268, 187, 185, 176, 176, 168, 167, 158, 158, 97, 96, 93, 92, 89,
+89, 85, 85, 80, 81, 177, 172, 163, 162, 164, 176, 171, 167, 167, 164, 163,
+163, 160, 165, 95, 96, 90, 94, };
+
+static const short fft_sqr_tab[] = {1420, 1420, 1353, 964, 689, 569, 407, 353, 321,
+321, 292, 279, 200, 182, 182, 159, 159, 152, 145, 139, 723, 626, 626, 569,
+597, 448, 542, 292, 292, 200, 191, 191, 182, 182, 166, 166, 166, 159, 159,
+159, 152, 152, 145, 145, 93, 200, 191, 182, 182, 182, 182, 191, 191, 191,
+182, 182, 174, 182, 182, 182, 152, 152, 152, 145, };
+
+/* todo: separate squaring table */
+/* todo: check unbalanced cutoffs */
+static const short fft_mullow_tab[] = {1115, 1115, 597, 569, 407, 321, 306, 279, 191,
+182, 166, 159, 152, 145, 139, 89, 85, 78, 75, 75, 69, 174, 174, 166, 159,
+152, 152, 152, 97, 101, 106, 111, 101, 101, 101, 139, 145, 145, 139, 145,
+145, 139, 145, 145, 145, 182, 182, 182, 182, 182, 182, 191, 200, 220, 210,
+200, 210, 210, 210, 210, 191, 182, 182, 174, };


albinahlback: Are these tabs architecture specific?

fredrik-johansson: Yes, and all the other tuning parameters in this file too. Note that these particular tabs were around before; I just moved them to a new file.

albinahlback: Yes, I did notice that. Do we have a tuning program somewhere that we can use at a later point?

fredrik-johansson: src/gr_poly/tune/cutoffs.c can generate this kind of table but it's not automatic.

vneiger · 2025-11-06T16:32:39Z

Nice!

> BTW, a new (to FLINT) trick used in the unpacking code is the 32-bit precomped remainder algorithm by Lemire, Kaser & Kurz which could be useful elsewhere in the nmod modules. This is even faster than the code generated by GCC for remainder by a compile-time constant.

Is this different from using "Shoup precomputation" (as in #2061) but specialized to the 32-bit context?

fredrik-johansson · 2025-11-06T18:59:22Z

> Nice!
>
> > BTW, a new (to FLINT) trick used in the unpacking code is the 32-bit precomped remainder algorithm by Lemire, Kaser & Kurz which could be useful elsewhere in the nmod modules. This is even faster than the code generated by GCC for remainder by a compile-time constant.
>
> Is this different from using "Shoup precomputation" (as in #2061) but specialized to the 32-bit context?

Unless I missed something, the Shoup reduction still requires one conditional adjustment, but the Lemire et al. method doesn't: the high part of a product gives the exact remainder right away.
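For reference, a minimal sketch of the 32-bit remainder as published by Lemire, Kaser & Kurz ("Faster Remainder by Direct Computation"). This is not the FLINT code from this PR: the function names are hypothetical, and the high half of the final product is computed with portable 64-bit arithmetic rather than a 128-bit type.

```c
#include <assert.h>
#include <stdint.h>

/* Precompute M = ceil(2^64 / d). For d == 1 this wraps to 0, which
   still yields the correct remainder 0 below. */
static uint64_t fastmod_precomp_u32(uint32_t d)
{
    assert(d != 0);
    return UINT64_MAX / d + 1;
}

/* Return a mod d for any 32-bit a, given M = fastmod_precomp_u32(d). */
static uint32_t fastmod_u32(uint32_t a, uint64_t M, uint32_t d)
{
    /* Low 64 bits of M * a: the fractional part of a / d scaled by 2^64. */
    uint64_t lowbits = M * a;
    uint64_t hi = lowbits >> 32;
    uint64_t lo = lowbits & 0xFFFFFFFFu;

    /* High 64 bits of the 96-bit product lowbits * d. The sum cannot
       overflow because hi, lo and d are all below 2^32. */
    return (uint32_t) ((hi * d + ((lo * d) >> 32)) >> 32);
}
```

The result comes straight out of the high part of one multiplication, with no comparison or correction step afterwards, which is the difference from Shoup-style reduction described above.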

Repack to half-length fft_small convolution for tiny moduli in _nmod_poly_mul and _nmod_poly_mullow

2ae8838

Update src/nmod_poly/mullow_fft_small.c

b00e0a5

Co-authored-by: Albin Ahlbäck <albin.ahlback@gmail.com>
