Fix: Types in sparse dot-products generic macro #253

real-eren · 2025-02-20T03:21:31Z

Re: types
The dot-product intrinsics `_mm256_dpbf16_ps`, `_mm256_dpwssd_epi32`, and `svbfdot_f32` each widen their inputs before multiplying them, so the scalar algorithm should do the same. Also, the macro currently loads weights as `counter_type`, which is an unsigned int.

Re: saturating addition
Per the same references, none of these intrinsics perform saturating addition except `_mm256_dpwssds_epi32`, which the current `spdot_counts_u16_turin` kernel uses, so that kernel is the odd one out.

The `SIMSIMD_MAKE_INTERSECT_WEIGHTED` macro previously used `counter_type` when loading weights. Rather than only switching to `weight_type`, it should widen the weights to `accumulator_type`, matching SIMD intrinsics such as `_mm256_dpbf16_ps` that widen the values before computing the dot product.
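A minimal scalar sketch of the intended behavior, assuming sorted, duplicate-free `uint16_t` index arrays and bf16 weights stored as raw `uint16_t` bits. `bf16_to_f32` and `sparse_dot_bf16` are hypothetical names, not the macro's actual expansion; the point is only that each weight is widened to the accumulator type (`float`) before the multiply instead of being converted to an integer counter type.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: a bf16 value is the upper 16 bits of an IEEE-754 binary32. */
static float bf16_to_f32(uint16_t bits) {
    uint32_t wide = (uint32_t)bits << 16;
    float result;
    memcpy(&result, &wide, sizeof result); /* reinterpret the bits without UB */
    return result;
}

/* Hypothetical scalar kernel: merge-intersect two sorted index lists and
   accumulate the products of matching weights in float (the accumulator type). */
static float sparse_dot_bf16(const uint16_t *a_idx, const uint16_t *a_w, size_t a_len,
                             const uint16_t *b_idx, const uint16_t *b_w, size_t b_len) {
    float acc = 0.0f;
    size_t i = 0, j = 0;
    while (i < a_len && j < b_len) {
        if (a_idx[i] < b_idx[j]) {
            ++i;
        } else if (a_idx[i] > b_idx[j]) {
            ++j;
        } else {
            /* Widen both weights to float before multiplying,
               as _mm256_dpbf16_ps and svbfdot_f32 do. */
            acc += bf16_to_f32(a_w[i]) * bf16_to_f32(b_w[j]);
            ++i;
            ++j;
        }
    }
    return acc;
}
```

With `a = {1: 1, 3: 2, 5: 3, 7: 4}` and `b = {3: 0.5, 4: 10, 7: 2}`, the shared indices are 3 and 7, so the result is 2 · 0.5 + 4 · 2 = 9.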

ashvardanian · 2025-02-20T03:51:02Z

Would it be hard to use saturating addition everywhere? I can imagine receiving really large inputs in those functions.

PS: Thanks for following the commit naming convention!

real-eren · 2025-02-20T05:25:30Z

> Would it be hard to use saturating addition everywhere? I can imagine receiving really large inputs in those functions.

Good point.

bf16

~~Would overflowing into infinity be the desired behavior?~~ Edit: should be signaling NaN. I think you could check for infinity at the end and replace it with NaN.
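A minimal sketch of that end-of-loop check, assuming a `float` accumulator; `dot_f32_checked` and `finalize_dot_f32` are hypothetical names. Standard C's `NAN` macro gives a quiet NaN, so producing a signaling NaN as suggested would need a compiler-specific builtin; this sketch settles for the quiet one.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: map an accumulator that overflowed to +/-infinity onto NaN.
   NAN is a quiet NaN; a signaling NaN would need a compiler-specific builtin. */
static float finalize_dot_f32(float acc) {
    return isinf(acc) ? NAN : acc;
}

/* Hypothetical dense loop: accumulate freely, check once at the end. */
static float dot_f32_checked(const float *a, const float *b, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return finalize_dot_f32(acc);
}
```

One trade-off: this also turns a result that is infinite because an input was already infinite into NaN, which may or may not be what the API should promise.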

i16

Scalar

`std::add_sat` (Edit: C++26 only, so not usable here)
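Without `std::add_sat`, a portable scalar fallback can add in a wider type and clamp. A minimal sketch with hypothetical names `sat_add_i32` and `dot_i16_sat`: each i16 × i16 product fits in 32 bits, and each addition into the i32 accumulator is done in 64 bits and clamped to the i32 range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable replacement for std::add_sat on int32_t:
   add in 64 bits, where the sum cannot overflow, then clamp. */
static int32_t sat_add_i32(int32_t a, int32_t b) {
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* Hypothetical scalar i16 dot product with a saturating i32 accumulator. */
static int32_t dot_i16_sat(const int16_t *a, const int16_t *b, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        /* |a * b| <= 2^30, so the widened product always fits in int32_t. */
        int32_t product = (int32_t)a[i] * (int32_t)b[i];
        acc = sat_add_i32(acc, product);
    }
    return acc;
}
```

This clamps after every product, whereas `_mm256_dpwssds_epi32` saturates once per pair of products, so the two can disagree in edge cases where the accumulator sits at a bound.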

AVX512

`_mm256_dpwssds_epi32` inside the loop

There is no intrinsic for a saturating horizontal add, but the `reduce_add_epi32` routine from the earlier PR can easily be adapted to use saturating adds. Edit: that would require emulating `adds_epi32`, since x86 only provides saturating adds for 8- and 16-bit lanes.
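Because there is no 32-bit saturating add to call, the emulation has to detect signed overflow and blend in the clamp value. The sketch below models one lane in scalar C so the logic can be checked; `adds_epi32_lane` and `reduce_adds_epi32_model` are hypothetical names, and the halving lane pairing in the reduction is an assumption about how a `reduce_add_epi32`-style routine is structured, not a copy of it. Each step has a plausible AVX2 counterpart (`_mm256_add_epi32`, `_mm256_xor_si256`, `_mm256_andnot_si256`, `_mm256_srli_epi32`, `_mm256_srai_epi32`, `_mm256_blendv_epi8`), but the vector version is not shown here.

```c
#include <stdint.h>

/* Scalar model of one lane of an emulated saturating 32-bit add.
   Unsigned arithmetic gives the same wrapping add as _mm256_add_epi32. */
static int32_t adds_epi32_lane(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t sum = ua + ub;
    /* Overflow iff a and b share a sign and the sum's sign differs:
       the sign bit of `overflow` is set exactly in that case. */
    uint32_t overflow = ~(ua ^ ub) & (ua ^ sum);
    /* INT32_MAX when a >= 0, INT32_MIN's bit pattern when a < 0. */
    uint32_t saturated = (ua >> 31) + (uint32_t)INT32_MAX;
    /* Broadcast the overflow sign bit into an all-ones / all-zeros mask. */
    uint32_t mask = 0u - (overflow >> 31);
    uint32_t bits = (saturated & mask) | (sum & ~mask);
    /* Two's-complement reinterpretation (implementation-defined in ISO C,
       but what mainstream compilers do). */
    return (int32_t)bits;
}

/* Hypothetical tree reduction over 8 lanes: halve the active width each
   step, combining lane i with lane i + width using saturating adds. */
static int32_t reduce_adds_epi32_model(const int32_t lanes_in[8]) {
    int32_t lanes[8];
    for (int i = 0; i < 8; ++i)
        lanes[i] = lanes_in[i];
    for (int width = 4; width >= 1; width /= 2)
        for (int i = 0; i < width; ++i)
            lanes[i] = adds_epi32_lane(lanes[i], lanes[i + width]);
    return lanes[0];
}
```

Saturating addition is not associative, so the reduction order matters: for lanes `{INT32_MAX, 1, 0, 0, -1, 0, 0, 0}` this tree returns `INT32_MAX`, while a left-to-right scalar loop would return `INT32_MAX - 1`.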

SVE2

I don't see a dot product intrinsic with saturation built-in, but here's an idea for inside the loop:

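```c
// Widening dot product: each i64 lane gets the sum of four i16 x i16 products
svint64_t tmp = svdot_s64(zero_vec, a_weights_vec, b_equal_weights_vec);
// Saturating add (SQADD) into the running sums; all lanes are active
product_vec = svqadd_s64(product_vec, tmp);
```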

I'm unfamiliar with SVE, so take the above with a grain of salt.
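A scalar reference for one 64-bit lane could help when testing such a kernel. It assumes `svdot_s64` behaves as the 4-way SDOT, where each 64-bit lane receives the sum of four adjacent i16 × i16 products, and that the accumulation into `product_vec` saturates. `sat_add_i64` and `lane_step_ref` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical portable saturating add on int64_t: there is no wider
   standard type to widen into, so check the bounds before adding. */
static int64_t sat_add_i64(int64_t a, int64_t b) {
    if (b > 0 && a > INT64_MAX - b) return INT64_MAX;
    if (b < 0 && a < INT64_MIN - b) return INT64_MIN;
    return a + b;
}

/* Reference for one 64-bit lane: widening 4-way i16 dot product from zero,
   then a saturating add into the running accumulator. */
static int64_t lane_step_ref(int64_t acc, const int16_t a[4], const int16_t b[4]) {
    int64_t tmp = 0;
    for (int k = 0; k < 4; ++k)
        tmp += (int64_t)a[k] * (int64_t)b[k];
    return sat_add_i64(acc, tmp);
}
```

Each step adds at most 4 · 2^30 = 2^32 in magnitude, so a 64-bit lane would only saturate after roughly 2^31 steps; here saturation is a safety net rather than a common path.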

real-eren · 2025-02-21T21:27:53Z

I realize now that there are some subtleties to handling overflows in each backend, both for the implementation and the API design, so I will open a separate issue for that.

real-eren changed the title ~~Fix: Sparse dotproducts generic casting, Avoid saturating addition~~ Fix: Types in sparse dot-products generic macro, Avoid saturating addition in spdot_counts_u16_turin Feb 20, 2025

real-eren force-pushed the fix-sparse-dotproducts-generic-casting branch from 268d945 to 354a6b8 February 21, 2025 21:24

real-eren changed the title ~~Fix: Types in sparse dot-products generic macro, Avoid saturating addition in spdot_counts_u16_turin~~ Fix: Types in sparse dot-products generic macro Feb 21, 2025
