Fix: Types in sparse dot-products generic macro #253

Open
wants to merge 1 commit into main

real-eren
Contributor

Re: types
The dot-product intrinsics _mm256_dpbf16_ps, _mm256_dpwssd_epi32, and svbfdot_f32 each widen their inputs before multiplying them, so the scalar algorithm should do the same. In addition, the macro currently loads the weights as counter_type, which is unsigned.
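
To illustrate the typing issue, here is a minimal C sketch rather than the macro itself. The function names are hypothetical, and the 16-bit unsigned type standing in for counter_type is an assumption. It contrasts loading signed 16-bit weights into a widened signed accumulator type with routing them through an unsigned type first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference: load each weight as the wider signed accumulator
 * type, then multiply, mirroring what the widening intrinsics do. */
static int32_t dot_i16_widened(int16_t const *a, int16_t const *b, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL));
    int32_t sum = 0;
    for (size_t i = 0; i != n; ++i) {
        int32_t wa = (int32_t)a[i]; /* sign is preserved by the widening */
        int32_t wb = (int32_t)b[i];
        sum += wa * wb;
    }
    return sum;
}

/* Hypothetical faulty variant: the weights pass through an unsigned 16-bit
 * type (assumed stand-in for counter_type), so negative weights turn into
 * large positive values. A 64-bit sum keeps the demonstration free of
 * signed overflow. */
static int64_t dot_i16_via_u16(int16_t const *a, int16_t const *b, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL));
    int64_t sum = 0;
    for (size_t i = 0; i != n; ++i) {
        uint16_t ua = (uint16_t)a[i]; /* -2 becomes 65534 */
        uint16_t ub = (uint16_t)b[i];
        sum += (int64_t)ua * (int64_t)ub;
    }
    return sum;
}
```

With weights {-2, 3} and {3, 4}, the widened version yields 6, while the unsigned detour yields 196614, because -2 is reinterpreted as 65534.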


Re: saturating addition
Per the same references, none of these intrinsics performs saturating addition; only _mm256_dpwssds_epi32, which the kernel currently uses, does. That makes the current spdot_counts_u16_turin kernel the odd one out.

The `SIMSIMD_MAKE_INTERSECT_WEIGHTED` macro previously used `counter_type` when loading weights.
Beyond just using the `weight_type`, it should use the `accumulator_type` so as to match
the behavior of SIMD intrinsics such as `_mm256_dpbf16_ps`, which widen the values before doing the dot-product.
@real-eren real-eren changed the title Fix: Sparse dotproducts generic casting, Avoid saturating addition Fix: Types in sparse dot-products generic macro, Avoid saturating addition in spdot_counts_u16_turin Feb 20, 2025
@ashvardanian
Owner

Would it be hard to use saturating addition everywhere? I can imagine receiving really large inputs in those functions.

PS: Thanks for following the commit naming convention!

@real-eren
Contributor Author

real-eren commented Feb 20, 2025

> Would it be hard to use saturating addition everywhere? I can imagine receiving really large inputs in those functions.

Good point.

bf16

Would overflowing into infinity be the desired behavior? Edit: should be signaling NaN. I think you could check for infinity at the end and replace it with NaN.
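
A rough C sketch of the "check for infinity at the end" idea, under several assumptions: the function names are hypothetical, bf16 values are stored as raw `uint16_t` bit patterns, and the portable `NAN` macro (a quiet NaN) stands in for the signaling NaN, since C99 has no portable way to produce one. Note that an infinity already present in the inputs would also be mapped to NaN here.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* bf16 holds the upper 16 bits of an IEEE-754 binary32 value, so widening
 * is a 16-bit left shift followed by a bit-level copy into a float. */
static float bf16_to_f32(uint16_t bits) {
    uint32_t wide = (uint32_t)bits << 16;
    float result;
    memcpy(&result, &wide, sizeof result);
    return result;
}

/* Hypothetical scalar dot product: widen to f32, accumulate, and replace an
 * infinite result with NaN once at the end. */
static float dot_bf16_checked(uint16_t const *a, uint16_t const *b, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL));
    float sum = 0.0f;
    for (size_t i = 0; i != n; ++i)
        sum += bf16_to_f32(a[i]) * bf16_to_f32(b[i]);
    return isinf(sum) ? NAN : sum; /* quiet NaN, used here as a placeholder */
}
```

The single check after the loop keeps the hot path unchanged; whether NaN or infinity is the better signal to callers is an API decision this sketch does not settle.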

i16

Scalar

std::add_sat (Edit: C++26 only, so not usable here)
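
Without `add_sat`, a saturating 32-bit add can be emulated in plain C by widening to 64 bits and clamping. This is only a sketch with hypothetical names, not code from the library, and it saturates after every product, which is a simplification of how a SIMD kernel would behave lane by lane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturating signed 32-bit add: compute in 64 bits, then clamp. */
static int32_t add_sat_i32(int32_t a, int32_t b) {
    int64_t wide = (int64_t)a + (int64_t)b;
    if (wide > INT32_MAX) return INT32_MAX;
    if (wide < INT32_MIN) return INT32_MIN;
    return (int32_t)wide;
}

/* Hypothetical scalar dot product over i16 weights that clamps the running
 * sum instead of overflowing. Each product fits in 32 bits, because
 * |int16| * |int16| is at most 2^30. */
static int32_t dot_i16_saturating(int16_t const *a, int16_t const *b, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL));
    int32_t sum = 0;
    for (size_t i = 0; i != n; ++i)
        sum = add_sat_i32(sum, (int32_t)a[i] * (int32_t)b[i]);
    return sum;
}
```

For example, three products of 32767 * 32767 exceed `INT32_MAX`, so the result clamps to `INT32_MAX` instead of wrapping.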

AVX512

_mm256_dpwssds_epi32 for inside the loop

There is no intrinsic for a saturating horizontal add, but the reduce_add_epi32 routine from the earlier PR can easily be adapted to use saturating adds. Edit: there is no adds_epi32 intrinsic either, so the saturating 32-bit add would need to be emulated.

SVE2

I don't see a dot product intrinsic with saturation built-in, but here's an idea for inside the loop:

svint64_t tmp = svdot_s64(zero_vec, a_weights_vec, b_equal_weights_vec);
product_vec = svqadd_s64_z(svptrue_b64(), product_vec, tmp);

I'm unfamiliar with SVE, so take the above with a grain of salt.

@real-eren real-eren force-pushed the fix-sparse-dotproducts-generic-casting branch from 268d945 to 354a6b8 Compare February 21, 2025 21:24
@real-eren real-eren changed the title Fix: Types in sparse dot-products generic macro, Avoid saturating addition in spdot_counts_u16_turin Fix: Types in sparse dot-products generic macro Feb 21, 2025
@real-eren
Contributor Author

I realize now that there are some subtleties to handling overflows in each backend, both for the implementation and the API design, so I will open a separate issue for that.
