Made has_equal_in a new format callable #2064

SadiinsoSnowfall · 2025-02-11T10:25:21Z

No description provided.

SadiinsoSnowfall · 2025-02-12T13:41:46Z

Waiting for the SVE2 CI

SadiinsoSnowfall · 2025-02-14T09:46:23Z

Closes #2065

SadiinsoSnowfall · 2025-02-17T19:07:00Z

Codegen for has_equal_in on SVE using g++ (codegen also improved on clang, but that compiler fails to perform basic optimisations on this target):

; exactly 128 bits of data
test(eve::arm_sve256_v0::wide<unsigned char, eve::fixed<16l>>, eve::arm_sve256_v0::wide<unsigned char, eve::fixed<16l>>):
        ptrue   p0.b, vl32
        match   p0.b, p0/z, z0.b, z1.b
        ret
        
; less than 128 bits of data (32 bits)
test(eve::arm_sve256_v0::wide<unsigned char, eve::fixed<4l>>, eve::arm_sve256_v0::wide<unsigned char, eve::fixed<4l>>):
        mov     z1.s, s1
        ptrue   p0.b, vl4
        match   p0.b, p0/z, z0.b, z1.b
        ret
       
; more than 128 bits of data (256, in SVE2-256 mode)
test(eve::arm_sve256_v0::wide<unsigned char, eve::fixed<32l>>, eve::arm_sve256_v0::wide<unsigned char, eve::fixed<32l>>):
        ptrue   p0.b, vl32
        movprfx z26, z1
        ext     z26.b, z26.b, z1.b, #16
        match   p3.b, p0/z, z0.b, z26.b
        match   p2.b, p0/z, z0.b, z1.b
        movprfx z27, z0
        ext     z27.b, z27.b, z0.b, #16
        orr     p2.b, p0/z, p2.b, p3.b
        match   p1.b, p0/z, z27.b, z26.b
        match   p3.b, p0/z, z27.b, z1.b
        orr     p3.b, p0/z, p3.b, p1.b
        index   z29.b, #0, #1
        mov     z30.b, p3/z, #-1
        cmpls   p1.b, p0/z, z29.b, #15
        mov     z31.b, p2/z, #-1
        mov     z28.b, p1/z, #-1
        and     z28.b, z28.b, #0x10
        cmplo   p3.b, p0/z, z29.b, z28.b
        splice  z31.b, p3, z31.b, z30.b
        cmpne   p0.b, p0/z, z31.b, #0
        ret

The code for the first two cases is optimal: a "broadcasting move" replicates the second operand to fill 128 bits without introducing new values, and the predicate passed to the match instruction disables the inactive lanes of the first operand.
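To make the short-vector case concrete, here is a minimal scalar sketch of what the `mov z1.s, s1` / `ptrue p0.b, vl4` / `match` sequence computes for 4 bytes. It assumes the documented behaviour of SVE2 MATCH (each active byte of the first operand is compared with every byte in the same 128-bit segment of the second operand). All names (`segment`, `lane_mask`, `replicate_prefix`, `first_lanes`, `match_segment`, `has_equal_in_short`) are hypothetical and are not part of EVE.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t segment_bytes = 16;  // one 128-bit MATCH segment
using segment   = std::array<std::uint8_t, segment_bytes>;
using lane_mask = std::array<bool, segment_bytes>;

// Repeats the first n bytes of v across the segment (lane i reads lane i % n).
// For n == 4 this is the effect of broadcasting one 32-bit element.
inline segment replicate_prefix(const segment& v, std::size_t n) {
  segment out{};
  for (std::size_t i = 0; i < segment_bytes; ++i) out[i] = v[i % n];
  return out;
}

// Predicate with only the first n lanes active (like `ptrue p0.b, vl4` for n == 4).
inline lane_mask first_lanes(std::size_t n) {
  lane_mask out{};
  for (std::size_t i = 0; i < n && i < segment_bytes; ++i) out[i] = true;
  return out;
}

// Scalar stand-in for MATCH on one segment: an active lane i becomes true
// when haystack[i] equals any byte of needle; inactive lanes stay false.
inline lane_mask match_segment(const segment& haystack, const segment& needle,
                               const lane_mask& active) {
  lane_mask out{};
  for (std::size_t i = 0; i < segment_bytes; ++i) {
    if (!active[i]) continue;
    for (std::uint8_t n : needle) {
      if (haystack[i] == n) { out[i] = true; break; }
    }
  }
  return out;
}

// has_equal_in for n < 16 bytes stored in the low lanes of a segment.
inline lane_mask has_equal_in_short(const segment& x, const segment& y, std::size_t n) {
  return match_segment(x, replicate_prefix(y, n), first_lanes(n));
}
```

Replicating the needle means the unused lanes of the second operand never contribute a spurious match, and the predicate keeps the unused lanes of the first operand out of the result, so no extra cleanup instruction is needed.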

The code for the third case is good enough: two movprfx+ext pairs split both operands into 128-bit chunks, then two match+match+orr groups perform the operation on every combination of the 128-bit blocks. Around half of the code performs the final concatenation of the results back to 256 bits. SVE/SVE2 does not support this kind of operation directly, so it takes a few instructions ending with a splice.
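The block decomposition described above can be checked against a plain all-pairs reference. The following is a scalar sketch only, with hypothetical names (`bytes`, `lanes`, `has_equal_in_reference`, `has_equal_in_blocks`), and it does not reproduce the splice step: comparing every 128-bit block of the first operand with every 128-bit block of the second and OR-ing the results gives the same lanes as comparing every byte with every byte.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t block = 16;  // bytes in one 128-bit block

template<std::size_t N> using bytes = std::array<std::uint8_t, N>;
template<std::size_t N> using lanes = std::array<bool, N>;

// Reference semantics: lane i is true when x[i] equals any element of y.
template<std::size_t N>
lanes<N> has_equal_in_reference(const bytes<N>& x, const bytes<N>& y) {
  lanes<N> out{};
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < N; ++j)
      if (x[i] == y[j]) out[i] = true;
  return out;
}

// Block-wise version: each (block of x, block of y) pair is one 16x16 match,
// and the results for the same block of x are OR-ed together.
template<std::size_t N>
lanes<N> has_equal_in_blocks(const bytes<N>& x, const bytes<N>& y) {
  static_assert(N % block == 0, "N must be a whole number of 128-bit blocks");
  lanes<N> out{};
  for (std::size_t bx = 0; bx < N; bx += block)
    for (std::size_t by = 0; by < N; by += block)
      for (std::size_t i = 0; i < block; ++i)
        for (std::size_t j = 0; j < block; ++j)
          if (x[bx + i] == y[by + j]) out[bx + i] = true;
  return out;
}
```

For 256 bits this is four block pairs, which matches the four match instructions in the listing.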

DenisYaroshevskiy · 2025-02-18T00:54:55Z

I don't like the 256-bit codegen. I think it's a bit much. You can match 16 bytes against 16 bytes, then swap the two sides and do that again.

include/eve/arch/arm/sve/sve_true.hpp

include/eve/module/core/regular/impl/simd/arm/sve/has_equal_in.hpp

SadiinsoSnowfall · 2025-02-18T15:50:36Z

Updated codegen with the new algorithm:
SVE-512 - 256 bits (2 match lanes)

test(eve::arm_sve512_v0::wide<unsigned char, eve::fixed<32l> >, eve::arm_sve512_v0::wide<unsigned char, eve::fixed<32l> >):
       ptrue   p3.b, vl64
       movprfx z31, z1
       ext     z31.b, z31.b, z1.b, #32
       ext     z31.b, z31.b, z1.b, #32
       movprfx z30, z31
       ext     z30.b, z30.b, z31.b, #16
       match   p0.b, p3/z, z0.b, z31.b
       match   p2.b, p3/z, z0.b, z30.b
       orr     p0.b, p3/z, p0.b, p2.b
       ret

SVE-512 - 512 bits (4 match lanes)

test(eve::arm_sve512_v0::wide<unsigned char, eve::fixed<64l> >, eve::arm_sve512_v0::wide<unsigned char, eve::fixed<64l> >):
        ptrue   p3.b, vl64
        movprfx z31, z1
        ext     z31.b, z31.b, z1.b, #0
        ext     z31.b, z31.b, z1.b, #0
        movprfx z30, z31
        ext     z30.b, z30.b, z31.b, #16
        match   p1.b, p3/z, z0.b, z30.b
        movprfx z29, z30
        ext     z29.b, z29.b, z30.b, #16
        match   p2.b, p3/z, z0.b, z31.b
        movprfx z28, z29
        ext     z28.b, z28.b, z29.b, #16
        orr     p2.b, p3/z, p2.b, p1.b
        match   p0.b, p3/z, z0.b, z28.b
        match   p1.b, p3/z, z0.b, z29.b
        orr     p2.b, p3/z, p2.b, p1.b
        orr     p0.b, p3/z, p2.b, p0.b
        ret

SVE-256 - 256 bits (2 match lanes)

test(eve::arm_sve256_v0::wide<unsigned char, eve::fixed<32l> >, eve::arm_sve256_v0::wide<unsigned char, eve::fixed<32l> >):
        ptrue   p3.b, vl32
        movprfx z31, z1
        ext     z31.b, z31.b, z1.b, #0
        ext     z31.b, z31.b, z1.b, #0
        movprfx z30, z31
        ext     z30.b, z30.b, z31.b, #16
        match   p0.b, p3/z, z0.b, z31.b
        match   p2.b, p3/z, z0.b, z30.b
        orr     p0.b, p3/z, p0.b, p2.b
        ret

Tested using:

#include <eve/eve.hpp>

using namespace eve;

using T = wide<unsigned char, fixed<64>>; // change size & type here

auto test(T a, T b) {
    return has_equal_in(a, b);
}
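Reading the updated listings, the new algorithm appears to keep the first operand in place and rotate the second operand by 16 bytes between match instructions, OR-ing the predicates together. The sketch below is a scalar model of that idea under this assumption. It is not the EVE implementation, and the names (`match_by_segment`, `rotate_one_segment`, `has_equal_in_rotating`) are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t segment = 16;  // bytes in one 128-bit MATCH segment

template<std::size_t N> using bytes = std::array<std::uint8_t, N>;
template<std::size_t N> using lanes = std::array<bool, N>;

// Scalar stand-in for MATCH over a whole register: each lane of x is compared
// only with the bytes of y that sit in the same 128-bit segment.
template<std::size_t N>
lanes<N> match_by_segment(const bytes<N>& x, const bytes<N>& y) {
  lanes<N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    const std::size_t base = (i / segment) * segment;
    for (std::size_t j = 0; j < segment; ++j)
      if (x[i] == y[base + j]) out[i] = true;
  }
  return out;
}

// Rotates the register by one segment: lane i reads lane (i + 16) % N.
template<std::size_t N>
bytes<N> rotate_one_segment(const bytes<N>& y) {
  bytes<N> out{};
  for (std::size_t i = 0; i < N; ++i) out[i] = y[(i + segment) % N];
  return out;
}

// One MATCH per segment: after N / 16 rotations every segment of y has been
// lined up with every segment of x exactly once.
template<std::size_t N>
lanes<N> has_equal_in_rotating(const bytes<N>& x, bytes<N> y) {
  static_assert(N % segment == 0, "N must be a whole number of segments");
  lanes<N> out{};
  for (std::size_t step = 0; step < N / segment; ++step) {
    const lanes<N> m = match_by_segment(x, y);
    for (std::size_t i = 0; i < N; ++i) out[i] = out[i] || m[i];
    y = rotate_one_segment(y);
  }
  return out;
}
```

Because the result lanes never move, nothing has to be concatenated at the end. The number of match operations equals the number of 128-bit segments, which is consistent with the "2 match lanes" and "4 match lanes" labels on the listings.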

include/eve/concept/invocable.hpp

DenisYaroshevskiy · 2025-02-21T15:59:53Z

include/eve/module/core/regular/has_equal_in.hpp

+  struct has_equal_in_t : callable<has_equal_in_t, Options>
+  {
+    template<simd_value T, simd_value U, simd_predicate<T, U> Op>
+    constexpr EVE_FORCEINLINE auto operator()(T x, U match_against, Op op) const noexcept -> decltype(op(x, match_against))


std::result_of_t
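For context on the note above: `std::result_of_t` was deprecated in C++17 and removed in C++20, and the quoted snippet already relies on C++20 concepts, so the trait-based spelling would presumably be `std::invoke_result_t`. The sketch below only compares the two ways of naming the return type. `same_value`, `apply_with_decltype` and `apply_with_trait` are hypothetical stand-ins, not EVE code.

```cpp
#include <type_traits>

// Hypothetical predicate standing in for the `Op` parameter.
struct same_value {
  constexpr bool operator()(int a, long b) const noexcept { return a == b; }
};

// Return type spelled with a trailing decltype, as in the reviewed snippet.
template<typename T, typename U, typename Op>
constexpr auto apply_with_decltype(T x, U y, Op op) noexcept -> decltype(op(x, y)) {
  return op(x, y);
}

// Same return type spelled with the standard trait. The reference qualifiers
// mirror the fact that op, x and y are lvalues inside the function body.
template<typename T, typename U, typename Op>
constexpr std::invoke_result_t<Op&, T&, U&> apply_with_trait(T x, U y, Op op) noexcept {
  return op(x, y);
}
```

Both forms drop the overload from consideration when `op` cannot be called with the given arguments. The trait form names the callable and argument types explicitly instead of repeating the call expression.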

DenisYaroshevskiy · 2025-02-21T16:10:10Z

include/eve/module/core/regular/impl/simd/arm/sve/has_equal_in.hpp

+        // There is no need to broadcast the values inside the first operand because we will just adjust the mask to only
+        // consider the active lanes.
+        fw_t haystack{x};
+        fw_t needle = shuffle(fw_t{match_against}, eve::as_pattern([](auto i, auto) { return i % N::value; }));


shuffle_l<3>(fw_t{match_against}, [](auto i, auto) { return i % N::value; });
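Both spellings above pass the same formula, `i % N::value`, which maps every lane of the full-width register back onto one of the first N lanes. As a rough illustration of how such a formula turns into a lane-index table, here is a standalone sketch. `make_pattern`, `expand_pattern` and `apply_pattern` are hypothetical helpers written for this example and say nothing about how EVE's `shuffle` or `as_pattern` are implemented.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Evaluates formula(i, Size) once per lane index to build the index table.
template<std::size_t Size, typename Formula, std::size_t... I>
constexpr std::array<std::size_t, Size> expand_pattern(Formula formula,
                                                       std::index_sequence<I...>) {
  return {{formula(I, Size)...}};
}

template<std::size_t Size, typename Formula>
constexpr std::array<std::size_t, Size> make_pattern(Formula formula) {
  return expand_pattern<Size>(formula, std::make_index_sequence<Size>{});
}

// Output lane i takes its value from input lane pattern[i].
template<typename T, std::size_t Size>
constexpr std::array<T, Size> apply_pattern(const std::array<T, Size>& v,
                                            const std::array<std::size_t, Size>& pattern) {
  std::array<T, Size> out{};
  for (std::size_t i = 0; i < Size; ++i) out[i] = v[pattern[i]];
  return out;
}
```

With a 16-lane register and N = 4, the table is 0, 1, 2, 3 repeated four times, so the four meaningful lanes are replicated across the whole register.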

SadiinsoSnowfall force-pushed the callable/has_equal_in branch from acd53c7 to 7a34c68 Compare February 11, 2025 10:26

SadiinsoSnowfall marked this pull request as draft February 12, 2025 10:09

SadiinsoSnowfall force-pushed the callable/has_equal_in branch from 752ec1c to 08f7713 Compare February 12, 2025 13:41

SadiinsoSnowfall force-pushed the callable/has_equal_in branch 2 times, most recently from 4743b5d to ef14221 Compare February 14, 2025 09:08

SadiinsoSnowfall marked this pull request as ready for review February 14, 2025 09:22

SadiinsoSnowfall mentioned this pull request Feb 17, 2025

[BUG] try_each_group_position bad codegen #2069

Open

SadiinsoSnowfall marked this pull request as draft February 17, 2025 18:03

SadiinsoSnowfall marked this pull request as ready for review February 17, 2025 19:07

stash

c0a23b5

SadiinsoSnowfall force-pushed the callable/has_equal_in branch from 8bb3943 to bcb20a9 Compare February 17, 2025 19:58

SadiinsoSnowfall added 5 commits February 17, 2025 21:02

cleaned-up has_equal_in callable interface

ba20b35

added SVE2 impl & simd_predicate

b5887d8

adjusted constraints

b4a7b38

stash

435c2de

finished SVE2 impl

936db96

SadiinsoSnowfall force-pushed the callable/has_equal_in branch from bcb20a9 to 936db96 Compare February 17, 2025 20:03

DenisYaroshevskiy reviewed Feb 18, 2025

include/eve/arch/arm/sve/sve_true.hpp (outdated)

include/eve/module/core/regular/impl/simd/arm/sve/has_equal_in.hpp (outdated)

SadiinsoSnowfall added 3 commits February 18, 2025 08:49

[no ci] revert indentation

1905c55

stash

45f5b19

stash

6459502

SadiinsoSnowfall force-pushed the callable/has_equal_in branch from de44353 to 1d24753 Compare February 18, 2025 17:10

new SVE2 implementation for long vectors

ef7c329

SadiinsoSnowfall force-pushed the callable/has_equal_in branch from 1d24753 to ef7c329 Compare February 18, 2025 17:11

DenisYaroshevskiy reviewed Feb 19, 2025

include/eve/concept/invocable.hpp

adjusted SVE relative conditional expression masking

107ae89

SadiinsoSnowfall force-pushed the callable/has_equal_in branch from 2e4b71f to 107ae89 Compare February 19, 2025 16:43

SadiinsoSnowfall added 2 commits February 20, 2025 10:31

special case SVE for to_logical_incomplete

3460fbd

fix

9f39050

DenisYaroshevskiy reviewed Feb 21, 2025
