Fix family of `_mmX_alignr_epiX` functions #1678

marxin · 2024-11-18T18:18:07Z

The function is supposed to return the first argument for IMM8 == 8.

Can be verified by both Clang and GCC for the following C++ code snippet:

#include <immintrin.h>
#include <stdint.h>

#include <iostream>

template<class T> inline void log(const __m256i& value)
{
    const size_t n = sizeof(__m256i) / sizeof(T);
    T buffer[n];
    _mm256_storeu_si256((__m256i*)buffer, value);
    for (int i = 0; i < n; i++)
        std::cout << std::hex << buffer[i] << " ";
    std::cout << std::endl;
}

int main()
{
    const auto a = _mm256_setr_epi8(
        1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
        25, 26, 27, 28, 29, 30, 31, 32
        );
    const auto b = _mm256_setr_epi8(
        -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19,
        -20, -21, -22, -23, -24, -25, -26, -27, -28, -29, -30, -31, -32
        );
    std::cout << "a: ";
    log<uint64_t>(a); 
    std::cout << "b: ";
    log<uint64_t>(b);
    const auto r = _mm256_alignr_epi8(a, b, 16);
    std::cout << "_mm256_alignr_epi8(a, b, 16):\n   ";
    log<uint64_t>(r);

    return 0;
}

✦ ❯ g++ stdarch.cc -march=native && ./a.out 
a: 807060504030201 100f0e0d0c0b0a09 1817161514131211 201f1e1d1c1b1a19 
b: f8f9fafbfcfdfeff f0f1f2f3f4f5f6f7 e8e9eaebecedeeef e0e1e2e3e4e5e6e7 
_mm256_alignr_epi8(a, b, 16):
   807060504030201 100f0e0d0c0b0a09 1817161514131211 201f1e1d1c1b1a19

✦ ❯ clang++ stdarch.cc -march=native && ./a.out 
a: 807060504030201 100f0e0d0c0b0a09 1817161514131211 201f1e1d1c1b1a19 
b: f8f9fafbfcfdfeff f0f1f2f3f4f5f6f7 e8e9eaebecedeeef e0e1e2e3e4e5e6e7 
_mm256_alignr_epi8(a, b, 16):
   807060504030201 100f0e0d0c0b0a09 1817161514131211 201f1e1d1c1b1a19

rustbot · 2024-11-18T18:18:13Z

r? @Amanieu

rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

marxin · 2024-11-18T18:19:52Z

CC: @alexcrichton

Amanieu · 2024-11-26T22:56:21Z

Could you check all the other alignr intrinsics to see if they have the same issue?

marxin · 2024-11-28T20:56:49Z

Oh, a good point. Yes, there is at least one more function that behaves also wrongly. I will get to it over the weekend.

crates/core_arch/src/x86/avx2.rs

marxin · 2024-11-30T14:03:59Z

All right, so the situation is more complicated, and for the purpose of comparison (in between GCC/Clang C++ and Rust stdarch), I wrote the following script that utilizes all the non-mask functions from the alignr family with all intermediate masks up to 256.

gen.py.txt

I noticed the following issues:

_mm256_alignr_epi8, _mm512_alignr_epi8 - bad handling of IMM8 == 16 and 32 (that was my original motivation for this PR)
_mm_alignr_epi32, _mm256_alignr_epi32, _mm_alignr_epi64, _mm256_alignr_epi64 - these intrinsic have a limited number of bits that are used from the mask (please see the Intel Intrinsics Guide).
mm512_alignr_epi32, mm512_alignr_epi64 - also have a limit on the number of bits, but it's enough (no code change needed) + docs update mentions the maximum shift
_mm512_alignr_epi8 is a bit weird (as opposed to _mm512_alignr_epi32, _mm512_alignr_epi64): it does not concatenate the 2 512-bit vectors first, so I updated the docs

The function is supposed to return first argument for IMM8 == 8.

rustbot assigned Amanieu Nov 18, 2024

marxin mentioned this pull request Nov 18, 2024

_mm256_alignr_epi8 is incorrect #328

Closed

alexcrichton reviewed Nov 29, 2024

View reviewed changes

crates/core_arch/src/x86/avx2.rs Show resolved Hide resolved

marxin changed the title ~~Fix implementation of _mm256_alignr_epi8<16>~~ Fix family of _mmX_alignr_epiX function Nov 30, 2024

marxin changed the title ~~Fix family of _mmX_alignr_epiX function~~ Fix family of _mmX_alignr_epiX functions Nov 30, 2024

marxin added 2 commits November 30, 2024 15:05

Fix implementation of _mm256_alignr_epi8<16>

b6e03a7

The function is supposed to return first argument for IMM8 == 8.

Fix and document _mmX_alignr_epiX family of intrinsics

9304fe7

marxin force-pushed the fix-wrong-code-for-_mm256_alignr_epi8-16 branch from 01d8791 to 9304fe7 Compare November 30, 2024 14:06

Fix test-case

83123c1

Amanieu added this pull request to the merge queue Nov 30, 2024

Merged via the queue into rust-lang:master with commit 945fecc Nov 30, 2024
54 checks passed

marxin deleted the fix-wrong-code-for-_mm256_alignr_epi8-16 branch December 1, 2024 09:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix family of `_mmX_alignr_epiX` functions #1678

Fix family of `_mmX_alignr_epiX` functions #1678

marxin commented Nov 18, 2024

rustbot commented Nov 18, 2024

marxin commented Nov 18, 2024

Amanieu commented Nov 26, 2024

marxin commented Nov 28, 2024

marxin commented Nov 30, 2024

Fix family of _mmX_alignr_epiX functions #1678

Fix family of _mmX_alignr_epiX functions #1678

Conversation

marxin commented Nov 18, 2024

rustbot commented Nov 18, 2024

marxin commented Nov 18, 2024

Amanieu commented Nov 26, 2024

marxin commented Nov 28, 2024

marxin commented Nov 30, 2024

Fix family of `_mmX_alignr_epiX` functions #1678

Fix family of `_mmX_alignr_epiX` functions #1678