Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix family of _mmX_alignr_epiX functions #1678

Merged

Conversation

marxin
Copy link
Contributor

@marxin marxin commented Nov 18, 2024

The function is supposed to return the first argument for IMM8 == 8.

Can be verified by both Clang and GCC for the following C++ code snippet:

#include <immintrin.h>
#include <stdint.h>

#include <iostream>

template<class T> inline void log(const __m256i& value)
{
    const size_t n = sizeof(__m256i) / sizeof(T);
    T buffer[n];
    _mm256_storeu_si256((__m256i*)buffer, value);
    for (int i = 0; i < n; i++)
        std::cout << std::hex << buffer[i] << " ";
    std::cout << std::endl;
}

int main()
{
    const auto a = _mm256_setr_epi8(
        1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
        25, 26, 27, 28, 29, 30, 31, 32
        );
    const auto b = _mm256_setr_epi8(
        -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19,
        -20, -21, -22, -23, -24, -25, -26, -27, -28, -29, -30, -31, -32
        );
    std::cout << "a: ";
    log<uint64_t>(a); 
    std::cout << "b: ";
    log<uint64_t>(b);
    const auto r = _mm256_alignr_epi8(a, b, 16);
    std::cout << "_mm256_alignr_epi8(a, b, 16):\n   ";
    log<uint64_t>(r);

    return 0;
}
✦ ❯ g++ stdarch.cc -march=native && ./a.out 
a: 807060504030201 100f0e0d0c0b0a09 1817161514131211 201f1e1d1c1b1a19 
b: f8f9fafbfcfdfeff f0f1f2f3f4f5f6f7 e8e9eaebecedeeef e0e1e2e3e4e5e6e7 
_mm256_alignr_epi8(a, b, 16):
   807060504030201 100f0e0d0c0b0a09 1817161514131211 201f1e1d1c1b1a19 
✦ ❯ clang++ stdarch.cc -march=native && ./a.out 
a: 807060504030201 100f0e0d0c0b0a09 1817161514131211 201f1e1d1c1b1a19 
b: f8f9fafbfcfdfeff f0f1f2f3f4f5f6f7 e8e9eaebecedeeef e0e1e2e3e4e5e6e7 
_mm256_alignr_epi8(a, b, 16):
   807060504030201 100f0e0d0c0b0a09 1817161514131211 201f1e1d1c1b1a19 

@rustbot
Copy link
Collaborator

rustbot commented Nov 18, 2024

r? @Amanieu

rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@marxin
Copy link
Contributor Author

marxin commented Nov 18, 2024

CC: @alexcrichton

@Amanieu
Copy link
Member

Amanieu commented Nov 26, 2024

Could you check all the other alignr intrinsics to see if they have the same issue?

@marxin
Copy link
Contributor Author

marxin commented Nov 28, 2024

Oh, a good point. Yes, there is at least one more function that behaves also wrongly. I will get to it over the weekend.

@marxin marxin changed the title Fix implementation of _mm256_alignr_epi8<16> Fix family of _mmX_alignr_epiX function Nov 30, 2024
@marxin
Copy link
Contributor Author

marxin commented Nov 30, 2024

All right, so the situation is more complicated, and for the purpose of comparison (in between GCC/Clang C++ and Rust stdarch), I wrote the following script that utilizes all the non-mask functions from the alignr family with all intermediate masks up to 256.

gen.py.txt

I noticed the following issues:

  • _mm256_alignr_epi8, _mm512_alignr_epi8 - bad handling of IMM8 == 16 and 32 (that was my original motivation for this PR)
  • _mm_alignr_epi32, _mm256_alignr_epi32, _mm_alignr_epi64, _mm256_alignr_epi64 - these intrinsic have a limited number of bits that are used from the mask (please see the Intel Intrinsics Guide).
  • mm512_alignr_epi32, mm512_alignr_epi64 - also have a limit on the number of bits, but it's enough (no code change needed) + docs update mentions the maximum shift
  • _mm512_alignr_epi8 is a bit weird (as opposed to _mm512_alignr_epi32, _mm512_alignr_epi64): it does not concatenate the 2 512-bit vectors first, so I updated the docs

@marxin marxin changed the title Fix family of _mmX_alignr_epiX function Fix family of _mmX_alignr_epiX functions Nov 30, 2024
@marxin marxin force-pushed the fix-wrong-code-for-_mm256_alignr_epi8-16 branch from 01d8791 to 9304fe7 Compare November 30, 2024 14:06
@Amanieu Amanieu added this pull request to the merge queue Nov 30, 2024
Merged via the queue into rust-lang:master with commit 945fecc Nov 30, 2024
54 checks passed
@marxin marxin deleted the fix-wrong-code-for-_mm256_alignr_epi8-16 branch December 1, 2024 09:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants