Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_mm512_shrdv_* intrinsics have incorrect argument order #130365

Closed
as-com opened this issue Sep 14, 2024 · 6 comments
Closed

_mm512_shrdv_* intrinsics have incorrect argument order #130365

as-com opened this issue Sep 14, 2024 · 6 comments
Labels
A-intrinsics Area: Intrinsics A-SIMD Area: SIMD (Single Instruction Multiple Data) C-bug Category: This is a bug. O-x86_64 Target: x86-64 processors (like x86_64-*)

Comments

@as-com
Copy link
Contributor

as-com commented Sep 14, 2024

I tried this code:

#![feature(stdarch_x86_avx512)]

use std::arch::x86_64::*;

fn main() {
    unsafe {
        let a = _mm512_set1_epi32(0xffff);
        let b = _mm512_setzero_epi32();
        let c = _mm512_set1_epi32(1);
    
        let dst = _mm512_shrdv_epi32(a, b, c);
        println!("{}", _mm512_cvtsi512_si32(dst));    
    }
}

I expected to see this happen:

The code produces the same output as the equivalent C program:

#include <immintrin.h>
#include <stdio.h>

int main() {
    __m512i a = _mm512_set1_epi32(0xffff);
    __m512i b = _mm512_setzero_epi32();
    __m512i c = _mm512_set1_epi32(1);

    __m512i dst = _mm512_shrdv_epi32(a, b, c);
    printf("%u\n", _mm512_cvtsi512_si32(dst));
}

The program outputs 32767.

Instead, the Rust program outputs -2147483648.


Intel's documentation (as linked in the rustdoc for the function) for _mm512_shrdv_epi32 states:

Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32-bits in dst.

FOR j := 0 to 15
	i := j*32
	dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31)
ENDFOR
dst[MAX:512] := 0

meaning argument b is the upper bits, and a is the lower bits. However, llvm.fshr.* uses the opposite order. It appears Rust is passing arguments a, b, and c in that order to llvm.fshr:

https://github.com/rust-lang/stdarch/blob/b1edbf90955cb9b057a323f761e2c19edb591e6f/crates/core_arch/src/x86/avx512vbmi2.rs#L997-L999

This likely also applies to all similar intrinsics that call llvm.fshr.

Meta

rustc --version --verbose:

rustc 1.83.0-nightly (0609062a9 2024-09-13)
binary: rustc
commit-hash: 0609062a91c8f445c3e9a0de57e402f9b1b8b0a7
commit-date: 2024-09-13
host: x86_64-unknown-linux-gnu
release: 1.83.0-nightly
LLVM version: 19.1.0
@as-com as-com added the C-bug Category: This is a bug. label Sep 14, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Sep 14, 2024
@workingjubilee
Copy link
Member

Ah, thank you for reporting this!

stabilizing #111137 should be blocked on this (on top of other reasons it cannot yet be stabilized).

@workingjubilee
Copy link
Member

cc @minybot on the off chance you would like to do the follow up (but I see no evidence they are interacting with GitHub lately so it is very likely that is a dead letter),

so: also ccing @sayantn

@workingjubilee workingjubilee added O-x86_64 Target: x86-64 processors (like x86_64-*) A-SIMD Area: SIMD (Single Instruction Multiple Data) A-intrinsics Area: Intrinsics labels Sep 14, 2024
@bjorn3
Copy link
Member

bjorn3 commented Sep 15, 2024

Do GCC's __builtin_ia32_vpshrdv_v*di intrinsics have the same reverse order as LLVM's intrinsic? If not the GCC and LLVM backends don't agree on how to compile _mm512_shrdv_*.

@saethlin saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Sep 18, 2024
@sayantn
Copy link
Contributor

sayantn commented Sep 22, 2024

This bug was probably due to an inconsistency on Intel's part. The shld intrinsics pack like a || b, but the shrd intrinsics pack like b || a 🤦🏽. I don't think this will require any change on the compiler side, as the compiler intrinsics are (?) consistent about the packing order (i.e, they always pack <first> || <second>)

@sayantn
Copy link
Contributor

sayantn commented Sep 22, 2024

The fix has been merged in stdarch (rust-lang/stdarch#1644)

@workingjubilee
Copy link
Member

Thank you for taking care of that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-intrinsics Area: Intrinsics A-SIMD Area: SIMD (Single Instruction Multiple Data) C-bug Category: This is a bug. O-x86_64 Target: x86-64 processors (like x86_64-*)
Projects
None yet
Development

No branches or pull requests

6 participants