Description
I tried this code:
#![feature(stdarch_x86_avx512)]
use std::arch::x86_64::*;
fn main() {
unsafe {
let a = _mm512_set1_epi32(0xffff);
let b = _mm512_setzero_epi32();
let c = _mm512_set1_epi32(1);
let dst = _mm512_shrdv_epi32(a, b, c);
println!("{}", _mm512_cvtsi512_si32(dst));
}
}
I expected to see this happen:
The code produces the same output as the equivalent C program:
#include <immintrin.h>
#include <stdio.h>
int main() {
__m512i a = _mm512_set1_epi32(0xffff);
__m512i b = _mm512_setzero_epi32();
__m512i c = _mm512_set1_epi32(1);
__m512i dst = _mm512_shrdv_epi32(a, b, c);
printf("%u\n", _mm512_cvtsi512_si32(dst));
}
The program outputs 32767.
Instead, the Rust program outputs -2147483648.
Intel's documentation (as linked in the rustdoc for the function) for _mm512_shrdv_epi32
states:
Concatenate packed 32-bit integers in b and a producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of c, and store the lower 32-bits in dst.
FOR j := 0 to 15 i := j*32 dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31) ENDFOR dst[MAX:512] := 0
meaning argument b
is the upper bits, and a
is the lower bits. However, llvm.fshr.*
uses the opposite order. It appears Rust is passing arguments a, b, and c in that order to llvm.fshr
:
This likely also applies to all similar intrinsics that call llvm.fshr
.
Meta
rustc --version --verbose
:
rustc 1.83.0-nightly (0609062a9 2024-09-13)
binary: rustc
commit-hash: 0609062a91c8f445c3e9a0de57e402f9b1b8b0a7
commit-date: 2024-09-13
host: x86_64-unknown-linux-gnu
release: 1.83.0-nightly
LLVM version: 19.1.0
Activity
workingjubilee commentedon Sep 14, 2024
Ah, thank you for reporting this!
stabilizing #111137 should be blocked on this (on top of other reasons it cannot yet be stabilized).
workingjubilee commentedon Sep 14, 2024
cc @minybot on the off chance you would like to do the follow up (but I see no evidence they are interacting with GitHub lately so it is very likely that is a dead letter),
so: also ccing @sayantn
bjorn3 commentedon Sep 15, 2024
Do GCC's
__builtin_ia32_vpshrdv_v*di
intrinsics have the same reverse order as LLVM's intrinsic? If not the GCC and LLVM backends don't agree on how to compile_mm512_shrdv_*
.sayantn commentedon Sep 22, 2024
This bug was probably due to an inconsistency on Intel's part. The
shld
intrinsics pack likea || b
, but theshrd
intrinsics pack likeb || a
🤦🏽. I don't think this will require any change on the compiler side, as the compiler intrinsics are (?) consistent about the packing order (i.e, they always pack<first> || <second>
)sayantn commentedon Sep 22, 2024
The fix has been merged in stdarch (rust-lang/stdarch#1644)
workingjubilee commentedon Sep 25, 2024
Thank you for taking care of that!