Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for AVX512 intrinsics #111137

Open
2 tasks
Amanieu opened this issue May 3, 2023 · 28 comments
Open
2 tasks

Tracking Issue for AVX512 intrinsics #111137

Amanieu opened this issue May 3, 2023 · 28 comments
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC O-x86_32 Target: x86 processors, 32 bit (like i686-*) O-x86_64 Target: x86-64 processors (like x86_64-*) T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@Amanieu
Copy link
Member

Amanieu commented May 3, 2023

Feature gate: #![feature(stdarch_x86_avx512)]

This is a tracking issue for the AVX-512 (and related extensions) intrinsics in core::arch.

Public API

This feature covers all of the intrinsics from the following features:

  • avx512bf16
  • avx512bitalg
  • avx512bw
  • avx512cd
  • avx512f
  • avx512ifma
  • avx512vbmi
  • avx512vbmi2
  • avx512vnni
  • avx512vpopcntdq
  • gfni
  • vaes
  • vpclmulqdq

VEX variants

  • avxifma
  • avxneconvert
  • avxvnni
  • avxvnniint16
  • avxvnniint8

Implementation History

Steps

  • Final comment period (FCP)1
  • Stabilization PR

Unresolved Questions and Other Concerns

Footnotes

  1. https://std-dev-guide.rust-lang.org/feature-lifecycle/stabilization.html

@Amanieu Amanieu added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC labels May 3, 2023
@Jules-Bertholet
Copy link
Contributor

@rustbot label O-x86

@rustbot rustbot added the O-x86 label May 4, 2023
@IceTDrinker
Copy link

Hello, what are the guidelines to potentially contribute intrinsics?

Cheers

@Noratrieb Noratrieb added O-x86_64 Target: x86-64 processors (like x86_64-*) O-x86_32 Target: x86 processors, 32 bit (like i686-*) and removed O-x86-all labels Oct 25, 2023
@Amanieu
Copy link
Member Author

Amanieu commented Nov 3, 2023

Currently the main blocker for stabilizing AVX-512 intrinsics is that we are still missing some. See these files for the list of missing intrinsics:

There may also be missing intrinsics for some of the other AVX512 subsets, this should be double-checked.

@tgross35
Copy link
Contributor

It seems like most of the intrinsics that are not yet implemented are labeled not in LLVM. Is stabilization blocked on those, or just the ones labeled need i1?

@Amanieu
Copy link
Member Author

Amanieu commented Nov 18, 2023

The documents were made quite a few years ago, and should be checked against the equivalent intrinsics in the latest version of Clang.

Regarding the "not in llvm", we can skip these since they are supported by neither Clang nor GCC. It seems these are only supported by icc for Xeon Phi targets.

@tarcieri
Copy link
Contributor

tarcieri commented Feb 9, 2024

Not sure if this is a good place to ask, but I'm curious if there are any blockers for stabilizing avx512_target_feature, or it just needs a stabilization PR.

I previously asked here without a reply: #44839 (comment)

@Amanieu
Copy link
Member Author

Amanieu commented Feb 11, 2024

Not sure if this is a good place to ask, but I'm curious if there are any blockers for stabilizing avx512_target_feature, or it just needs a stabilization PR.

Yes, this is the right place to ask: essentially this is blocked on the AVX512 baseline intrinsics still being incomplete, see my comment above.

@IceTDrinker
Copy link

what is considered baseline ?

I see that e.g. _mm512_cvtt_roundpd_epi64 from AVX512DQ is not available today and I don't see an axv512dq.md file in the core arch dir

@Amanieu
Copy link
Member Author

Amanieu commented Feb 12, 2024

I would consider F + VL/DQ/BW as the baseline for initial stabilization of AVX512 intrinsics. The MD files may be somewhat out of date and need someone to double-check against the full list of intrinsics.

@RalfJung
Copy link
Member

We should resolve rust-lang/stdarch#1533 before stabilizing these intrinsics.

@nikic
Copy link
Contributor

nikic commented Feb 17, 2024

We also need to consider how this interacts with AVX10 now. In #121088 I made all the +avx512 target features imply +evex512 to restore the status quo, but this means that there is currently no way to support AVX10.N/256. We'll presumably want to figure out some way to support that before avx512 support is stabilized. Possibly by explicitly adding +evex512 to all avx512 intrinsics that use 512-vectors (and having the same requirement for user code).

@AlexanderSchuetz97
Copy link

A dumb question, since this appears to be blocked on some cpu instructions not having a corresponding wrapper function due to downstream compilers not supporting them yet, why not stabilize it peacemeal? The instructions that are already implemented (provided that they do work as advertised) would already help me out a lot. I dont really see the need why all avx512 instruction wrappers need to be stabilized at the same time.

@tgross35
Copy link
Contributor

tgross35 commented May 3, 2024

Here is a more updated list of what is missing in stdarch:

# in llvm-project
llvm_512f=$(rg '(?s:static __inline.*?(?P<fn_name>[a-z0-9_]+?)\s*\(|#define (?P<def_name>[a-z0-9_]+)\()' --only-matching --multiline  --no-filename -r '$fn_name$def_name' --color=auto clang/lib/Headers/avx512fintrin.h clang/lib/Headers/avx512vlintrin.h | sort)
llvm_512bw=$(rg '(?s:static __inline.*?(?P<fn_name>[a-z0-9_]+?)\s*\(|#define (?P<def_name>[a-z0-9_]+)\()' --only-matching --multiline  --no-filename -r '$fn_name$def_name' --color=auto clang/lib/Headers/avx512bwintrin.h | sort)

# in stdarch
stdarch_512f=$(rg 'pub unsafe fn (\w+)' --only-matching -r '$1' --color=auto crates/core_arch/src/x86/avx512f.rs | sort)
stdarch_512bw=$(rg 'pub unsafe fn (\w+)' --only-matching -r '$1' --color=auto crates/core_arch/src/x86/avx512bw.rs | sort)

# Find everything only in llvm but not rust
missing_f=$(echo "$llvm_512f$stdarch_512f" | sort | uniq --unique)
missing_bw=$(echo "$llvm_512bw$stdarch_512bw" | sort | uniq --unique)

# print things that aren't mentioned at all in stdarch
echo "$missing_f" | xargs -IINAME sh -c 'if ! rg INAME > /dev/null ; then echo INAME; fi'
echo "$missing_bw" | xargs -IINAME sh -c 'if ! rg INAME > /dev/null ; then echo INAME; fi'

The results are:

Missing avx512f intrinsics
_cvtmask16_u32
_cvtu32_mask16
_kandn_mask16
_knot_mask16
_kor_mask16
_kortest_mask16_u8
_kortestc_mask16_u8
_kortestz_mask16_u8
_kshiftli_mask16
_kshiftri_mask16
_kxnor_mask16
_kxor_mask16
_load_mask16
_mm256_and_epi32
_mm256_and_epi64
_mm256_andnot_epi32
_mm256_andnot_epi64
_mm256_cvtepu32_ps
_mm256_i32scatter_epi32
_mm256_i32scatter_pd
_mm256_i32scatter_ps
_mm256_i64scatter_epi32
_mm256_i64scatter_epi64
_mm256_i64scatter_pd
_mm256_i64scatter_ps
_mm256_mask_cvtepu32_ps
_mm256_mask_cvtps_pd
_mm256_mask_cvtps_ph
_mm256_mask_i32scatter_epi32
_mm256_mask_i32scatter_epi64
_mm256_mask_i32scatter_pd
_mm256_mask_i32scatter_ps
_mm256_mask_i64scatter_epi32
_mm256_mask_i64scatter_epi64
_mm256_mask_i64scatter_pd
_mm256_mask_i64scatter_ps
_mm256_maskz_cvtepu32_ps
_mm256_maskz_cvtps_pd
_mm256_maskz_cvtps_ph
_mm256_mmask_i32gather_epi32
_mm256_mmask_i32gather_epi64
_mm256_mmask_i32gather_pd
_mm256_mmask_i32gather_ps
_mm256_mmask_i64gather_epi32
_mm256_mmask_i64gather_epi64
_mm256_mmask_i64gather_pd
_mm256_mmask_i64gather_ps
_mm256_rsqrt14_pd
_mm256_rsqrt14_ps
_mm512_ceil_pd
_mm512_ceil_ps
_mm512_cvtph_ps
_mm512_cvtps_ph
_mm512_cvtsd_f64
_mm512_cvtss_f32
_mm512_floor_pd
_mm512_floor_ps
_mm512_i32logather_epi64
_mm512_i32logather_pd
_mm512_i32loscatter_epi64
_mm512_i32loscatter_pd
_mm512_kortestz
_mm512_mask_ceil_pd
_mm512_mask_ceil_ps
_mm512_mask_cvtps_ph
_mm512_mask_floor_pd
_mm512_mask_floor_ps
_mm512_mask_i32logather_epi64
_mm512_mask_i32logather_pd
_mm512_mask_i32loscatter_epi64
_mm512_mask_i32loscatter_pd
_mm512_mask_permutevar_epi32
_mm512_mask_sqrt_ps
_mm512_maskz_cvtps_ph
_mm512_maskz_sqrt_ps
_mm512_max_pd
_mm512_max_ps
_mm512_min_pd
_mm512_min_ps
_mm512_permutevar_epi32
_mm512_rcp14_pd
_mm512_rcp14_ps
_mm512_rsqrt14_pd
_mm512_rsqrt14_ps
_mm512_set_epi16
_mm512_set_epi8
_mm512_setzero
_mm512_setzero_epi32
_mm512_setzero_pd
_mm512_setzero_si512
_mm512_sqrt_pd
_mm512_sqrt_ps
_mm512_stream_load_si512
_mm_abs_epi64
_mm_and_epi32
_mm_and_epi64
_mm_andnot_epi32
_mm_andnot_epi64
_mm_cvt_roundi64_sd
_mm_cvt_roundi64_ss
_mm_cvt_roundsd_i64
_mm_cvt_roundsd_si64
_mm_cvt_roundsd_u64
_mm_cvt_roundsi64_sd
_mm_cvt_roundsi64_ss
_mm_cvt_roundss_i64
_mm_cvt_roundss_si64
_mm_cvt_roundss_u64
_mm_cvt_roundu64_sd
_mm_cvt_roundu64_ss
_mm_cvtepu32_ps
_mm_cvti32_sd
_mm_cvti32_ss
_mm_cvtsd_i32
_mm_cvtsd_u64
_mm_cvtss_i32
_mm_cvtss_u64
_mm_cvtt_roundsd_i64
_mm_cvtt_roundsd_si64
_mm_cvtt_roundsd_u64
_mm_cvtt_roundss_i64
_mm_cvtt_roundss_si64
_mm_cvtt_roundss_u64
_mm_cvttsd_i64
_mm_cvttsd_u64
_mm_cvttss_i64
_mm_cvttss_u64
_mm_cvtu64_sd
_mm_cvtu64_ss
_mm_i32scatter_epi32
_mm_i32scatter_epi64
_mm_i32scatter_pd
_mm_i32scatter_ps
_mm_i64scatter_epi32
_mm_i64scatter_epi64
_mm_i64scatter_pd
_mm_i64scatter_ps
_mm_mask_abs_epi64
_mm_mask_cvtepu32_ps
_mm_mask_cvtps_pd
_mm_mask_cvtps_ph
_mm_mask_i32scatter_epi32
_mm_mask_i32scatter_epi64
_mm_mask_i32scatter_pd
_mm_mask_i32scatter_ps
_mm_mask_i64scatter_epi32
_mm_mask_i64scatter_epi64
_mm_mask_i64scatter_pd
_mm_mask_i64scatter_ps
_mm_mask_load_sd
_mm_mask_load_ss
_mm_mask_min_epi64
_mm_mask_store_sd
_mm_mask_store_ss
_mm_maskz_abs_epi64
_mm_maskz_cvtepu32_ps
_mm_maskz_cvtps_pd
_mm_maskz_cvtps_ph
_mm_maskz_load_sd
_mm_maskz_load_ss
_mm_maskz_min_epi64
_mm_min_epi64
_mm_mmask_i32gather_epi32
_mm_mmask_i32gather_epi64
_mm_mmask_i32gather_pd
_mm_mmask_i32gather_ps
_mm_mmask_i64gather_epi32
_mm_mmask_i64gather_epi64
_mm_mmask_i64gather_pd
_mm_mmask_i64gather_ps
_mm_rcp14_sd
_mm_rcp14_ss
_mm_rsqrt14_pd
_mm_rsqrt14_ps
_mm_rsqrt14_sd
_mm_rsqrt14_ss
_store_mask16_kand_mask16
Missing avx512bw intrinsics
_cvtmask32_u32
_cvtmask64_u64
_cvtu32_mask32
_cvtu64_mask64
_kadd_mask32
_kortest_mask32_u8
_kortest_mask64_u8
_kortestc_mask32_u8
_kortestc_mask64_u8
_kortestz_mask32_u8
_kortestz_mask64_u8
_kshiftli_mask32
_kshiftli_mask64
_kshiftri_mask32
_kshiftri_mask64
_ktest_mask32_u8
_ktest_mask64_u8
_ktestc_mask32_u8
_ktestc_mask64_u8
_ktestz_mask32_u8
_ktestz_mask64_u8
_mm256_cmp_epi16_mask
_mm256_cmp_epi8_mask
_mm256_cmp_epu16_mask
_mm256_cmp_epu8_mask
_mm256_cmpeq_epi16_mask
_mm256_cmpeq_epi8_mask
_mm256_cmpeq_epu16_mask
_mm256_cmpeq_epu8_mask
_mm256_cmpge_epi16_mask
_mm256_cmpge_epi8_mask
_mm256_cmpge_epu16_mask
_mm256_cmpge_epu8_mask
_mm256_cmpgt_epi16_mask
_mm256_cmpgt_epi8_mask
_mm256_cmpgt_epu16_mask
_mm256_cmpgt_epu8_mask
_mm256_cmple_epi16_mask
_mm256_cmple_epi8_mask
_mm256_cmple_epu16_mask
_mm256_cmple_epu8_mask
_mm256_cmplt_epi16_mask
_mm256_cmplt_epi8_mask
_mm256_cmplt_epu16_mask
_mm256_cmplt_epu8_mask
_mm256_cmpneq_epi16_mask
_mm256_cmpneq_epi8_mask
_mm256_cmpneq_epu16_mask
_mm256_cmpneq_epu8_mask
_mm256_cvtepi16_epi8
_mm256_cvtsepi16_epi8
_mm256_cvtusepi16_epi8
_mm256_dbsad_epu8
_mm256_loadu_epi16
_mm256_loadu_epi8
_mm256_mask2_permutex2var_epi16
_mm256_mask_abs_epi16
_mm256_mask_abs_epi8
_mm256_mask_add_epi16
_mm256_mask_add_epi8
_mm256_mask_adds_epi16
_mm256_mask_adds_epi8
_mm256_mask_adds_epu16
_mm256_mask_adds_epu8
_mm256_mask_alignr_epi8
_mm256_mask_avg_epu16
_mm256_mask_avg_epu8
_mm256_mask_blend_epi16
_mm256_mask_blend_epi8
_mm256_mask_broadcastb_epi8
_mm256_mask_broadcastw_epi16
_mm256_mask_cmp_epi16_mask
_mm256_mask_cmp_epi8_mask
_mm256_mask_cmp_epu16_mask
_mm256_mask_cmp_epu8_mask
_mm256_mask_cmpeq_epi16_mask
_mm256_mask_cmpeq_epi8_mask
_mm256_mask_cmpeq_epu16_mask
_mm256_mask_cmpeq_epu8_mask
_mm256_mask_cmpge_epi16_mask
_mm256_mask_cmpge_epi8_mask
_mm256_mask_cmpge_epu16_mask
_mm256_mask_cmpge_epu8_mask
_mm256_mask_cmpgt_epi16_mask
_mm256_mask_cmpgt_epi8_mask
_mm256_mask_cmpgt_epu16_mask
_mm256_mask_cmpgt_epu8_mask
_mm256_mask_cmple_epi16_mask
_mm256_mask_cmple_epi8_mask
_mm256_mask_cmple_epu16_mask
_mm256_mask_cmple_epu8_mask
_mm256_mask_cmplt_epi16_mask
_mm256_mask_cmplt_epi8_mask
_mm256_mask_cmplt_epu16_mask
_mm256_mask_cmplt_epu8_mask
_mm256_mask_cmpneq_epi16_mask
_mm256_mask_cmpneq_epi8_mask
_mm256_mask_cmpneq_epu16_mask
_mm256_mask_cmpneq_epu8_mask
_mm256_mask_cvtepi16_epi8
_mm256_mask_cvtepi16_storeu_epi8
_mm256_mask_cvtepi8_epi16
_mm256_mask_cvtepu8_epi16
_mm256_mask_cvtsepi16_epi8
_mm256_mask_cvtsepi16_storeu_epi8
_mm256_mask_cvtusepi16_epi8
_mm256_mask_cvtusepi16_storeu_epi8
_mm256_mask_dbsad_epu8
_mm256_mask_loadu_epi16
_mm256_mask_loadu_epi8
_mm256_mask_madd_epi16
_mm256_mask_maddubs_epi16
_mm256_mask_max_epi16
_mm256_mask_max_epi8
_mm256_mask_max_epu16
_mm256_mask_max_epu8
_mm256_mask_min_epi16
_mm256_mask_min_epi8
_mm256_mask_min_epu16
_mm256_mask_min_epu8
_mm256_mask_mov_epi16
_mm256_mask_mov_epi8
_mm256_mask_mulhi_epi16
_mm256_mask_mulhi_epu16
_mm256_mask_mulhrs_epi16
_mm256_mask_mullo_epi16
_mm256_mask_packs_epi16
_mm256_mask_packs_epi32
_mm256_mask_packus_epi16
_mm256_mask_packus_epi32
_mm256_mask_permutex2var_epi16
_mm256_mask_permutexvar_epi16
_mm256_mask_set1_epi16
_mm256_mask_set1_epi8
_mm256_mask_shuffle_epi8
_mm256_mask_shufflehi_epi16
_mm256_mask_shufflelo_epi16
_mm256_mask_sll_epi16
_mm256_mask_slli_epi16
_mm256_mask_sllv_epi16
_mm256_mask_sra_epi16
_mm256_mask_srai_epi16
_mm256_mask_srav_epi16
_mm256_mask_srl_epi16
_mm256_mask_srli_epi16
_mm256_mask_srlv_epi16
_mm256_mask_storeu_epi16
_mm256_mask_storeu_epi8
_mm256_mask_sub_epi16
_mm256_mask_sub_epi8
_mm256_mask_subs_epi16
_mm256_mask_subs_epi8
_mm256_mask_subs_epu16
_mm256_mask_subs_epu8
_mm256_mask_test_epi16_mask
_mm256_mask_test_epi8_mask
_mm256_mask_testn_epi16_mask
_mm256_mask_testn_epi8_mask
_mm256_mask_unpackhi_epi16
_mm256_mask_unpackhi_epi8
_mm256_mask_unpacklo_epi16
_mm256_mask_unpacklo_epi8
_mm256_maskz_abs_epi16
_mm256_maskz_abs_epi8
_mm256_maskz_add_epi16
_mm256_maskz_add_epi8
_mm256_maskz_adds_epi16
_mm256_maskz_adds_epi8
_mm256_maskz_adds_epu16
_mm256_maskz_adds_epu8
_mm256_maskz_alignr_epi8
_mm256_maskz_avg_epu16
_mm256_maskz_avg_epu8
_mm256_maskz_broadcastb_epi8
_mm256_maskz_broadcastw_epi16
_mm256_maskz_cvtepi16_epi8
_mm256_maskz_cvtepi8_epi16
_mm256_maskz_cvtepu8_epi16
_mm256_maskz_cvtsepi16_epi8
_mm256_maskz_cvtusepi16_epi8
_mm256_maskz_dbsad_epu8
_mm256_maskz_loadu_epi16
_mm256_maskz_loadu_epi8
_mm256_maskz_madd_epi16
_mm256_maskz_maddubs_epi16
_mm256_maskz_max_epi16
_mm256_maskz_max_epi8
_mm256_maskz_max_epu16
_mm256_maskz_max_epu8
_mm256_maskz_min_epi16
_mm256_maskz_min_epi8
_mm256_maskz_min_epu16
_mm256_maskz_min_epu8
_mm256_maskz_mov_epi16
_mm256_maskz_mov_epi8
_mm256_maskz_mulhi_epi16
_mm256_maskz_mulhi_epu16
_mm256_maskz_mulhrs_epi16
_mm256_maskz_mullo_epi16
_mm256_maskz_packs_epi16
_mm256_maskz_packs_epi32
_mm256_maskz_packus_epi16
_mm256_maskz_packus_epi32
_mm256_maskz_permutex2var_epi16
_mm256_maskz_permutexvar_epi16
_mm256_maskz_set1_epi16
_mm256_maskz_set1_epi8
_mm256_maskz_shuffle_epi8
_mm256_maskz_shufflehi_epi16
_mm256_maskz_shufflelo_epi16
_mm256_maskz_sll_epi16
_mm256_maskz_slli_epi16
_mm256_maskz_sllv_epi16
_mm256_maskz_sra_epi16
_mm256_maskz_srai_epi16
_mm256_maskz_srav_epi16
_mm256_maskz_srl_epi16
_mm256_maskz_srli_epi16
_mm256_maskz_srlv_epi16
_mm256_maskz_sub_epi16
_mm256_maskz_sub_epi8
_mm256_maskz_subs_epi16
_mm256_maskz_subs_epi8
_mm256_maskz_subs_epu16
_mm256_maskz_subs_epu8
_mm256_maskz_unpackhi_epi16
_mm256_maskz_unpackhi_epi8
_mm256_maskz_unpacklo_epi16
_mm256_maskz_unpacklo_epi8
_mm256_movepi16_mask
_mm256_movepi8_mask
_mm256_movm_epi16
_mm256_movm_epi8
_mm256_permutex2var_epi16
_mm256_permutexvar_epi16
_mm256_sllv_epi16
_mm256_srav_epi16
_mm256_srlv_epi16
_mm256_storeu_epi16
_mm256_storeu_epi8
_mm256_test_epi16_mask
_mm256_test_epi8_mask
_mm256_testn_epi16_mask
_mm256_testn_epi8_mask
_mm512_kunpackd
_mm512_kunpackw
_mm_cmp_epi16_mask
_mm_cmp_epi8_mask
_mm_cmp_epu16_mask
_mm_cmp_epu8_mask
_mm_cmpeq_epi16_mask
_mm_cmpeq_epi8_mask
_mm_cmpeq_epu16_mask
_mm_cmpeq_epu8_mask
_mm_cmpge_epi16_mask
_mm_cmpge_epi8_mask
_mm_cmpge_epu16_mask
_mm_cmpge_epu8_mask
_mm_cmpgt_epi16_mask
_mm_cmpgt_epi8_mask
_mm_cmpgt_epu16_mask
_mm_cmpgt_epu8_mask
_mm_cmple_epi16_mask
_mm_cmple_epi8_mask
_mm_cmple_epu16_mask
_mm_cmple_epu8_mask
_mm_cmplt_epi16_mask
_mm_cmplt_epi8_mask
_mm_cmplt_epu16_mask
_mm_cmplt_epu8_mask
_mm_cmpneq_epi16_mask
_mm_cmpneq_epi8_mask
_mm_cmpneq_epu16_mask
_mm_cmpneq_epu8_mask
_mm_cvtepi16_epi8
_mm_cvtsepi16_epi8
_mm_cvtusepi16_epi8
_mm_dbsad_epu8
_mm_loadu_epi16
_mm_loadu_epi8
_mm_mask2_permutex2var_epi16
_mm_mask_abs_epi16
_mm_mask_abs_epi8
_mm_mask_add_epi16
_mm_mask_add_epi8
_mm_mask_adds_epi16
_mm_mask_adds_epi8
_mm_mask_adds_epu16
_mm_mask_adds_epu8
_mm_mask_alignr_epi8
_mm_mask_avg_epu16
_mm_mask_avg_epu8
_mm_mask_blend_epi16
_mm_mask_blend_epi8
_mm_mask_broadcastb_epi8
_mm_mask_broadcastw_epi16
_mm_mask_cmp_epi16_mask
_mm_mask_cmp_epi8_mask
_mm_mask_cmp_epu16_mask
_mm_mask_cmp_epu8_mask
_mm_mask_cmpeq_epi16_mask
_mm_mask_cmpeq_epi8_mask
_mm_mask_cmpeq_epu16_mask
_mm_mask_cmpeq_epu8_mask
_mm_mask_cmpge_epi16_mask
_mm_mask_cmpge_epi8_mask
_mm_mask_cmpge_epu16_mask
_mm_mask_cmpge_epu8_mask
_mm_mask_cmpgt_epi16_mask
_mm_mask_cmpgt_epi8_mask
_mm_mask_cmpgt_epu16_mask
_mm_mask_cmpgt_epu8_mask
_mm_mask_cmple_epi16_mask
_mm_mask_cmple_epi8_mask
_mm_mask_cmple_epu16_mask
_mm_mask_cmple_epu8_mask
_mm_mask_cmplt_epi16_mask
_mm_mask_cmplt_epi8_mask
_mm_mask_cmplt_epu16_mask
_mm_mask_cmplt_epu8_mask
_mm_mask_cmpneq_epi16_mask
_mm_mask_cmpneq_epi8_mask
_mm_mask_cmpneq_epu16_mask
_mm_mask_cmpneq_epu8_mask
_mm_mask_cvtepi16_epi8
_mm_mask_cvtepi16_storeu_epi8
_mm_mask_cvtepi8_epi16
_mm_mask_cvtepu8_epi16
_mm_mask_cvtsepi16_epi8
_mm_mask_cvtsepi16_storeu_epi8
_mm_mask_cvtusepi16_epi8
_mm_mask_cvtusepi16_storeu_epi8
_mm_mask_dbsad_epu8
_mm_mask_loadu_epi16
_mm_mask_loadu_epi8
_mm_mask_madd_epi16
_mm_mask_maddubs_epi16
_mm_mask_max_epi16
_mm_mask_max_epi8
_mm_mask_max_epu16
_mm_mask_max_epu8
_mm_mask_min_epi16
_mm_mask_min_epi8
_mm_mask_min_epu16
_mm_mask_min_epu8
_mm_mask_mov_epi16
_mm_mask_mov_epi8
_mm_mask_mulhi_epi16
_mm_mask_mulhi_epu16
_mm_mask_mulhrs_epi16
_mm_mask_mullo_epi16
_mm_mask_packs_epi16
_mm_mask_packs_epi32
_mm_mask_packus_epi16
_mm_mask_packus_epi32
_mm_mask_permutex2var_epi16
_mm_mask_permutexvar_epi16
_mm_mask_set1_epi16
_mm_mask_set1_epi8
_mm_mask_shuffle_epi8
_mm_mask_shufflehi_epi16
_mm_mask_shufflelo_epi16
_mm_mask_sll_epi16
_mm_mask_slli_epi16
_mm_mask_sllv_epi16
_mm_mask_sra_epi16
_mm_mask_srai_epi16
_mm_mask_srav_epi16
_mm_mask_srl_epi16
_mm_mask_srli_epi16
_mm_mask_srlv_epi16
_mm_mask_storeu_epi16
_mm_mask_storeu_epi8
_mm_mask_sub_epi16
_mm_mask_sub_epi8
_mm_mask_subs_epi16
_mm_mask_subs_epi8
_mm_mask_subs_epu16
_mm_mask_subs_epu8
_mm_mask_test_epi16_mask
_mm_mask_test_epi8_mask
_mm_mask_testn_epi16_mask
_mm_mask_testn_epi8_mask
_mm_mask_unpackhi_epi16
_mm_mask_unpackhi_epi8
_mm_mask_unpacklo_epi16
_mm_mask_unpacklo_epi8
_mm_maskz_abs_epi16
_mm_maskz_abs_epi8
_mm_maskz_add_epi16
_mm_maskz_add_epi8
_mm_maskz_adds_epi16
_mm_maskz_adds_epi8
_mm_maskz_adds_epu16
_mm_maskz_adds_epu8
_mm_maskz_alignr_epi8
_mm_maskz_avg_epu16
_mm_maskz_avg_epu8
_mm_maskz_broadcastb_epi8
_mm_maskz_broadcastw_epi16
_mm_maskz_cvtepi16_epi8
_mm_maskz_cvtepi8_epi16
_mm_maskz_cvtepu8_epi16
_mm_maskz_cvtsepi16_epi8
_mm_maskz_cvtusepi16_epi8
_mm_maskz_dbsad_epu8
_mm_maskz_loadu_epi16
_mm_maskz_loadu_epi8
_mm_maskz_madd_epi16
_mm_maskz_maddubs_epi16
_mm_maskz_max_epi16
_mm_maskz_max_epi8
_mm_maskz_max_epu16
_mm_maskz_max_epu8
_mm_maskz_min_epi16
_mm_maskz_min_epi8
_mm_maskz_min_epu16
_mm_maskz_min_epu8
_mm_maskz_mov_epi16
_mm_maskz_mov_epi8
_mm_maskz_mulhi_epi16
_mm_maskz_mulhi_epu16
_mm_maskz_mulhrs_epi16
_mm_maskz_mullo_epi16
_mm_maskz_packs_epi16
_mm_maskz_packs_epi32
_mm_maskz_packus_epi16
_mm_maskz_packus_epi32
_mm_maskz_permutex2var_epi16
_mm_maskz_permutexvar_epi16
_mm_maskz_set1_epi16
_mm_maskz_set1_epi8
_mm_maskz_shuffle_epi8
_mm_maskz_shufflehi_epi16
_mm_maskz_shufflelo_epi16
_mm_maskz_sll_epi16
_mm_maskz_slli_epi16
_mm_maskz_sllv_epi16
_mm_maskz_sra_epi16
_mm_maskz_srai_epi16
_mm_maskz_srav_epi16
_mm_maskz_srl_epi16
_mm_maskz_srli_epi16
_mm_maskz_srlv_epi16
_mm_maskz_sub_epi16
_mm_maskz_sub_epi8
_mm_maskz_subs_epi16
_mm_maskz_subs_epi8
_mm_maskz_subs_epu16
_mm_maskz_subs_epu8
_mm_maskz_unpackhi_epi16
_mm_maskz_unpackhi_epi8
_mm_maskz_unpacklo_epi16
_mm_maskz_unpacklo_epi8
_mm_movepi16_mask
_mm_movepi8_mask
_mm_movm_epi16
_mm_movm_epi8
_mm_permutex2var_epi16
_mm_permutexvar_epi16
_mm_sllv_epi16
_mm_srav_epi16
_mm_srlv_epi16
_mm_storeu_epi16
_mm_storeu_epi8
_mm_test_epi16_mask
_mm_test_epi8_mask
_mm_testn_epi16_mask
_mm_testn_epi8_mask
_store_mask64
_store_mask64_kadd_mask32
Not mentioned avx512f intrinsics
_cvtmask16_u32
_cvtu32_mask16
_kortest_mask16_u8
_kortestc_mask16_u8
_kortestz_mask16_u8
_kshiftli_mask16
_kshiftri_mask16
_load_mask16
_mm256_and_epi32
_mm256_and_epi64
_mm256_andnot_epi32
_mm256_andnot_epi64
_mm256_cvtepu32_ps
_mm256_mask_cvtepu32_ps
_mm256_mask_cvtps_pd
_mm256_maskz_cvtepu32_ps
_mm256_maskz_cvtps_pd
_mm256_rsqrt14_pd
_mm256_rsqrt14_ps
_mm512_ceil_pd
_mm512_ceil_ps
_mm512_cvtsd_f64
_mm512_cvtss_f32
_mm512_floor_pd
_mm512_floor_ps
_mm512_mask_ceil_pd
_mm512_mask_ceil_ps
_mm512_mask_floor_pd
_mm512_mask_floor_ps
_mm_and_epi32
_mm_and_epi64
_mm_andnot_epi32
_mm_andnot_epi64
_mm_cvtepu32_ps
_mm_mask_cvtepu32_ps
_mm_mask_cvtps_pd
_mm_maskz_cvtepu32_ps
_mm_maskz_cvtps_pd
_mm_rsqrt14_pd
_mm_rsqrt14_ps
_store_mask16_kand_mask16

Not mentioned avx512bw intrinsics:

  • _store_mask64_kadd_mask32

@caelunshun
Copy link

It looks like we're also missing _mm512_fpclass_ps_mask and mm512_fpclass_pd_mask, which are in the AVX-512DQ extension.

@workingjubilee
Copy link
Member

The untracked features "avx512er" and "avx512pf" have been removed. You probably weren't using them. I'm only mentioning them here in case someone gets confused and wonders where they went and looks here. These were only implemented by Knight's Landing, so most AVX512-enabled CPUs didn't have them.

@sayantn
Copy link
Contributor

sayantn commented Jun 21, 2024

We really need to upgrade the intrinsics list. Intel has since removed all the extgather, logather etc intrinsics (so avx512f.rs is almost complete now), and added the new AMX family, VEX variants of AVX512, and some more instruction sets.

@IceTDrinker
Copy link

Who is in "charge" of that question on the rust project side ? It seem a lot of people have changes to the intrinsics lists to contribute but it does not seem like it was updated recently ?

@sayantn
Copy link
Contributor

sayantn commented Jun 26, 2024

I am working on a PR to update many aspects of stdarch, including the intrinsics list (rust-lang/stdarch#1594)

@IceTDrinker
Copy link

awesome 🙏

@RalfJung
Copy link
Member

RalfJung commented Jun 26, 2024 via email

@Amanieu
Copy link
Member Author

Amanieu commented Jul 2, 2024

We don't use ACPs for stdarch because we don't invent our own APIs and instead follow existing C APIs for arch-specific intrinsics.

@sayantn
Copy link
Contributor

sayantn commented Aug 7, 2024

I don't think anything except for avx512vp2intersect is remaining, which needs compiler support for i1. So are there any more blockers or anything left to do before stabilization of the implemented intrinsics?

@nikic
Copy link
Contributor

nikic commented Aug 7, 2024

I don't think anything except for avx512vp2intersect is remaining, which needs compiler support for i1. So are there any more blockers or anything left to do before stabilization of the implemented intrinsics?

At least for intrinsics that operate on 512-bit vectors, we'd have to sort out the evex512 situation first.

@IceTDrinker
Copy link

the current evex512 situation is it's added to any avx512 intrinsics IIRC right ? (notably this #121081 was fixed in #121088 but IIRC @nikic was not completely happy with the fix), what needs to be done re evex512 ? (and maybe AVX10) ?

@nikic
Copy link
Contributor

nikic commented Aug 19, 2024

@IceTDrinker If we want to support AVX10/256 in the future, we can't have the implicit avx512 -> evex512 implication and presumably need to explicitly annotate all avx512 intrinsics that use 512-bit vectors with the evex512 feature and anyone using them will also have to add that feature to their functions.

(Intel is the gift that keeps on giving.)

@IceTDrinker
Copy link

😵‍💫

@nikic no chance there is some metadata somewhere indicating whether an avx512 gated intrinsics uses ZMM registers ? so that nobody has to do that manually ? 😭

and yeah poisoned gift at this point

@IceTDrinker
Copy link

IceTDrinker commented Aug 19, 2024

very crudely it seems that an intel intrinsics uses at least one ZMM register if and only if the name starts with _mm512 👀

@AlJohri
Copy link

AlJohri commented Oct 3, 2024

In the spirit of @AlexanderSchuetz97's question above:

A dumb question, since this appears to be blocked on some cpu instructions not having a corresponding wrapper function due to downstream compilers not supporting them yet, why not stabilize it peacemeal? The instructions that are already implemented (provided that they do work as advertised) would already help me out a lot. I dont really see the need why all avx512 instruction wrappers need to be stabilized at the same time.

Is there a way to enable use of compile time target feature detection (i.e. #[cfg(target_feature = "avx512f")]) before the intrinsics get stabilized? We are currently writing our avx512f code in C to work around the limitation, but we don't have a way at compile time to switch to use the C version. Stabilizing the target feature detection earlier than the intrinsics could help with this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC O-x86_32 Target: x86 processors, 32 bit (like i686-*) O-x86_64 Target: x86-64 processors (like x86_64-*) T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests