Commit e741a71

Announce 0.8.0 / 0.8.2 releases
1 parent 679aa1f commit e741a71

1 file changed: 233 additions, 0 deletions

---
layout: post
title: "SIMDe 0.8.0 & 0.8.2 Released"
date: 2024-05-02 00:00:00 -0700
tags: announcements release
author: Michael R. Crusoe
---

I’m pleased to announce the availability of the latest releases of [SIMD
Everywhere](https://github.com/simd-everywhere/simde) (SIMDe),
[version 0.8.0](https://github.com/simd-everywhere/simde/releases/tag/v0.8.0) and
[version 0.8.2](https://github.com/simd-everywhere/simde/releases/tag/v0.8.2),
representing another year of work by over 20 contributors since
version 0.7.6.

Request for help: SIMDe has only one maintainer ([@mr-c](https://github.com/mr-c))!
Please inquire about assisting in new work, code review, and more.

SIMDe is a permissively-licensed (MIT) header-only library which
provides fast, portable implementations of
[SIMD](https://en.wikipedia.org/wiki/SIMD) intrinsics on platforms
which don’t natively support the API in question.

For example, with SIMDe you can use
[SSE](https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions), SSE2, SSE3,
SSE4.1 and 4.2, AVX, AVX2, and many AVX-512 intrinsics on
[ARM](https://en.wikipedia.org/wiki/ARM_architecture),
[POWER](https://en.wikipedia.org/wiki/IBM_POWER_instruction_set_architecture),
[WebAssembly](https://webassembly.org/), or almost any platform with a
C compiler. That includes, of course, x86 CPUs which don't support
the ISA extension in question (*e.g.*, calling AVX-512F functions on a
CPU which doesn't natively support them).

If the target natively supports the SIMD extension in question there
is no performance penalty for using SIMDe. Otherwise, accelerated
implementations, such as NEON on ARM, AltiVec on POWER, WASM SIMD on
WebAssembly, etc., are used when available to provide good
performance.
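
The selection strategy described above can be pictured with a small, self-contained sketch. This is not SIMDe's code or API: the type `sketch_i32x4` and the function `sketch_add_i32x4` are hypothetical names invented for illustration, and the real library is considerably more elaborate. The sketch only assumes the compiler-defined macros `__SSE2__` and `__ARM_NEON` to pick a native path, falling back to plain C otherwise.

```c
#include <stdint.h>

#if defined(__SSE2__)
  #include <emmintrin.h>   /* native SSE2 intrinsics */
#elif defined(__ARM_NEON)
  #include <arm_neon.h>    /* native NEON intrinsics */
#endif

/* Hypothetical 128-bit vector of four 32-bit integers (not a SIMDe type). */
typedef struct { int32_t lanes[4]; } sketch_i32x4;

/* Lane-wise 32-bit addition with wraparound, in the spirit of _mm_add_epi32. */
static sketch_i32x4 sketch_add_i32x4(sketch_i32x4 a, sketch_i32x4 b) {
  sketch_i32x4 r;
#if defined(__SSE2__)
  /* The target supports the extension: call the native intrinsic directly. */
  __m128i va = _mm_loadu_si128((const __m128i *) a.lanes);
  __m128i vb = _mm_loadu_si128((const __m128i *) b.lanes);
  _mm_storeu_si128((__m128i *) r.lanes, _mm_add_epi32(va, vb));
#elif defined(__ARM_NEON)
  /* No SSE2, but NEON is available: use the equivalent NEON operation. */
  vst1q_s32(r.lanes, vaddq_s32(vld1q_s32(a.lanes), vld1q_s32(b.lanes)));
#else
  /* Portable fallback: unsigned arithmetic gives well-defined wraparound. */
  for (int i = 0; i < 4; i++) {
    r.lanes[i] = (int32_t) ((uint32_t) a.lanes[i] + (uint32_t) b.lanes[i]);
  }
#endif
  return r;
}
```

All three paths return the same result, so calling code does not need to know which one was compiled in.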

SIMDe is not just about implementing Intel/AMD intrinsics; it also has
implementations for 99% of the ARM NEON intrinsics and in-progress support for
others.

SIMDe has already been used to port several packages to additional
architectures through either upstream support or distribution
packages, [particularly on Debian](https://wiki.debian.org/SIMDEverywhere).

## What's new in 0.8.0 / 0.8.2

* Implementations now cover 99% of all NEON intrinsics, up from 56.46% in version 0.7.6! ([@yyctw](https://github.com/yyctw/), [@wewe5215](https://github.com/wewe5215))
* Start of RISCV64 optimized implementations using the RVV 1.0 vector extension!
  Thank you [@eric900115](https://github.com/eric900115),
  [@howjmay](https://github.com/howjmay), and [@zengdage](https://github.com/zengdage).
* SIMDe PRs are tested using Fedora Rawhide ([@junaruga](https://github.com/junaruga))

As always, we have an extensive test suite to verify our implementations.

For a complete list of changes, check out the [0.8.0](https://github.com/simd-everywhere/simde/releases/tag/v0.8.0)
and [0.8.2](https://github.com/simd-everywhere/simde/releases/tag/v0.8.2) release notes.

Below are some additional highlights:

### [X86](https://github.com/simd-everywhere/implementation-status/blob/main/x86.md)
There are a total of 6876 SIMD functions on x86, 2930 (43.17%) of which have been implemented in SIMDe so far. Specifically for AVX-512, of the 5160 functions currently in AVX-512, SIMDe implements 1510 (29.26%).

Note: Intel has removed the intrinsics that were unique to Intel Xeon Phi (`ER`, `PF`, `4MAPS`, and `4VNNIW`) from their intrinsic list. SIMDe will retain those few implementations we already had, but this [changes how our completeness statistics are calculated](https://github.com/simd-everywhere/implementation-status/commit/f2e41cd88b41b299002b09d95e8fc7f761332926).

#### Newly added function families
* [AES](https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#aes): 5 of 6 (83.33%)
#### Newly added AVX512 function families
* [castph](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#castph): 1 of 9 (11.11%) implemented.
* [cvtus_storeu](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#cvtus_storeu): 1 of 18 (5.56%) implemented.
* [fpclass](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#fpclass): 3 of 24 (12.50%) implemented.
* [i32gather](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#i32gather): 1 of 8 (12.50%) implemented.
* [i64gather](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#i64gather): 8 of 8 :100:
* [permutex](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#permutex): 3 of 12 (25.00%) implemented.
* [rcp14](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#rcp14): 1 of 24 (4.17%) implemented.
* [reduce_max](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#reduce_max): 7 of 31 (22.58%) implemented.
* [reduce_min](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#reduce_min): 7 of 31 (22.58%) implemented.
* [shufflehi](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#shufflehi): 1 of 7 (14.29%) implemented.
* [shufflelo](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#shufflelo): 1 of 7 (14.29%) implemented.
#### Additions to existing families
* [AVX512BW](https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#avx512bw): 7 additional, 337 total of 790 (42.66%)
* [AVX512DQ](https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#avx512dq): 5 additional, 112 total of 376 (29.79%)
* [AVX512F](https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#avx512f): 48 additional, 1087 total of 2812 (38.66%)
* [AVX512_FP16](https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#avx512_fp16): 15 additional, 17 total of 1105 (1.54%)
### [NEON](https://github.com/simd-everywhere/implementation-status/blob/main/neon.md)
SIMDe currently implements 6608 out of 6670 (99.07%) NEON functions, up from 56.46% in the previous release!
#### Newly added families
* abal
* abal_high
* abd
* abdh
* abdl_high
* addhn_high
* aes
* bfdot
* bfdot_lane
* cadd_rot
* cale
* calt
* cmla_lane
* cmla_rot_lane
* copy_lane
* cvt_high
* cvt_n
* cvta
* cvtn
* cvtp
* cvtx
* cvtx_high
* div
* dupb_lane
* duph_lane
* eor3
* fmlal
* fms
* fms_lane
* fms_n
* ld2_dup
* ld2_lane
* ld3_dup
* ld3_lane
* ld4_dup
* maxnmv
* minnmv
* mla_lane
* mla_high_lane
* mls_lane
* mlsl_high_lane
* mmla
* mull_high_lane
* mull_high_n
* mulx
* mulx_lane
* pmaxnm
* pminnm
* qdmlal
* qdmlal_high
* qdmlal_high_lane
* qdmlal_high_n
* qdmlal_lane
* qdmlal_n
* qdmlsl
* qdmlsl_high
* qdmlsl_high_lane
* qdmlsl_high_n
* qdmlsl_lane
* qdmlsl_n
* qdmlslh
* qdmlslh_lane
* qdmulhh
* qdmulhh_lane
* qdmull_high
* qdmull_high_lane
* qdmull_high_n
* qdmull_lane
* qdmull_n
* qdmullh_lane
* qmovun_high
* qrdmlah
* qrdmlah_lane
* qrdmlahh
* qrdmlahh_lane
* qrdmlsh
* qrdmlsh_lane
* qrdmlshh
* qrdmlshh_lane
* qrdmulhh_lane
* qrshl
* qrshlh
* qrshrn_high_n
* qrshrnh_n
* qrshrun_high_n
* qrshrunh_n
* qshl_n
* qshlh_n
* qshluh_n
* qshrn_high_n
* qshrnh_n
* qshrun_high_n
* qshrunh_n
* raddhn
* raddhn_high
* rax
* recp
* rnd32x
* rnd32z
* rnd64x
* rnd64z
* rnda
* rndx
* rshrn_high_n
* rsubhn
* rsubhn_high
* set_lane
* sha1
* sha1h
* sha256
* sha512
* shll_high_n
* shrn_high_n
* sli_n
* sm3
* sm4
* sqrt
* st1_x2
* st1_x3
* st1_x4
* st1q_x2
* st1q_x3
* st1q_x4
* subhn_high
* sudot_lane
* usdot
* usdot_lane

#### Finally complete families
* cvtn
* mla_lane
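
To give a feel for what one of these families computes, here is a scalar reference sketch of the multiply-accumulate-by-lane operation behind `mla_lane`, modelled on the documented behaviour of the NEON intrinsic `vmla_lane_s32`: each result lane is `a[i] + b[i] * v[lane]`. The type `sketch_i32x2` and the function `sketch_mla_lane_i32x2` are hypothetical names used only for this illustration; they are not SIMDe's types or its implementation.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit NEON vector of two 32-bit integers
 * (int32x2_t); not a SIMDe type. */
typedef struct { int32_t lanes[2]; } sketch_i32x2;

/* Scalar reference: r[i] = a[i] + b[i] * v[lane], wrapping on overflow. */
static sketch_i32x2 sketch_mla_lane_i32x2(sketch_i32x2 a, sketch_i32x2 b,
                                          sketch_i32x2 v, int lane) {
  sketch_i32x2 r;
  /* The real intrinsic takes a compile-time constant lane in [0, 1];
   * this sketch masks the index so it always stays in bounds. */
  uint32_t scalar = (uint32_t) v.lanes[lane & 1];
  for (int i = 0; i < 2; i++) {
    /* Unsigned arithmetic keeps overflow well defined in C. */
    r.lanes[i] = (int32_t) ((uint32_t) a.lanes[i] + (uint32_t) b.lanes[i] * scalar);
  }
  return r;
}
```

A portable fallback of roughly this shape is what lets code written against NEON run on targets without it, while native or otherwise accelerated paths are preferred when available.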

## Getting Involved

If you're interested in using SIMDe but need some specific functions
to be implemented first, please [file an
issue](https://github.com/simd-everywhere/simde/issues/new) and we may
be able to prioritize those functions.

If you're interested in helping out, please get in touch. We have [a
chat room on Matrix/Element](https://gitter.im/simd-everywhere/community)
if you have questions, or of course you can just dive right in on [the issue
tracker](https://github.com/simd-everywhere/simde/issues).
