| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "SIMDe 0.8.0 & 0.8.2 Released" |
| 4 | +date: 2024-05-02 00:00:00 -0700 |
| 5 | +tags: announcements release |
| 6 | +author: Michael R. Crusoe |
| 7 | +--- |
| 8 | + |
| 9 | +I’m pleased to announce the availability of the latest releases of [SIMD |
| 10 | +Everywhere](https://github.com/simd-everywhere/simde) (SIMDe), |
| 11 | +[version 0.8.0](https://github.com/simd-everywhere/simde/releases/tag/v0.8.0) and |
| 12 | +[version 0.8.2](https://github.com/simd-everywhere/simde/releases/tag/v0.8.2), |
| 13 | +representing another year of work by over 20 contributors since |
| 14 | +version 0.7.6. |
| 15 | + |
| 16 | +Request for help: SIMDe has only one maintainer ([@mr-c](https://github.com/mr-c))! |
| 17 | +Please inquire about assisting in new work, code review, and more. |
| 18 | + |
| 19 | +SIMDe is a permissively-licensed (MIT) header-only library which |
| 20 | +provides fast, portable implementations of |
| 21 | +[SIMD](https://en.wikipedia.org/wiki/SIMD) intrinsics for platforms |
| 22 | +which aren’t natively supported by the API in question. |
| 23 | + |
| 24 | +For example, with SIMDe you can use |
| 25 | +[SSE](https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions), SSE2, SSE3, |
| 26 | +SSE4.1 and 4.2, AVX, AVX2, and many AVX-512 intrinsics on |
| 27 | +[ARM](https://en.wikipedia.org/wiki/ARM_architecture), |
| 28 | +[POWER](https://en.wikipedia.org/wiki/IBM_POWER_instruction_set_architecture), |
| 29 | +[WebAssembly](https://webassembly.org/), or almost any platform with a |
| 30 | +C compiler. That includes, of course, x86 CPUs which don't support |
| 31 | +the ISA extension in question (*e.g.*, calling AVX-512F functions on a |
| 32 | +CPU which doesn't natively support them). |
| 33 | + |
| 34 | +If the target natively supports the SIMD extension in question there |
| 35 | +is no performance penalty for using SIMDe. Otherwise, accelerated |
| 36 | +implementations, such as NEON on ARM, AltiVec on POWER, WASM SIMD on |
| 37 | +WebAssembly, etc., are used when available to provide good |
| 38 | +performance. |
| 39 | + |
| 40 | +SIMDe is not just about implementing Intel/AMD intrinsics; it also has |
| 41 | +implementations for 99% of the ARM NEON intrinsics and in-progress support for |
| 42 | +others. |
| 43 | + |
| 44 | +SIMDe has already been used to port several packages to additional |
| 45 | +architectures through either upstream support or distribution |
| 46 | +packages, [particularly on Debian](https://wiki.debian.org/SIMDEverywhere). |
| 47 | + |
| 48 | +## What's new in 0.8.0 / 0.8.2 |
| 49 | + |
| 50 | +* Implementations now cover 99% of all NEON intrinsics, up from 56.46% in version 0.7.6! ([@yyctw](https://github.com/yyctw/), [@wewe5215](https://github.com/wewe5215)) |
| 51 | +* Start of RISCV64 optimized implementation using the RVV1.0 vector extension! |
| 52 | +Thank you [@eric900115](https://github.com/eric900115) |
| 53 | +[@howjmay](https://github.com/howjmay) [@zengdage](https://github.com/zengdage). |
| 54 | +* SIMDe PRs are tested using Fedora Rawhide ([@junaruga](https://github.com/junaruga)) |
| 55 | + |
| 56 | +As always, we have an extensive test suite to verify our implementations. |
| 57 | + |
| 58 | +For a complete list of changes, check out the [0.8.0](https://github.com/simd-everywhere/simde/releases/tag/v0.8.0) |
| 59 | +and [0.8.2](https://github.com/simd-everywhere/simde/releases/tag/v0.8.2) release notes. |
| 60 | + |
| 61 | +Below are some additional highlights: |
| 62 | + |
| 63 | +### [X86](https://github.com/simd-everywhere/implementation-status/blob/main/x86.md) |
| 64 | +There are a total of 6876 SIMD functions on x86, 2930 (43.17%) of which have been implemented in SIMDe so far. Specifically for AVX-512, of the 5160 functions currently in AVX-512, SIMDe implements 1510 (29.26%). |
| 65 | + |
| 66 | +Note: Intel has removed the intrinsics that were unique to Intel Xeon Phi (`ER`, `PF`, `4MAPS`, and `4VNNIW`) from their intrinsic list. SIMDe will retain those few implementations we already had, but this [changes how our completeness statistics are calculated](https://github.com/simd-everywhere/implementation-status/commit/f2e41cd88b41b299002b09d95e8fc7f761332926). |
| 67 | + |
| 68 | +#### Newly added function families |
| 69 | +* [AES](https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#aes): 5 of 6 (83.33%) |
| 70 | +#### Newly added AVX512 function families |
| 71 | +* [castph](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#castph): 1 of 9 (11.11%) implemented. |
| 72 | +* [cvtus_storeu](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#cvtus_storeu): 1 of 18 (5.56%) implemented. |
| 73 | +* [fpclass](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#fpclass): 3 of 24 (12.50%) implemented. |
| 74 | +* [i32gather](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#i32gather): 1 of 8 (12.50%) implemented. |
| 75 | +* [i64gather](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#i64gather): 8 of 8 :100: |
| 76 | +* [permutex](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#permutex): 3 of 12 (25.00%) implemented. |
| 77 | +* [rcp14](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#rcp14): 1 of 24 (4.17%) implemented. |
| 79 | +* [reduce_max](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#reduce_max): 7 of 31 (22.58%) implemented. |
| 80 | +* [reduce_min](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#reduce_min): 7 of 31 (22.58%) implemented. |
| 81 | +* [shufflehi](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#shufflehi): 1 of 7 (14.29%) implemented. |
| 82 | +* [shufflelo](https://github.com/simd-everywhere/implementation-status/blob/main/avx512.md#shufflelo): 1 of 7 (14.29%) implemented. |
| 83 | +#### Additions to existing families |
| 84 | +* [AVX512BW](https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#avx512bw): 7 additional, 337 total of 790 (42.66%) |
| 85 | +* [AVX512DQ](https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#avx512dq): 5 additional, 112 total of 376 (29.79%) |
| 86 | +* [AVX512F](https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#avx512f): 48 additional, 1087 total of 2812 (38.66%) |
| 87 | +* [AVX512_FP16](https://github.com/simd-everywhere/implementation-status/blob/main/x86.md#avx512_fp16): 15 additional, 17 total of 1105 (1.54%) |
| 88 | +### [Neon](https://github.com/simd-everywhere/implementation-status/blob/main/neon.md) |
| 89 | +SIMDe currently implements 6608 out of 6670 (99.07%) NEON functions, up from 56.46% in the previous release! |
| 90 | +#### Newly added families |
| 91 | +* abal |
| 92 | +* abal_high |
| 93 | +* abd |
| 94 | +* abdh |
| 95 | +* abdl_high |
| 96 | +* addhn_high |
| 97 | +* aes |
| 98 | +* bfdot |
| 99 | +* bfdot_lane |
| 100 | +* cadd_rot |
| 101 | +* cale |
| 102 | +* calt |
| 103 | +* cmla_lane |
| 104 | +* cmla_rot_lane |
| 105 | +* copy_lane |
| 106 | +* cvt_high |
| 107 | +* cvt_n |
| 108 | +* cvta |
| 109 | +* cvtn |
| 110 | +* cvtp |
| 111 | +* cvtx |
| 112 | +* cvtx_high |
| 113 | +* div |
| 114 | +* dupb_lane |
| 115 | +* duph_lane |
| 116 | +* eor3 |
| 117 | +* fmlal |
| 118 | +* fms |
| 119 | +* fms_lane |
| 120 | +* fms_n |
| 121 | +* ld2_dup |
| 122 | +* ld2_lane |
| 123 | +* ld3_dup |
| 124 | +* ld3_lane |
| 125 | +* ld4_dup |
| 126 | +* maxnmv |
| 127 | +* minnmv |
| 128 | +* mla_lane |
| 129 | +* mlal_high_lane |
| 130 | +* mls_lane |
| 131 | +* mlsl_high_lane |
| 132 | +* mmla |
| 133 | +* mull_high_lane |
| 134 | +* mull_high_n |
| 135 | +* mulx |
| 136 | +* mulx_lane |
| 137 | +* pmaxnm |
| 138 | +* pminnm |
| 139 | +* qdmlal |
| 140 | +* qdmlal_high |
| 141 | +* qdmlal_high_lane |
| 142 | +* qdmlal_high_n |
| 143 | +* qdmlal_lane |
| 144 | +* qdmlal_n |
| 145 | +* qdmlsl |
| 146 | +* qdmlsl_high |
| 147 | +* qdmlsl_high_lane |
| 148 | +* qdmlsl_high_n |
| 149 | +* qdmlsl_lane |
| 150 | +* qdmlsl_n |
| 151 | +* qdmlslh |
| 152 | +* qdmlslh_lane |
| 153 | +* qdmulhh |
| 154 | +* qdmulhh_lane |
| 155 | +* qdmull_high |
| 156 | +* qdmull_high_lane |
| 157 | +* qdmull_high_n |
| 158 | +* qdmull_lane |
| 159 | +* qdmull_n |
| 160 | +* qdmullh_lane |
| 161 | +* qmovun_high |
| 162 | +* qrdmlah |
| 163 | +* qrdmlah_lane |
| 164 | +* qrdmlahh |
| 165 | +* qrdmlahh_lane |
| 166 | +* qrdmlsh |
| 167 | +* qrdmlsh_lane |
| 168 | +* qrdmlshh |
| 169 | +* qrdmlshh_lane |
| 170 | +* qrdmulhh_lane |
| 171 | +* qrshl |
| 172 | +* qrshlh |
| 173 | +* qrshrn_high_n |
| 174 | +* qrshrnh_n |
| 175 | +* qrshrun_high_n |
| 176 | +* qrshrunh_n |
| 177 | +* qshl_n |
| 178 | +* qshlh_n |
| 179 | +* qshluh_n |
| 180 | +* qshrn_high_n |
| 181 | +* qshrnh_n |
| 182 | +* qshrun_high_n |
| 183 | +* qshrunh_n |
| 184 | +* raddhn |
| 185 | +* raddhn_high |
| 186 | +* rax |
| 187 | +* recp |
| 188 | +* rnd32x |
| 189 | +* rnd32z |
| 190 | +* rnd64x |
| 191 | +* rnd64z |
| 192 | +* rnda |
| 193 | +* rndx |
| 194 | +* rshrn_high_n |
| 195 | +* rsubhn |
| 196 | +* rsubhn_high |
| 197 | +* set_lane |
| 198 | +* sha1 |
| 199 | +* sha1h |
| 200 | +* sha256 |
| 201 | +* sha512 |
| 202 | +* shll_high_n |
| 203 | +* shrn_high_n |
| 204 | +* sli_n |
| 205 | +* sm3 |
| 206 | +* sm4 |
| 207 | +* sqrt |
| 208 | +* st1_x2 |
| 209 | +* st1_x3 |
| 210 | +* st1_x4 |
| 211 | +* st1q_x2 |
| 212 | +* st1q_x3 |
| 213 | +* st1q_x4 |
| 214 | +* subhn_high |
| 215 | +* sudot_lane |
| 216 | +* usdot |
| 217 | +* usdot_lane |
| 218 | + |
| 219 | +#### Finally complete families |
| 220 | +* cvtn |
| 221 | +* mla_lane |
| 222 | + |
| 223 | +## Getting Involved |
| 224 | + |
| 225 | +If you're interested in using SIMDe but need some specific functions |
| 226 | +to be implemented first, please [file an |
| 227 | +issue](https://github.com/simd-everywhere/simde/issues/new) and we may |
| 228 | +be able to prioritize those functions. |
| 229 | + |
| 230 | +If you're interested in helping out please get in touch. We have [a |
| 231 | +chat room on Matrix/Element](https://gitter.im/simd-everywhere/community) |
| 232 | +if you have questions, or of course you can just dive right in on [the issue |
| 233 | +tracker](https://github.com/simd-everywhere/simde/issues). |
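
The post says that SIMDe costs nothing when the target natively supports an extension, and otherwise uses accelerated implementations such as NEON, or a portable fallback. The sketch below illustrates that general dispatch idea for a single operation, a four-lane 32-bit integer add. It is not SIMDe's actual code: the `demo_i32x4` type and the `demo_add_i32x4` and `demo_usage` functions are hypothetical names invented for this illustration, and only the SSE2 and NEON intrinsics inside the `#if` branches are real.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__)
#  include <emmintrin.h>   /* native SSE2 intrinsics */
#elif defined(__ARM_NEON)
#  include <arm_neon.h>    /* native NEON intrinsics */
#endif

/* Hypothetical portable vector type: four 32-bit integer lanes. */
typedef struct { int32_t lane[4]; } demo_i32x4;

/* Hypothetical portable stand-in for SSE2's _mm_add_epi32. */
static demo_i32x4 demo_add_i32x4(demo_i32x4 a, demo_i32x4 b) {
  demo_i32x4 r;
#if defined(__SSE2__)
  /* Target supports SSE2: call the real intrinsic directly. */
  __m128i va = _mm_loadu_si128((const __m128i *) a.lane);
  __m128i vb = _mm_loadu_si128((const __m128i *) b.lane);
  _mm_storeu_si128((__m128i *) r.lane, _mm_add_epi32(va, vb));
#elif defined(__ARM_NEON)
  /* No SSE2, but NEON is available: use the equivalent NEON operation. */
  vst1q_s32(r.lane, vaddq_s32(vld1q_s32(a.lane), vld1q_s32(b.lane)));
#else
  /* Plain C fallback; unsigned arithmetic gives wrap-around like the hardware. */
  for (int i = 0; i < 4; i++) {
    r.lane[i] = (int32_t) ((uint32_t) a.lane[i] + (uint32_t) b.lane[i]);
  }
#endif
  return r;
}

/* Example usage: adds two vectors and returns the sum of the result lanes. */
static int32_t demo_usage(void) {
  demo_i32x4 a = { { 1, 2, 3, 4 } };
  demo_i32x4 b = { { 10, 20, 30, 40 } };
  demo_i32x4 r = demo_add_i32x4(a, b);
  assert(r.lane[0] == 11 && r.lane[3] == 44);
  return r.lane[0] + r.lane[1] + r.lane[2] + r.lane[3];
}
```

With SIMDe itself there is no need to write this dispatch by hand: a program includes the relevant SIMDe header (for example `simde/x86/sse2.h`) and calls the `simde_`-prefixed counterpart of the intrinsic, such as `simde_mm_add_epi32`.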