Use VPERMB _mm512_permutexvar_epi8 for DES and/or Lotus #5706
Comments
Looks like this can provide a great speedup for S-box lookups over bitslice: 22.125 VPTERNLOGs per 512 lookups vs. 1 VPERMB per 64 lookups, so roughly a ~2.76x speedup for this step. Either of these counts is then multiplied by 8 for the 8 S-boxes. However, then things become difficult with DES
So this is probably a no-go primarily because of |
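For reference, the ~2.76x estimate is just the ratio of instruction counts for the same 512 lookups of one S-box, using only the figures quoted in this thread (22.125 VPTERNLOGs for bitslice, one VPERMB per 64 lookups):

```latex
\[
\frac{22.125\ \text{VPTERNLOG}}{512/64\ \text{VPERMB}}
  = \frac{22.125}{8}
  \approx 2.77
\]
```

This counts instructions only; it assumes equal throughput for the two instructions and ignores any work needed to get data into and out of the byte-per-lane layout.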
Looks like this won't help for DES, but it might for Lotus, where we need the 8-bit outputs and don't need to expand them further, and where the S-box expressions are much longer (although we also need to implement #5451, at least for systems without such an instruction). The 6-bit index that VPERMB uses will be rather limiting - we'd need 4 VPERMBs per S-box lookup, with the S-box content spread across 4 vectors, followed by logic to select the right outputs by the high 2 bits of the input. This extra logic may be a performance killer. The break-even point appears to be at 178/8 = ~22 instructions per S-box lookup, so the question is whether we can do it in fewer than that (probably yes).
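To make the "4 VPERMB plus select" idea concrete, here is a minimal scalar sketch in portable C. It only models the lane semantics of VPERMB (each output byte is the table byte addressed by the low 6 bits of the index); it is not an AVX-512 implementation and says nothing about actual instruction counts. The names `vpermb_model`, `sbox8_lookup64` and `sbox8_selfcheck` are hypothetical, and the S-box used in the self-check is an arbitrary placeholder, not the Lotus S-box.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one VPERMB (_mm512_permutexvar_epi8) over 64 byte lanes:
 * out[i] = table[idx[i] & 63]. The top 2 bits of each index are ignored. */
static void vpermb_model(uint8_t out[64], const uint8_t idx[64],
                         const uint8_t table[64])
{
    for (int i = 0; i < 64; i++)
        out[i] = table[idx[i] & 0x3F];
}

/* 8-bit-in, 8-bit-out S-box lookup for 64 lanes.
 * The 256-byte S-box is treated as four 64-byte quarter tables (the
 * "4 vectors"); each lane then keeps the result from the quarter named
 * by the high 2 bits of its input. */
static void sbox8_lookup64(uint8_t out[64], const uint8_t in[64],
                           const uint8_t sbox[256])
{
    uint8_t part[4][64];

    /* Four lookups, one per quarter table, all using the same indices. */
    for (int q = 0; q < 4; q++)
        vpermb_model(part[q], in, sbox + 64 * q);

    /* Branch-free select by the high 2 input bits. In a vector version
     * this is the extra logic whose cost is in question. */
    for (int i = 0; i < 64; i++) {
        uint8_t hi = in[i] >> 6;
        uint8_t r = 0;
        for (int q = 0; q < 4; q++) {
            uint8_t mask = (uint8_t)-(hi == q); /* 0xFF if selected, else 0 */
            r |= part[q][i] & mask;
        }
        out[i] = r;
    }
}

/* Self-check against a direct table lookup, using a placeholder S-box. */
static void sbox8_selfcheck(void)
{
    uint8_t sbox[256], in[64], out[64];

    for (int i = 0; i < 256; i++)
        sbox[i] = (uint8_t)(i * 167 + 13); /* arbitrary, NOT Lotus */
    for (int i = 0; i < 64; i++)
        in[i] = (uint8_t)(i * 37 + 5);     /* spreads over all 4 quarters */

    sbox8_lookup64(out, in, sbox);
    for (int i = 0; i < 64; i++)
        assert(out[i] == sbox[in[i]]);
}
```

In this model the select step is written as mask-and-OR per quarter; how cheaply the equivalent can be done with real vector instructions is exactly the open question above.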
I was wrongly thinking of applying |
This is another recent instruction introduced in Intel Cannon Lake (9th gen, but not all) and above (consistently since Ice Lake, 10th gen) through the VBMI extension on top of AVX-512. It appears to perform a mapping that's just right for one DES S-box, 64 times in parallel. So a non-bitslice DES implementation using this instruction may outperform bitslice.
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm512_permutexvar_epi8&ig_expand=5071
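As an illustration of why the mapping looks "just right" for one DES S-box, here is a minimal scalar sketch in portable C of what a single VPERMB would compute: 64 independent 6-bit-in, 4-bit-out lookups from one 64-byte table. It uses the standard DES S1 table, flattened so that the raw 6-bit input is the byte index. The function names are hypothetical, and this models only the lookup itself, not the surrounding expansion and permutation steps.

```c
#include <assert.h>
#include <stdint.h>

/* Standard DES S-box S1, as 4 rows of 16 columns. */
static const uint8_t DES_S1[4][16] = {
    {14,  4, 13,  1,  2, 15, 11,  8,  3, 10,  6, 12,  5,  9,  0,  7},
    { 0, 15,  7,  4, 14,  2, 13,  1, 10,  6, 12, 11,  9,  5,  3,  8},
    { 4,  1, 14,  8, 13,  6,  2, 11, 15, 12,  9,  7,  3, 10,  5,  0},
    {15, 12,  8,  2,  4,  9,  1,  7,  5, 11,  3, 14, 10,  0,  6, 13}
};

/* Flatten S1 into a 64-byte table indexed by the raw 6-bit input
 * b5..b0: the row is (b5, b0), the column is b4..b1. This table is
 * what would sit in the 512-bit table operand of VPERMB. */
static void des_s1_flatten(uint8_t table[64])
{
    for (int i = 0; i < 64; i++) {
        int row = ((i >> 4) & 2) | (i & 1);
        int col = (i >> 1) & 0xF;
        table[i] = DES_S1[row][col];
    }
}

/* Scalar model of one VPERMB: 64 S1 lookups, one per byte lane.
 * Only the low 6 bits of each input byte are used as the index. */
static void des_s1_lookup64(uint8_t out[64], const uint8_t in6[64])
{
    uint8_t table[64];

    des_s1_flatten(table);
    for (int i = 0; i < 64; i++)
        out[i] = table[in6[i] & 0x3F];
}

/* Self-check: input 011011 selects row 01, column 1101, giving 5. */
static void des_s1_selfcheck(void)
{
    uint8_t in6[64], out[64];

    for (int i = 0; i < 64; i++)
        in6[i] = (uint8_t)i;
    des_s1_lookup64(out, in6);
    assert(out[0x1B] == 5);
}
```

Note that each lane's 4-bit result ends up in its own byte, so a non-bitslice design built on this would still need to move those bits into place for the next round.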