-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New f1600x4 #14
base: master
Are you sure you want to change the base?
New f1600x4 #14
Conversation
This also means it no longer works on Windows. |
shake256-avx2/f1600x4.s
Outdated
#endif /* !__APPLE */ | ||
movq $6, %rax | ||
17: # loop.begin: | ||
vmovdqa 0(%rdi), %ymm8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't compile with MSVC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a windows machine available to me. What is the error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MSVC has completely different syntax; there is not a chance to write assembly code that works for both Windows and gnu-style assemblers without going to a portable assembler like NASM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
peachpy (which I used to generate the assembler, see f1600x4.py
) does support nasm as output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you not generate microsoft assembly as .asm
files? Then those could be used on Windows (with a nmake
Makefile
(see PQClean) or you could consider contributing an appropriate CMakeList.txt
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately peachpy doesn't support microsoft asm. It can directly generate the windows object file. It can also generate nasm. What would we prefer? @joostrijneveld
Instead of using intrinsics and full unrolling, this uses a four-round unrolled version adapted from the one I wrote for Cloudflare's CIRCL library: github.com/cloudflare/circl/simd/keccakf1600
keccak4x/KeccakP-1600-times4-SIMD256.o | ||
*.s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with the file having .S
(capital letter) now this probably also needs to be updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We wouldn't want to ignore the file itself, right? The .s
is generated form the .S
when compiling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah you're not running peachpy
in Make of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. Try to keep dependencies to compile to a minimum.
@thomwiggers Any suggestions how to advance this? (Given you are facing the same troubles with Kyber & Dilithium.) |
The Kyber/Dilithium AVX2 implementations are simply marked as not supported on Windows. It would still be a shame to drop Windows support from this otherwise mostly (bar VLAs) portable plain-C implementation. |
In downstream projects, files may be copied and moved around. This makes it difficult to track the copyrights and licenses without having them stored in each file. License and copyright holders for files without specific header is assumed to be the one in LICENSE and AUTHORS.md files respectively. There might be an issue in avx2/f1600x4.S. It is derived from the code at sphincs/sphincsplus#14 but there is no copyright information in that PR and the code is said to be generated by PeachPy but there is no source code associated.
In downstream projects, files may be copied and moved around. This makes it difficult to track the copyrights and licenses without having them stored in each file. License and copyright holders for files without specific header is assumed to be the one in LICENSE and AUTHORS.md files respectively. I haven't touched avx2/f1600x4.S because it is derived from the code at sphincs/sphincsplus#14 but there is no copyright information in that PR and the code is said to be generated by PeachPy but there is no source code associated.
Instead of using intrinsics and full unrolling, this uses a
four-round unrolled version adapted from the one I wrote for
Cloudflare's CIRCL library:
This is about 10-20% faster on my notebook. We'd better check whether that's also true for other systems as well before merging.