Proof of Concept: VAES Support #144
All SMHasher tests passed with the latest fix.
On EPYC 7773X, one can get the following results with this patch:
[Benchmark output not preserved.]
current[2] = current[2].aesenc(blocks[2]);
current[3] = current[3].aesenc(blocks[3]);
sum[0] = sum[0].shuffle_and_add(blocks[0]);
sum[1] = sum[1].shuffle_and_add(blocks[1]);
perhaps sum[0]
# Use VAES extension if possible. The hash value may be incompatible with NON-VAES targets
vaes = []
We can use cfg to detect the feature and don't need a feature declared here.
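A minimal sketch of this suggestion, assuming the check is made with a compile-time `cfg` instead of a Cargo feature. The helper names `vaes_enabled` and `backend_name` are hypothetical, and the feature name `vaes` is an assumption (the diff in this PR tests `avx512vaes`):

```rust
/// Hypothetical helper: true when this build targets a CPU with VAES.
/// The condition is resolved at compile time, so no Cargo feature is needed.
const fn vaes_enabled() -> bool {
    // Assumption: "vaes" is the target-feature name; the PR diff uses "avx512vaes".
    cfg!(all(target_arch = "x86_64", target_feature = "vaes", not(miri)))
}

/// Hypothetical helper: names the code path this build would select.
fn backend_name() -> &'static str {
    if vaes_enabled() {
        "vaes"
    } else {
        "fallback"
    }
}
```

With this approach the wide path would be switched on by the build's target settings, for example `-C target-cpu=native` on a VAES-capable machine, rather than by an opt-in feature in `Cargo.toml`.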
// Rust is confused with targets supporting VAES without AVX512 extensions.
// We need to manually specify the underlying intrinsic; otherwise the compiler
// will have trouble inlining the code.
Is there a link to an issue on this?
target_feature = "avx512vaes",
not(miri)
))]
if data.len() > 128 {
Rather than adding another 'if', I think a cleaner way to handle this would be to factor out a method in `operations`. For example, there could be an `aesenc_x4`, which then uses cfg to provide one implementation or the other depending on whether the CPU instruction is available.
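One way this could look, as a rough sketch only: `round` is a stand-in for a single AES round (it is not real AES), `mix` is a hypothetical call site, and the feature name `vaes` is an assumption (the diff tests `avx512vaes`). Both variants share one signature, so the caller does not change:

```rust
/// Stand-in for one 128-bit AES round. This is NOT real AES; a real
/// implementation would call the AESENC intrinsic here.
fn round(value: u128, key: u128) -> u128 {
    (value ^ key)
        .rotate_left(17)
        .wrapping_mul(0x9E37_79B9_7F4A_7C15_F39C_C060_5CED_C835)
}

/// Wide variant, compiled only when the target advertises VAES.
/// A real version would replace this loop with wide VAES instructions.
#[cfg(all(target_arch = "x86_64", target_feature = "vaes", not(miri)))]
fn aesenc_x4(current: [u128; 4], blocks: [u128; 4]) -> [u128; 4] {
    let mut out = [0u128; 4];
    for i in 0..4 {
        out[i] = round(current[i], blocks[i]);
    }
    out
}

/// Fallback variant: four independent 128-bit rounds.
#[cfg(not(all(target_arch = "x86_64", target_feature = "vaes", not(miri))))]
fn aesenc_x4(current: [u128; 4], blocks: [u128; 4]) -> [u128; 4] {
    [
        round(current[0], blocks[0]),
        round(current[1], blocks[1]),
        round(current[2], blocks[2]),
        round(current[3], blocks[3]),
    ]
}

/// Hypothetical call site: identical whichever variant was compiled in.
fn mix(current: &mut [u128; 4], blocks: [u128; 4]) {
    *current = aesenc_x4(*current, blocks);
}
```

Keeping the `cfg` on the helper means the feature test lives in one place instead of at every loop that mixes four blocks.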
Thanks for putting this together.
@tkaitchuck Sorry for the long delay. I was held back by some other things. I would like to push this forward. Any suggestions for what to do next?
I will give this another try in a new PR.
The idea is to use VAES instructions to scan a wider span of input in each loop iteration.
(I also tested scanning the same length per iteration with fewer instructions, but that did not speed things up at all.)
We can gain a 100% speedup.
[Benchmark results without VAES and with VAES not preserved.]
Notice: `aeshash/wider-string` is the same as `aeshash/string` except that its set of lengths includes larger data points. `vaes` passed all quality tests, but its hash values may not be compatible with non-aes targets.