Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
sz_checksum(char const *, size_t)
C 99 interfacesz::str().checksum()
C++ 11 interfacesz.checksum(str)
Python interfaceDatabase and other Systems Engineers, you can now use StringZilla to dynamically dispatch different check-sum kernels for AVX2 capable Haswell+ CPUs, AVX-512BW capable Ice Lake+ CPUs, and Arm NEON CPUs on mobile. In AVX-512, masked loads are used extensively, resulting in a 10% improvement even on typical English words, averaging 5 bytes in length and 20x performance improvement compared to the serial code for longer strings.
On the technical side, on x86, the kernels use the well-known
SAD(text, zeros)
idiom to accumulate absolute differences between individual bytes into 64-bit words. It also uses bidirectional traversal to saturate the core, capable of performing 2 loads per CPU cycle. Moreover, on large inputs, it switches to streaming loads, separately handling the head and the tail, similar to ourmemcpy
alternative, also outperforming LibC on AVX-512-capable machines 😎