More CRC-32 and Adler-32 updates #353

Merged: 3 commits, Mar 16, 2024

@ebiggers commented Mar 10, 2024

  • test_checksums: increase number of long inputs tested
  • lib/{adler32,crc32}: misc cleanups
  • lib/x86/crc32: more optimizations

Various cleanups, including tweaks to make the Adler-32 code more
consistent with the CRC-32 code and vice versa.  No behavior changes.

- As was recently done in the Adler-32 code, take advantage of the fact
  that on recent x86 processors, vmovdqu with an aligned pointer is just
  as fast as vmovdqa.  Don't waste time aligning the pointer unless the
  length is very large, and at the same time, handle all cases of
  len >= 8*VL using the main loop so that the 4*VL wide loop isn't
  needed.  (Before, aligning the pointer was tied to whether the main
  loop was used or not, since the main loop used vmovdqa.)

- Handle short lengths more efficiently.  Instead of falling back to
  crc32_slice1() for all len < VL, use AVX-512 masking (when available)
  to handle 4 <= len <= 15, and use 128-bit vector instructions to
  handle 16 <= len < VL.

- Document why the main loop uses a width of 8*VL instead of 4*VL.
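
The length handling described in the bullets above can be sketched roughly as
follows.  This is only an illustration, not the code from the commit: the enum
labels, the function names, and the ALIGN_THRESHOLD_HYPOTHETICAL cutoff are all
made up here (the description only says "very large"), and the handling of
VL <= len < 8*VL is not spelled out in the description, so it is left as an
opaque PATH_MID.  The scalar fallback for lengths that no vector path covers is
also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the code paths named in the description. */
enum crc_path {
	PATH_SCALAR,	/* fallback, e.g. something like crc32_slice1() */
	PATH_MASKED,	/* AVX-512 masking, 4 <= len <= 15 */
	PATH_VEC128,	/* 128-bit vector instructions, 16 <= len < VL */
	PATH_MID,	/* VL <= len < 8*VL; details not in the description */
	PATH_MAIN_LOOP	/* 8*VL-wide main loop, len >= 8*VL */
};

/* Assumption: the real cutoff for "very large" is not stated. */
#define ALIGN_THRESHOLD_HYPOTHETICAL ((size_t)65536)

/* Pick a path from the length, the vector length in bytes (vl), and
 * whether AVX-512 masking is available. */
static enum crc_path
choose_path(size_t len, size_t vl, bool have_avx512_masking)
{
	if (len >= 8 * vl)
		return PATH_MAIN_LOOP;
	if (len >= vl)
		return PATH_MID;
	if (len >= 16)
		return PATH_VEC128;
	if (len >= 4 && have_avx512_masking)
		return PATH_MASKED;
	return PATH_SCALAR;
}

/* Number of bytes to consume before p is vl-aligned (vl a power of 2). */
static size_t
bytes_until_aligned(const void *p, size_t vl)
{
	assert(vl != 0 && (vl & (vl - 1)) == 0);
	return (size_t)((0 - (uintptr_t)p) & (vl - 1));
}

/* Only bother aligning when the main loop runs and the input is very
 * large; otherwise unaligned loads are used directly. */
static bool
should_align_first(size_t len, size_t vl)
{
	return len >= 8 * vl && len >= ALIGN_THRESHOLD_HYPOTHETICAL;
}
```

With vl = 64 (512-bit vectors), choose_path() gives the masked path for
len = 4..15 when masking is available, the 128-bit path for len = 16..63, and
the main loop from len = 512 upward.  Note that with vl = 16 the 128-bit range
16 <= len < VL is empty.
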

@ebiggers merged commit 5d15bce into master on Mar 16, 2024
52 checks passed
@ebiggers deleted the dev branch on March 16, 2024, 18:19