checksum: add FxHash and GxHash based checksum #58
This PR tests the performance of alternative checksums against the xxhash we've been using until now. Early measurements with 4M blobs have shown that it could be worth experimenting with fxhash (https://lib.rs/crates/rustc-hash), which is used in the Rust compiler.
Due to stability concerns, I've tested some of the hash impls with a bunch of different compiler versions on different CPUs. ahash is indeed quite unstable, but fxhash remained remarkably stable. The tests covered the ants EPYC 7000 (CentOS Stream 8), EPYC 8000 (CentOS Stream 8), EPYC 9000 (Rocky 9), Xeon Gold 5220R (Ubuntu 20.04), a virtual CPU backed by a Xeon Skylake (Fedora 39), a virtual CPU backed by a Neoverse-N1 (Rocky 9), and my own workstation, an i7 13000 (Fedora 39). They all produce the same hash for a given input under multiple stable and nightly compiler versions.
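A cross-platform stability check like the one described above can be sketched as follows. This is a minimal illustration, not the code used in this PR: `fx_style_checksum` is a hypothetical, hand-rolled FxHash-style function (rotate, xor, multiply) and is not bit-compatible with the `rustc-hash` crate. The point is only that decoding words with an explicit byte order makes the result independent of host endianness, so the same printed value can be compared across machines and compiler versions.

```rust
/// Multiplier of the classic 64-bit FxHash mixing step.
const SEED: u64 = 0x517c_c1b7_2722_0a95;

/// One mixing step: rotate the state, xor in the next word, multiply.
fn mix(state: u64, word: u64) -> u64 {
    (state.rotate_left(5) ^ word).wrapping_mul(SEED)
}

/// FxHash-style checksum over a byte slice (illustrative only,
/// not bit-compatible with the rustc-hash crate).
fn fx_style_checksum(data: &[u8]) -> u64 {
    let mut state = 0u64;
    let mut chunks = data.chunks_exact(8);
    for chunk in &mut chunks {
        // Explicit little-endian decoding keeps the result
        // independent of the host's native byte order.
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        state = mix(state, u64::from_le_bytes(buf));
    }
    // Fold in the remaining 0..=7 bytes one at a time.
    for &byte in chunks.remainder() {
        state = mix(state, u64::from(byte));
    }
    state
}

fn main() {
    // Fixed, reproducible input so every machine hashes the same bytes.
    let input: Vec<u8> = (0..4096u32).map(|i| (i % 251) as u8).collect();
    let first = fx_style_checksum(&input);
    let second = fx_style_checksum(&input);
    assert_eq!(first, second, "hash must be deterministic within one run");
    // Record this value on one machine and compare it on the others.
    println!("{:#018x}", first);
}
```

Running the same binary source on each CPU and toolchain and diffing the printed value is enough to spot an unstable implementation.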
For a comprehensive overview, refer to https://github.com/rurban/smhasher.
Noticed this while grepping for XxHash; it seems to have evaded the cleaning process some generations ago.
Numbers from one of the EPYC 9000 servers (hashing a randomly filled 4 MiB slice):
GxHash:
Minimum: 45441 ns
50%: 46400 ns
95%: 47980 ns
Maximum: 53160 ns
ahash:
Minimum: 123610 ns
50%: 126410 ns
95%: 128561 ns
Maximum: 135941 ns
XxHash:
Minimum: 217851 ns
50%: 221981 ns
95%: 226151 ns
Maximum: 246701 ns
Highway:
Minimum: 320102 ns
50%: 324492 ns
95%: 327932 ns
Maximum: 344701 ns
FxHash:
Minimum: 708773 ns
50%: 710374 ns
95%: 711574 ns
Maximum: 727214 ns
ZwoHash:
Minimum: 708794 ns
50%: 714243 ns
95%: 717874 ns
Maximum: 728874 ns
Murmur3:
Minimum: 3144406 ns
50%: 3149066 ns
95%: 3153416 ns
Maximum: 3623948 ns
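For reference, numbers in this shape (minimum, 50%, 95%, maximum) can be collected with a small harness like the sketch below. It is an assumption about the method, not the benchmark actually used here: `checksum` is a hypothetical dependency-free stand-in for the hasher under test, the buffer is filled by a simple xorshift generator, and percentiles use the nearest-rank definition.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Stand-in checksum so the sketch needs no external crates
/// (hypothetical; swap in the hasher under test).
fn checksum(data: &[u8]) -> u64 {
    data.iter().fold(0u64, |acc, &b| {
        (acc.rotate_left(5) ^ u64::from(b)).wrapping_mul(0x517c_c1b7_2722_0a95)
    })
}

/// Nearest-rank percentile of an ascending-sorted, non-empty slice.
fn percentile(sorted: &[u128], pct: usize) -> u128 {
    let rank = (pct * sorted.len() + 99) / 100; // ceil(pct / 100 * n)
    sorted[rank.max(1) - 1]
}

/// Time `runs` invocations of `f`; returns sorted durations in nanoseconds.
fn sample_ns<F: FnMut() -> u64>(runs: usize, mut f: F) -> Vec<u128> {
    let mut samples = Vec::with_capacity(runs);
    for _ in 0..runs {
        let start = Instant::now();
        // black_box keeps the optimizer from discarding the hash call.
        black_box(f());
        samples.push(start.elapsed().as_nanos());
    }
    samples.sort_unstable();
    samples
}

fn main() {
    // 4 MiB buffer of pseudo-random bytes (xorshift, fixed seed).
    let mut x = 0x9E37_79B9_7F4A_7C15u64;
    let data: Vec<u8> = (0..4 * 1024 * 1024)
        .map(|_| {
            x ^= x << 13;
            x ^= x >> 7;
            x ^= x << 17;
            x as u8
        })
        .collect();

    let samples = sample_ns(20, || checksum(black_box(&data)));
    println!("Minimum: {} ns", samples[0]);
    println!("50%: {} ns", percentile(&samples, 50));
    println!("95%: {} ns", percentile(&samples, 95));
    println!("Maximum: {} ns", samples[samples.len() - 1]);
}
```

Build with `--release` for meaningful timings; a byte-at-a-time stand-in like this one says nothing about the real hashers, only about how the statistics are derived.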
This commit required modifying the build context to allow for the AES optimizations of GxHash. This should not be an issue on the systems we use (x86-64 and maybe ARM64), which I tested before this commit.
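To check whether a given machine and build actually have the AES instructions available, something like the following sketch can help. The exact build-context change made in this commit is not shown here; the sketch only assumes that AES was enabled through rustc's target features (for example `-C target-cpu=native` or `-C target-feature=+aes`), and the helper function names are hypothetical.

```rust
/// True when this binary was compiled with AES instructions enabled,
/// e.g. via `-C target-cpu=native` or `-C target-feature=+aes`.
fn aes_enabled_at_compile_time() -> bool {
    cfg!(target_feature = "aes")
}

/// True when the CPU we are running on reports AES support (x86 / x86-64).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn aes_available_at_runtime() -> bool {
    std::is_x86_feature_detected!("aes")
}

/// True when the CPU we are running on reports AES support (ARM64).
#[cfg(target_arch = "aarch64")]
fn aes_available_at_runtime() -> bool {
    std::arch::is_aarch64_feature_detected!("aes")
}

/// Other architectures: report no AES support in this sketch.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64")))]
fn aes_available_at_runtime() -> bool {
    false
}

fn main() {
    println!("AES enabled at compile time: {}", aes_enabled_at_compile_time());
    println!("AES available at runtime:    {}", aes_available_at_runtime());
}
```

If the first line prints `false`, the build flags did not reach the compiler; if only the second prints `false`, the binary was built for a CPU the host does not match.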
The PR should be mostly finished now; I'll perform some tests and then merge it. Among the hashing implementations now in the stack, the performance order seems to be: GxHash > XxHash ~ FxHash. The latter two really are system-dependent: I've had systems in the testbed on which XxHash outperformed FxHash by 2x and some on which FxHash is somewhat faster than XxHash. Overall, though, GxHash is No. 1 on each of these machines.
The picture is similar for Highway, which has mixed performance overall depending on the system.
Based on the benchmark results, and since no data beyond ephemeral development stuff is stored in any existing Haura deployment, I took the liberty of assigning gxhash as the default checksum. The version for gxhash is pinned to
ahash (https://lib.rs/crates/ahash), used by hashbrown, also looked worth trying out, but it is too unstable.