-
Notifications
You must be signed in to change notification settings - Fork 9
perf: parallelize ConcatSource hashing #213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -5,7 +5,8 @@ use std::{ | |||||||||||||||||
| sync::{Mutex, OnceLock}, | ||||||||||||||||||
| }; | ||||||||||||||||||
|
|
||||||||||||||||||
| use rustc_hash::FxHashMap as HashMap; | ||||||||||||||||||
| use rayon::prelude::*; | ||||||||||||||||||
| use rustc_hash::{FxHashMap as HashMap, FxHasher}; | ||||||||||||||||||
|
|
||||||||||||||||||
| use crate::{ | ||||||||||||||||||
| helpers::{get_map, Chunks, GeneratedInfo, StreamChunks}, | ||||||||||||||||||
|
|
@@ -227,9 +228,22 @@ impl Source for ConcatSource { | |||||||||||||||||
| impl Hash for ConcatSource { | ||||||||||||||||||
| fn hash<H: Hasher>(&self, state: &mut H) { | ||||||||||||||||||
| "ConcatSource".hash(state); | ||||||||||||||||||
| for child in self.optimized_children().iter() { | ||||||||||||||||||
| child.hash(state); | ||||||||||||||||||
|
|
||||||||||||||||||
| let children = self.optimized_children(); | ||||||||||||||||||
| let child_hashes: Vec<u64> = children | ||||||||||||||||||
| .par_iter() | ||||||||||||||||||
| .map(|child| { | ||||||||||||||||||
| let mut hasher = FxHasher::default(); | ||||||||||||||||||
| child.hash(&mut hasher); | ||||||||||||||||||
| hasher.finish() | ||||||||||||||||||
| }) | ||||||||||||||||||
| .collect(); | ||||||||||||||||||
|
|
||||||||||||||||||
| let mut combined = FxHasher::default(); | ||||||||||||||||||
| for child_hash in child_hashes { | ||||||||||||||||||
| child_hash.hash(&mut combined); | ||||||||||||||||||
| } | ||||||||||||||||||
| combined.finish().hash(state); | ||||||||||||||||||
|
Comment on lines
+242
to
+246
|
||||||||||||||||||
| let mut combined = FxHasher::default(); | |
| for child_hash in child_hashes { | |
| child_hash.hash(&mut combined); | |
| } | |
| combined.finish().hash(state); | |
| for child_hash in child_hashes { | |
| child_hash.hash(state); | |
| } |
Copilot
AI
Feb 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FxHasher::finish() is a method from the std::hash::Hasher trait, but Hasher isn’t imported in this test module. As written, the calls to hasher1.finish() / hasher2.finish() won’t compile unless std::hash::Hasher is brought into scope (or std::hash::Hasher::finish(&hasher) is used).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using
par_iterunconditionally can lead to performance degradation forConcatSourcewith a small number of children due to the overhead of parallelization. This could be a performance regression in those cases.Consider adding a threshold to switch between sequential and parallel hashing. For example, you could use
iter()for a small number of children andpar_iter()for a larger number.To maintain hash consistency, the sequential path must use the same hashing logic as the parallel one (i.e., creating an
FxHasherfor each child and then combining the hashes). A possible implementation could look like this: