-
Notifications
You must be signed in to change notification settings - Fork 9
perf: parallelize ConcatSource hashing #213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @SyMind, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on optimizing the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request parallelizes the hashing of ConcatSource children to improve performance, using the rayon crate. The implementation computes hashes of children in parallel and then combines them. While this is a good performance optimization for sources with many children, I've pointed out a potential performance regression for sources with few children due to the overhead of parallelization. I've suggested introducing a threshold to switch between sequential and parallel hashing to address this.
| let child_hashes: Vec<u64> = children | ||
| .par_iter() | ||
| .map(|child| { | ||
| let mut hasher = FxHasher::default(); | ||
| child.hash(&mut hasher); | ||
| hasher.finish() | ||
| }) | ||
| .collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using par_iter unconditionally can lead to performance degradation for ConcatSource with a small number of children due to the overhead of parallelization. This could be a performance regression in those cases.
Consider adding a threshold to switch between sequential and parallel hashing. For example, you could use iter() for a small number of children and par_iter() for a larger number.
To maintain hash consistency, the sequential path must use the same hashing logic as the parallel one (i.e., creating an FxHasher for each child and then combining the hashes). A possible implementation could look like this:
const PAR_HASH_THRESHOLD: usize = 16; // Should be benchmarked
let child_hashes: Vec<u64> = if children.len() < PAR_HASH_THRESHOLD {
children
.iter()
.map(|child| {
let mut hasher = FxHasher::default();
child.hash(&mut hasher);
hasher.finish()
})
.collect()
} else {
children
.par_iter()
.map(|child| {
let mut hasher = FxHasher::default();
child.hash(&mut hasher);
hasher.finish()
})
.collect()
};
Merging this PR will not alter performance
Comparing Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR parallelizes ConcatSource’s Hash implementation by hashing each optimized child source concurrently and then combining those hashes, requiring rayon as a normal dependency.
Changes:
- Add
rayonand usepar_iter()to compute per-child hashes in parallel forConcatSource. - Combine per-child hashes into a single value that is fed into the caller-provided
Hasher. - Add a new unit test intended to assert hash determinism for many children.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/concat_source.rs |
Parallelizes ConcatSource hashing via Rayon and adds a determinism-oriented unit test. |
Cargo.toml |
Moves/adds rayon under [dependencies] to support use in library code. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let mut hasher1 = FxHasher::default(); | ||
| source.hash(&mut hasher1); | ||
| let hash1 = hasher1.finish(); | ||
|
|
||
| let mut hasher2 = FxHasher::default(); | ||
| source.hash(&mut hasher2); | ||
| let hash2 = hasher2.finish(); | ||
|
|
Copilot
AI
Feb 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FxHasher::finish() is a method from the std::hash::Hasher trait, but Hasher isn’t imported in this test module. As written, the calls to hasher1.finish() / hasher2.finish() won’t compile unless std::hash::Hasher is brought into scope (or std::hash::Hasher::finish(&hasher) is used).
| let mut combined = FxHasher::default(); | ||
| for child_hash in child_hashes { | ||
| child_hash.hash(&mut combined); | ||
| } | ||
| combined.finish().hash(state); |
Copilot
AI
Feb 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After computing child_hashes, the extra combined FxHasher pass adds another hashing round and a finalization before writing into state. You can hash the u64 values directly into state in order (after the parallel collection) to reduce work and simplify the implementation.
| let mut combined = FxHasher::default(); | |
| for child_hash in child_hashes { | |
| child_hash.hash(&mut combined); | |
| } | |
| combined.finish().hash(state); | |
| for child_hash in child_hashes { | |
| child_hash.hash(state); | |
| } |
No description provided.