2 changes: 1 addition & 1 deletion Cargo.toml
@@ -46,6 +46,7 @@ serde_json = "1.0.133"
dyn-clone = "1.0.17"
rustc-hash = "2.1.0"
memchr = "2.7.4"
rayon = "1.11.0"

codspeed-criterion-compat = { version = "4.1.0", default-features = false, optional = true }
static_assertions = "1.1.0"
@@ -55,7 +56,6 @@ simd-json = "0.17.0"
twox-hash = "2.1.0"
regex = "1.11.1"
criterion = { version = "0.5.1", default-features = false }
rayon = "1.11.0"

[features]
codspeed = ["codspeed-criterion-compat"]
49 changes: 46 additions & 3 deletions src/concat_source.rs
@@ -5,7 +5,8 @@ use std::{
sync::{Mutex, OnceLock},
};

use rustc_hash::FxHashMap as HashMap;
use rayon::prelude::*;
use rustc_hash::{FxHashMap as HashMap, FxHasher};

use crate::{
helpers::{get_map, Chunks, GeneratedInfo, StreamChunks},
@@ -227,9 +228,22 @@ impl Source for ConcatSource {
impl Hash for ConcatSource {
fn hash<H: Hasher>(&self, state: &mut H) {
"ConcatSource".hash(state);
for child in self.optimized_children().iter() {
child.hash(state);

let children = self.optimized_children();
let child_hashes: Vec<u64> = children
.par_iter()
.map(|child| {
let mut hasher = FxHasher::default();
child.hash(&mut hasher);
hasher.finish()
})
.collect();
Comment on lines +233 to +240

medium

Using par_iter unconditionally can make hashing slower for a ConcatSource with only a few children, because the overhead of parallelization can exceed the cost of hashing those children sequentially. In those cases this change could be a performance regression.

Consider adding a threshold to switch between sequential and parallel hashing. For example, you could use iter() for a small number of children and par_iter() for a larger number.

To maintain hash consistency, the sequential path must use the same hashing logic as the parallel one (i.e., creating an FxHasher for each child and then combining the hashes). A possible implementation could look like this:

const PAR_HASH_THRESHOLD: usize = 16; // Should be benchmarked

let child_hashes: Vec<u64> = if children.len() < PAR_HASH_THRESHOLD {
    children
        .iter()
        .map(|child| {
            let mut hasher = FxHasher::default();
            child.hash(&mut hasher);
            hasher.finish()
        })
        .collect()
} else {
    children
        .par_iter()
        .map(|child| {
            let mut hasher = FxHasher::default();
            child.hash(&mut hasher);
            hasher.finish()
        })
        .collect()
};
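
The consistency requirement above can be checked with a small standalone sketch. It is not the PR's code: it assumes std's DefaultHasher in place of FxHasher and std::thread::scope in place of rayon's par_iter so that it builds without external crates, and the names hash_child, hash_children_sequential, and hash_children_threaded are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::thread;

// Hypothetical helper: hash one child with a fresh hasher, as both paths must.
fn hash_child<T: Hash>(child: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    child.hash(&mut hasher);
    hasher.finish()
}

// Sequential path: per-child hashes, in child order.
fn hash_children_sequential<T: Hash>(children: &[T]) -> Vec<u64> {
    children.iter().map(hash_child).collect()
}

// Threaded stand-in for par_iter (an assumption of this sketch): one scoped
// thread per child, joined in spawn order so output order matches input order.
fn hash_children_threaded<T: Hash + Sync>(children: &[T]) -> Vec<u64> {
    thread::scope(|scope| {
        let handles: Vec<_> = children
            .iter()
            .map(|child| scope.spawn(move || hash_child(child)))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("hashing thread panicked"))
            .collect()
    })
}
```

Because both paths hash each child with its own fresh hasher and keep the child order, switching between them at a threshold should not change the combined result. The threshold itself would still need benchmarking.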


let mut combined = FxHasher::default();
for child_hash in child_hashes {
child_hash.hash(&mut combined);
}
combined.finish().hash(state);
Comment on lines +242 to +246
Copilot AI Feb 9, 2026

After computing child_hashes, the extra combined FxHasher pass adds another hashing round and a finalization before writing into state. You can hash the u64 values directly into state in order (after the parallel collection) to reduce work and simplify the implementation.

Suggested change
- let mut combined = FxHasher::default();
- for child_hash in child_hashes {
-     child_hash.hash(&mut combined);
- }
- combined.finish().hash(state);
+ for child_hash in child_hashes {
+     child_hash.hash(state);
+ }
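
As a rough illustration of this simplification, the sketch below writes precomputed child hashes straight into the caller's hasher in child order, with no intermediate hasher. It assumes std's DefaultHasher rather than FxHasher so that it builds without external crates; write_child_hashes and finish_with are hypothetical names, not part of the PR.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical helper: write each precomputed child hash into `state`
// in order, without a separate combining hasher.
fn write_child_hashes<H: Hasher>(child_hashes: &[u64], state: &mut H) {
    for child_hash in child_hashes {
        child_hash.hash(state);
    }
}

// Hypothetical convenience wrapper used to observe the final value.
fn finish_with(child_hashes: &[u64]) -> u64 {
    let mut state = DefaultHasher::new();
    write_child_hashes(child_hashes, &mut state);
    state.finish()
}
```

The result stays deterministic and order-sensitive. Dropping the intermediate pass is expected to produce a different hash value than the combined version, which would matter only if the value is persisted or compared across versions.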

}
}

@@ -489,6 +503,10 @@ fn merge_raw_sources(

#[cfg(test)]
mod tests {
use std::hash::Hash;

use rustc_hash::FxHasher;

use crate::{OriginalSource, RawBufferSource, RawStringSource};

use super::*;
@@ -865,4 +883,29 @@ mod tests {
]).boxed()"#
);
}

#[test]
fn test_hash_is_deterministic_for_many_children() {
let source = ConcatSource::new([
RawStringSource::from("0"),
RawStringSource::from("1"),
RawStringSource::from("2"),
RawStringSource::from("3"),
RawStringSource::from("4"),
RawStringSource::from("5"),
RawStringSource::from("6"),
RawStringSource::from("7"),
RawStringSource::from("8"),
]);

let mut hasher1 = FxHasher::default();
source.hash(&mut hasher1);
let hash1 = hasher1.finish();

let mut hasher2 = FxHasher::default();
source.hash(&mut hasher2);
let hash2 = hasher2.finish();

Comment on lines +901 to +908
Copilot AI Feb 9, 2026

FxHasher::finish() is a method from the std::hash::Hasher trait, but Hasher isn’t imported in this test module. As written, the calls to hasher1.finish() / hasher2.finish() won’t compile unless std::hash::Hasher is brought into scope (or std::hash::Hasher::finish(&hasher) is used).
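
The two options mentioned here can be sketched as follows. This is a minimal standalone example, assuming std's DefaultHasher instead of FxHasher; the function names are hypothetical. Whether `use super::*` already brings the trait into the test module depends on the parent module's imports, which this diff shows only in part.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher}; // `Hasher` in scope enables `hasher.finish()`

// Option 1: method-call syntax, which needs the `Hasher` trait in scope.
fn hash_with_method_call<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

// Option 2: fully qualified paths, which work without importing the traits.
fn hash_with_qualified_call<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    std::hash::Hash::hash(value, &mut hasher);
    std::hash::Hasher::finish(&hasher)
}
```

Both forms call the same trait methods, so they return the same value for the same input.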

assert_eq!(hash1, hash2);
}
}