perf(trie): deduplicate already fetched prefetch targets #14223

Merged
3 commits merged into main from dan/dedupe-prefetch-targets on Feb 5, 2025

Member

@Rjected Rjected commented Feb 4, 2025

This prevents fetching proofs that have already been fetched by state updates or other proof fetches. Previously, we only extended self.fetched_proof_targets, without checking it or filtering the targets of prefetch calls against it.

@Rjected Rjected force-pushed the dan/dedupe-prefetch-targets branch 3 times, most recently from 6dc3915 to 9c63e0c Compare February 4, 2025 23:29
@Rjected Rjected added the C-perf (A change motivated by improving speed, memory usage or disk footprint) and A-trie (Related to Merkle Patricia Trie implementation) labels Feb 5, 2025
@Rjected Rjected marked this pull request as ready for review February 5, 2025 00:01
targets.retain(|hashed_address, target_storage| {
    self.fetched_proof_targets
        .get(hashed_address)
        .is_none_or(|fetched_storage| fetched_storage == target_storage)
Collaborator

Am I reading this wrong, or will this leave in targets only accounts that either do not exist in self.fetched_proof_targets, or have storages equal to the ones in self.fetched_proof_targets?

Think it should be

Suggested change
- .is_none_or(|fetched_storage| fetched_storage == target_storage)
+ .is_none_or(|fetched_storage| fetched_storage != target_storage)
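
To make the difference between the two predicates concrete, here is a minimal sketch that runs both on the same input. It is not the reth code: `Targets`, `retain_if_equal`, `retain_if_not_equal`, and `example` are hypothetical names, `u8` and `u32` stand in for hashed addresses and hashed slots, and `map_or(true, ..)` is used in place of `is_none_or(..)` (the two behave the same here).

```rust
use std::collections::{HashMap, HashSet};

// Simplified stand-in for the real target map (assumption):
// u8 = hashed address, u32 = hashed storage slot.
type Targets = HashMap<u8, HashSet<u32>>;

/// Applies the `==` predicate from the diff and returns the surviving accounts, sorted.
fn retain_if_equal(mut targets: Targets, fetched: &Targets) -> Vec<u8> {
    targets.retain(|addr, target_storage| {
        fetched
            .get(addr)
            .map_or(true, |fetched_storage| *fetched_storage == *target_storage)
    });
    let mut kept: Vec<u8> = targets.into_keys().collect();
    kept.sort_unstable();
    kept
}

/// Applies the suggested `!=` predicate and returns the surviving accounts, sorted.
fn retain_if_not_equal(mut targets: Targets, fetched: &Targets) -> Vec<u8> {
    targets.retain(|addr, target_storage| {
        fetched
            .get(addr)
            .map_or(true, |fetched_storage| *fetched_storage != *target_storage)
    });
    let mut kept: Vec<u8> = targets.into_keys().collect();
    kept.sort_unstable();
    kept
}

/// Returns (targets, fetched) for the demonstration.
fn example() -> (Targets, Targets) {
    let fetched = HashMap::from([(1, HashSet::from([10, 11]))]);
    let targets = HashMap::from([
        (1, HashSet::from([10, 11])), // already fetched in full
        (2, HashSet::from([20])),     // never fetched
    ]);
    (targets, fetched)
}

fn main() {
    let (targets, fetched) = example();
    // `==` keeps account 1 even though every slot was already fetched.
    println!("==: {:?}", retain_if_equal(targets.clone(), &fetched));
    // `!=` drops account 1 and keeps only the unfetched account 2.
    println!("!=: {:?}", retain_if_not_equal(targets, &fetched));
}
```

Note that `!=` only removes exact matches: an account whose target slots partially overlap the fetched slots would still be kept whole, already-fetched slots included.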

Collaborator

Or rather, maybe it should check whether target_storage is a subset of fetched_storage, and remove the entry if so?

Member Author

Yeah, I think checking if it's a subset and removing makes more sense here
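
A minimal sketch of the subset idea, assuming simplified stand-in types rather than the real reth ones (`Targets`, `drop_fetched_subsets`, and `example` are hypothetical names; `u8` and `u32` stand in for hashed addresses and slots). It relies only on `HashMap::retain` and `HashSet::is_subset` from the standard library.

```rust
use std::collections::{HashMap, HashSet};

// Simplified stand-in for the real target map (assumption).
type Targets = HashMap<u8, HashSet<u32>>;

/// Removes every target whose account was already fetched and whose
/// requested slots are all contained in the fetched slots.
fn drop_fetched_subsets(targets: &mut Targets, fetched: &Targets) {
    targets.retain(|addr, target_storage| match fetched.get(addr) {
        // Account never fetched: keep it.
        None => true,
        // Account fetched: keep it only if it asks for something new.
        Some(fetched_storage) => !target_storage.is_subset(fetched_storage),
    });
}

/// Returns (targets, fetched) for the demonstration.
fn example() -> (Targets, Targets) {
    let fetched = HashMap::from([
        (1, HashSet::from([10, 11])),
        (2, HashSet::new()),
        (3, HashSet::from([30])),
    ]);
    let targets = HashMap::from([
        (1, HashSet::from([10])),     // subset of fetched slots: dropped
        (2, HashSet::new()),          // empty set is a subset of anything: dropped
        (3, HashSet::from([30, 31])), // slot 31 is new: kept
        (4, HashSet::from([40])),     // account never fetched: kept
    ]);
    (targets, fetched)
}

fn main() {
    let (mut targets, fetched) = example();
    drop_fetched_subsets(&mut targets, &fetched);
    let mut kept: Vec<u8> = targets.keys().copied().collect();
    kept.sort_unstable();
    println!("kept accounts: {:?}", kept);
}
```

In this sketch a kept account retains all of its target slots, so account 3 still carries the already-fetched slot 30; trimming individual slots is a separate step.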

Comment on lines 613 to 615
// if both storages are empty, then we can skip this account altogether
if target_storage.is_empty() && fetched_storage.is_empty() {
    continue
Collaborator

Unsure about this.

If the account exists in both targets and self.fetched_proof_targets, but has no associated storage slots in either, we should remove this account from targets, because it was already fetched.

let prev_target_storage_len = target_storage.len();

// keep only the storage slots that have not been fetched yet
target_storage.retain(|slot| !fetched_storage.contains(slot));
Collaborator

this may result in an empty target_storage, and then we will need to remove the entry from the targets map completely to prevent fetching the account proof
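
The two points raised in this review (drop an already-fetched account with no storage on either side, and drop an entry whose slots were all filtered out) can be handled in one pass. The following is a sketch under simplifying assumptions, not the merged implementation: `Targets`, `dedupe_targets`, and `example` are hypothetical names, and `u8`/`u32` stand in for hashed addresses and slots.

```rust
use std::collections::{HashMap, HashSet};

// Simplified stand-in for the real target map (assumption).
type Targets = HashMap<u8, HashSet<u32>>;

/// Strips already-fetched slots from each target and removes the whole
/// entry when the account was fetched and no new slots remain.
fn dedupe_targets(targets: &mut Targets, fetched: &Targets) {
    targets.retain(|addr, target_storage| match fetched.get(addr) {
        // Account never fetched: keep it, even with no storage slots.
        None => true,
        Some(fetched_storage) => {
            // Keep only the storage slots that have not been fetched yet.
            target_storage.retain(|slot| !fetched_storage.contains(slot));
            // An emptied (or originally empty) entry is fully covered.
            !target_storage.is_empty()
        }
    });
}

/// Returns (targets, fetched) for the demonstration.
fn example() -> (Targets, Targets) {
    let fetched = HashMap::from([
        (1, HashSet::from([10])),
        (2, HashSet::from([20])),
        (3, HashSet::new()),
    ]);
    let targets = HashMap::from([
        (1, HashSet::from([10, 11])), // slot 10 is stripped, slot 11 stays
        (2, HashSet::from([20])),     // everything fetched: entry removed
        (3, HashSet::new()),          // both sides empty: entry removed
        (4, HashSet::new()),          // account never fetched: kept
    ]);
    (targets, fetched)
}

fn main() {
    let (mut targets, fetched) = example();
    dedupe_targets(&mut targets, &fetched);
    let mut kept: Vec<u8> = targets.keys().copied().collect();
    kept.sort_unstable();
    println!("kept accounts: {:?}", kept);
}
```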

@shekhirin
Collaborator

Left some comments about the logic. I think it should basically be very similar to:

/// Returns accounts only with those storages that were not already fetched, and
/// if there are no such storages and the account itself was already fetched, the
/// account shouldn't be included.
fn get_proof_targets(
    state_update: &HashedPostState,
    fetched_proof_targets: &MultiProofTargets,
) -> MultiProofTargets {
    let mut targets = MultiProofTargets::default();
    // first collect all new accounts (not previously fetched)
    for &hashed_address in state_update.accounts.keys() {
        if !fetched_proof_targets.contains_key(&hashed_address) {
            targets.insert(hashed_address, HashSet::default());
        }
    }
    // then process storage slots for all accounts in the state update
    for (hashed_address, storage) in &state_update.storages {
        let fetched = fetched_proof_targets.get(hashed_address);
        let mut changed_slots = storage
            .storage
            .keys()
            .filter(|slot| !fetched.is_some_and(|f| f.contains(*slot)))
            .peekable();
        if changed_slots.peek().is_some() {
            targets.entry(*hashed_address).or_default().extend(changed_slots);
        }
    }
    targets
}
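
To see how a function of this shape behaves on concrete input, here is a standalone model of it. It is only a sketch: `StateUpdate` and `Targets` are hypothetical simplified stand-ins for `HashedPostState` and `MultiProofTargets`, `u8`/`u32` replace the hashed address and slot types, and `example` is a made-up helper.

```rust
use std::collections::{HashMap, HashSet};

// Simplified stand-ins for the real reth types (assumptions).
type Targets = HashMap<u8, HashSet<u32>>;

struct StateUpdate {
    accounts: HashSet<u8>,               // changed accounts
    storages: HashMap<u8, HashSet<u32>>, // changed slots per account
}

/// Returns only accounts and slots that are not already in `fetched`.
fn get_proof_targets(update: &StateUpdate, fetched: &Targets) -> Targets {
    let mut targets = Targets::default();
    // First collect all accounts that were not fetched before.
    for &addr in &update.accounts {
        if !fetched.contains_key(&addr) {
            targets.insert(addr, HashSet::default());
        }
    }
    // Then add storage slots that were not fetched before.
    for (addr, slots) in &update.storages {
        let fetched_slots = fetched.get(addr);
        let mut new_slots = slots
            .iter()
            .copied()
            .filter(|slot| !fetched_slots.is_some_and(|f| f.contains(slot)))
            .peekable();
        // Only create an entry when at least one slot is new.
        if new_slots.peek().is_some() {
            targets.entry(*addr).or_default().extend(new_slots);
        }
    }
    targets
}

/// Returns (state update, fetched targets) for the demonstration.
fn example() -> (StateUpdate, Targets) {
    let fetched = HashMap::from([
        (1, HashSet::from([10])),
        (4, HashSet::from([40])),
    ]);
    let update = StateUpdate {
        accounts: HashSet::from([1, 2]),
        storages: HashMap::from([
            (1, HashSet::from([10, 11])), // only slot 11 is new
            (3, HashSet::from([30])),     // account and slot are new
            (4, HashSet::from([40])),     // nothing new
        ]),
    };
    (update, fetched)
}

fn main() {
    let (update, fetched) = example();
    let targets = get_proof_targets(&update, &fetched);
    let mut keys: Vec<u8> = targets.keys().copied().collect();
    keys.sort_unstable();
    println!("target accounts: {:?}", keys);
}
```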

@Rjected Rjected force-pushed the dan/dedupe-prefetch-targets branch 2 times, most recently from 5f32dac to abe946e Compare February 5, 2025 16:37
@Rjected Rjected requested a review from shekhirin February 5, 2025 16:39
Collaborator

@shekhirin shekhirin left a comment

LGTM, that looks correct, but it would be nice to have some sanity tests.

crates/engine/tree/src/tree/root.rs (outdated review thread, resolved)
Rjected and others added 3 commits February 5, 2025 16:42
Co-authored-by: Alexey Shekhirin <5773434+shekhirin@users.noreply.github.com>
@Rjected Rjected force-pushed the dan/dedupe-prefetch-targets branch from 46c6ae3 to 68c3e89 Compare February 5, 2025 22:19
@Rjected Rjected enabled auto-merge February 5, 2025 22:25
Member Author

Rjected commented Feb 5, 2025

added some sanity tests

@Rjected Rjected added this pull request to the merge queue Feb 5, 2025
Merged via the queue into main with commit 06132f5 Feb 5, 2025
45 checks passed
@Rjected Rjected deleted the dan/dedupe-prefetch-targets branch February 5, 2025 22:42
Labels
A-trie Related to Merkle Patricia Trie implementation C-perf A change motivated by improving speed, memory usage or disk footprint
2 participants