feat: store event bloom filter in DB #473

Open
wants to merge 6 commits into
base: main

jbcaron (Member) commented Jan 23, 2025

Pull Request type

Please add the labels corresponding to the type of changes your PR introduces:

  • Feature

What is the current behavior?

Currently, the event pagination mechanism does not handle the continuation token correctly.

Additionally, event filtering does not leverage a bloom filter, which could improve performance by reducing unnecessary block lookups.

Resolves: #NA

What is the new behavior?

  • Integration of mp-bloom-filter for event filtering:

    • Added the mp-bloom-filter crate to improve filtering efficiency.
    • Uses a bloom filter to quickly check if a block might contain relevant events before fetching it.
    • This optimization reduces the number of blocks read, improving query performance.
  • Optimized event retrieval:

    • Introduced take(chunk_size + 1) in the filtering process to determine whether the current block has more events.
    • Ensures efficient pagination by accurately detecting when a block is fully processed.
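
As a rough illustration of the two points above (not the code from this PR), the sketch below skips blocks whose bloom filter rules out the requested key, then uses `take(chunk_size + 1)` to detect whether more events remain. `ToyBloom`, `Block`, `Page`, and `get_events` are hypothetical names, the toy filter is a single 64-bit word rather than the `mp-bloom-filter` crate, and the `(block number, event index)` continuation token is an assumption made for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy bloom filter stored in one u64 word (hypothetical, illustration only).
struct ToyBloom {
    bits: u64,
}

impl ToyBloom {
    const HASHES: u64 = 3;

    fn new() -> Self {
        Self { bits: 0 }
    }

    /// Derives one bit position per seed by hashing the seed with the item.
    fn bit_index<T: Hash>(item: &T, seed: u64) -> u64 {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        item.hash(&mut hasher);
        hasher.finish() % 64
    }

    fn insert<T: Hash>(&mut self, item: &T) {
        for seed in 0..Self::HASHES {
            self.bits |= 1u64 << Self::bit_index(item, seed);
        }
    }

    /// `false` is definitive; `true` may be a false positive.
    fn might_contain<T: Hash>(&self, item: &T) -> bool {
        (0..Self::HASHES).all(|seed| (self.bits & (1u64 << Self::bit_index(item, seed))) != 0)
    }
}

/// A block with its event keys and a bloom filter built over them.
struct Block {
    number: u64,
    bloom: ToyBloom,
    event_keys: Vec<u64>,
}

impl Block {
    fn new(number: u64, event_keys: Vec<u64>) -> Self {
        let mut bloom = ToyBloom::new();
        for key in &event_keys {
            bloom.insert(key);
        }
        Self { number, bloom, event_keys }
    }
}

/// One page of results: (block number, event index) pairs plus a resume position.
struct Page {
    events: Vec<(u64, usize)>,
    continuation: Option<(u64, usize)>,
}

fn get_events(blocks: &[Block], key: u64, start: (u64, usize), chunk_size: usize) -> Page {
    let (start_block, start_index) = start;
    let mut events: Vec<(u64, usize)> = blocks
        .iter()
        .filter(|b| b.number >= start_block)
        // Cheap pre-check: blocks that cannot contain the key are never scanned.
        .filter(|b| b.bloom.might_contain(&key))
        .flat_map(|b| {
            // Only the first block of the page resumes mid-block.
            let skip = if b.number == start_block { start_index } else { 0 };
            b.event_keys
                .iter()
                .enumerate()
                .skip(skip)
                // Exact check, which also discards bloom false positives.
                .filter(move |(_, k)| **k == key)
                .map(move |(i, _)| (b.number, i))
        })
        // Fetch one extra event to learn whether another page exists.
        .take(chunk_size + 1)
        .collect();

    // The extra event, if present, is not returned; it marks where to resume.
    let continuation = if events.len() > chunk_size { events.pop() } else { None };
    Page { events, continuation }
}
```

Calling `get_events` again with the returned `continuation` as `start` resumes at the first event that was not delivered, so no event is skipped or duplicated across pages.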

Does this introduce a breaking change?

Yes

  • The continuation token format has changed.
  • A new DB column stores the bloom filter.

Other information

jbcaron added the feature, node, and db-migration labels Jan 23, 2025
jbcaron self-assigned this Jan 23, 2025
jbcaron requested a review from Trantorian1 February 3, 2025 08:28
jbcaron marked this pull request as ready for review February 3, 2025 08:28
jbcaron added the performance label Feb 3, 2025

#[cfg(test)]
#[test]
fn test_column_all() {
    assert_eq!(Column::ALL.len(), Column::NUM_COLUMNS);
}

Collaborator

Is this test really useful, given that NUM_COLUMNS is already defined as Self::ALL.len()?

Member Author

Just in case that definition changes in a later update.

Comment on lines +166 to +182
fn bench_bloom_filters(c: &mut Criterion) {
    let test_data = generate_test_data(TEST_DATA_SIZE);

    // Test different hash functions
    let hashers = [
        get_hasher_benchmarks::<DefaultHasher>("DefaultHasher"),
        get_hasher_benchmarks::<AHasher>("AHasher"),
        get_hasher_benchmarks::<XxHash64>("XxHash64"),
    ];

    for hasher in &hashers {
        (hasher.sequential_insertion)(c, &test_data, hasher.name);
        (hasher.parallel_insertion)(c, &test_data, hasher.name);
        (hasher.lookup)(c, &test_data, hasher.name);
        (hasher.parallel_lookup)(c, &test_data, hasher.name);
    }
}

Collaborator

Really cool testing structure!

/// where:
/// - k is the number of hash functions
/// - p is the desired false positive rate (0.01)
/// Reference: [Bloom, B. H. (1970). "Space/Time Trade-offs in Hash Coding with Allowable Errors"](https://dl.acm.org/doi/10.1145/362686.362692)

Collaborator

Thanks!

Comment on lines +8 to +14
pub const HASH_COUNT: u8 = 7;

// Number of bits per element in the Bloom filter.
// The value 9.6 is optimal for a false positive rate of 1% (0.01).
const BITS_PER_ELEMENT: f64 = 9.6;

pub const FALSE_POSITIF_RATE: f64 = 0.01;

Collaborator

These constants appear several times in the codebase; maybe we should define them once and reuse them?
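
For reference, the values 7 and 9.6 quoted in the diff above can be checked against the standard bloom filter sizing formulas: for a target false positive rate p, the optimal number of bits per element is -ln(p) / (ln 2)^2 and the optimal number of hash functions is -log2(p). The following is a minimal sketch, not code from this PR; the function names are hypothetical.

```rust
/// Optimal bits per element for a target false positive rate `p`:
/// m/n = -ln(p) / (ln 2)^2.
fn optimal_bits_per_element(p: f64) -> f64 {
    -p.ln() / std::f64::consts::LN_2.powi(2)
}

/// Optimal (real-valued) number of hash functions: k = (m/n) * ln 2 = -log2(p).
/// In practice this is rounded to an integer.
fn optimal_hash_count(p: f64) -> f64 {
    -p.log2()
}

/// Approximate false positive rate for a filter with `bits_per_element` bits
/// per inserted item and `k` hash functions: (1 - e^(-k / (m/n)))^k.
fn expected_false_positive_rate(bits_per_element: f64, k: u32) -> f64 {
    (1.0 - (-(k as f64) / bits_per_element).exp()).powi(k as i32)
}
```

For p = 0.01 these give roughly 9.59 bits per element and 6.64 hash functions, which round to the 9.6 and 7 used in the diff. If the constants were defined in a single place, a unit test along these lines could guard against the copies drifting apart.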

Labels
  • db-migration: Requires database schema changes or migration
  • feature: Request for new feature or enhancement
  • node: Related to the full node implementation
  • performance: Performance improvements or optimizations
Projects
Status: In review
2 participants