Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: big sum trees and count trees #348

Merged
merged 33 commits into from
Jan 16, 2025
Merged

feat: big sum trees and count trees #348

merged 33 commits into from
Jan 16, 2025

Conversation

QuantumExplorer
Copy link
Member

@QuantumExplorer QuantumExplorer commented Jan 8, 2025

Issue being fixed or feature implemented

What was done?

How Has This Been Tested?

Breaking Changes

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation

For repository code-owners and collaborators only

  • I have assigned this pull request to a milestone

Summary by CodeRabbit

Release Notes: GroveDB v2.0.0

New Features

  • Added support for Big Sum Trees and Count Trees
  • Introduced more flexible tree type handling with TreeType enum
  • Enhanced versioning system with more granular version tracking
  • Improved cost estimation for different tree types

Major Changes

  • Replaced boolean tree type flags with TreeType enum
  • Introduced AggregateData to handle more complex data aggregation
  • Added new element types: BigSumTree and CountTree
  • Updated method signatures across libraries to use TreeType
  • Enhanced error handling for tree operations

Performance Improvements

  • More precise cost calculations for different tree structures
  • Streamlined tree type handling
  • Improved versioning and propagation logic

Breaking Changes

  • Replaced sum() methods with aggregate_data()
  • Modified method signatures to use TreeType
  • Changed return types for root hash and sum calculations
  • Updated error handling for tree operations

Estimated Costs

  • Added estimated costs feature to default configuration
  • Introduced more detailed cost tracking for different tree types

Compatibility

  • Maintained core GroveDB functionality
  • Improved type safety and extensibility

Upgrade Recommendations

  • Update method calls to use new TreeType enum
  • Replace sum() method calls with aggregate_data()
  • Review error handling for tree operations

Copy link
Contributor

coderabbitai bot commented Jan 8, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This pull request introduces a comprehensive enhancement to the GroveDB system's versioning and type handling, focusing on expanding the capabilities for managing different tree types and aggregate data. The changes span multiple modules and introduce new structures like TreeType, NodeType, and AggregateData, which replace previous boolean-based tree type representations. The modifications enable support for big sum trees, count trees, and more nuanced handling of tree-related operations across the database's core components.

Changes

File Change Summary
grovedb-version/src/version/... Added new versioning structures for Merk versions and average case costs
grovedb/Cargo.toml Added estimated_costs feature to default features
grovedb/src/element/... Introduced new element types like BigSumTree and CountTree
merk/src/merk/mod.rs Added TreeType and NodeType enums to replace boolean tree type flags
Multiple files Replaced is_sum_tree boolean with TreeType in method signatures

Sequence Diagram

sequenceDiagram
    participant Client
    participant GroveDB
    participant Merk
    participant TreeNode

    Client->>GroveDB: Open database
    GroveDB->>Merk: Create with TreeType
    Merk->>TreeNode: Initialize node structure
    Client->>GroveDB: Perform operation
    GroveDB->>Merk: Process with new aggregate data handling
    Merk-->>Client: Return result with enhanced type information
Loading

Possibly related PRs

Suggested reviewers

  • fominok

Poem

🐰 A Rabbit's Ode to Tree Types

In databases deep and wide,
Where sum and count and trees reside,
We've crafted types with utmost care,
No more boolean flags to bear!

Big sums and counts now dance with glee,
Embracing type complexity! 🌳✨


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@QuantumExplorer QuantumExplorer changed the base branch from master to develop January 8, 2025 02:42
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

Actionable comments posted: 11

🧹 Nitpick comments (35)
grovedb/src/element/delete.rs (3)

35-43: Consider Refactoring 'op' Determination Logic

The logic for determining the op variable based on in_tree_type and is_layered is repeated across multiple methods. Refactoring this logic into a helper function or method would reduce code duplication and enhance maintainability.


92-100: Refactor Repeated 'op' Determination Logic

Similar to the earlier suggestion, consider refactoring the repeated logic for determining op into a shared helper function to improve code maintainability across methods.


Line range hint 132-151: Consolidate 'op' Determination Logic Across Methods

The op determination logic is duplicated in delete_into_batch_operations. Refactoring this logic into a common helper function will enhance code reuse and reduce potential errors.

merk/src/tree/tree_feature_type.rs (1)

220-227: Improve Error Reporting in 'Decode' Method

When decoding, the use of ed::Error::UnexpectedByte(55) for unknown feature types provides limited information. Consider using a more descriptive error message or an error variant that includes the unexpected byte value to aid in debugging.

grovedb/src/element/get.rs (1)

121-136: Maintainability: Dynamically derive known versions in error handling

The known_versions vector is hard-coded as vec![0, 1]. To make the code future-proof, consider deriving this list dynamically from the implemented versions to reduce maintenance efforts when adding new versions.

Example adjustment:

// Define a constant or function that provides the list of known versions
const KNOWN_VERSIONS: &[u32] = &[0, 1];

// ...

known_versions: KNOWN_VERSIONS.to_vec(),
grovedb/src/element/helpers.rs (1)

345-345: Idiomatic Improvement: Use &[u8] instead of &Vec<u8> in function signatures

In the specialized_costs_for_key_value function, accepting key: &[u8] is more flexible and idiomatic than &Vec<u8>. This change allows the function to accept any slice of bytes without requiring a specific vector type.

Apply this diff:

-    pub fn specialized_costs_for_key_value(
-        key: &Vec<u8>,
+    pub fn specialized_costs_for_key_value(
+        key: &[u8],
         value: &[u8],
         node_type: NodeType,
         grove_version: &GroveVersion,
     ) -> Result<u32, Error> {
grovedb/src/replication.rs (4)

29-35: Use a structured logging mechanism instead of eprintln!

The use of eprintln! is suitable for debugging but not optimal for error logging in production code. Consider using a structured logging framework or the existing logging mechanism in the codebase to handle errors more effectively and consistently.


55-57: Address the TODO: Optimize CHUNK_ID by compacting into a bitset

The TODO comment suggests compacting CHUNK_ID into a bitset for size optimization. Implementing this enhancement would improve performance and reduce the size of traversal instructions for deep subtrees.

Do you want me to help implement this optimization or open a GitHub issue to track this task?


88-135: Refactor duplicated code in fetch_chunk method

The code handling transactional and non-transactional cases in the fetch_chunk method is nearly identical, leading to code duplication. Consider refactoring the common logic into a separate function or closure to improve maintainability and reduce redundancy.

Also applies to: 137-180


319-319: Define constants for magic numbers to enhance readability

Magic numbers like 32 for chunk_prefix_length, 1 for root_key_size_length, and 1 for is_sum_tree_length are used directly in the code. Defining these values as named constants or using existing constants can improve code readability and maintainability.

Also applies to: 334-334, 346-346, 356-356

grovedb/src/element/insert.rs (1)

400-401: Handle potential errors when retrieving feature type

The use of cost_return_on_error_no_add!(&cost, self.get_feature_type(merk.tree_type)) assumes that get_feature_type will always succeed. If there's a possibility of it returning an error, consider handling it appropriately to prevent potential runtime issues.

Apply this diff to propagate the error using the existing cost:

 let merk_feature_type =
-    cost_return_on_error_no_add!(&cost, self.get_feature_type(merk.tree_type));
+    cost_return_on_error_no_add!(
+        &cost,
+        self.get_feature_type(merk.tree_type)
+    );
grovedb/src/replication/state_sync_session.rs (1)

362-364: Propagate the original error from apply_inner_chunk

Currently, when apply_inner_chunk returns an error, it is replaced with a generic InternalError message. Propagating the original error will provide more context and aid in debugging.

Apply this diff to propagate the original error:

-let Ok(res) = subtree_state_sync.apply_inner_chunk(&chunk_id, chunk, grove_version) else {
-    return Err(Error::InternalError("Invalid incoming prefix".to_string()));
-};
+let res = subtree_state_sync.apply_inner_chunk(&chunk_id, chunk, grove_version)?;
merk/src/tree/link.rs (3)

333-349: Handle Link::Modified variant without panicking

In the encode_into method, the code panics when encountering the Link::Modified variant. Panicking can be unsafe in production code. Consider returning a proper error instead of panicking to make the code more robust.

Update the match arm to return an error:

-    Link::Modified { .. } => panic!("No encoding for Link::Modified"),
+    Link::Modified { .. } => return Err(ed::Error::EncodingError("Cannot encode Link::Modified variant")),

506-520: Improve error handling in decode_into method

When an unexpected aggregate_data_byte is encountered, it's better to include the actual byte value in the error message. This will aid in debugging and provide clearer information about the error.

Modify the error to include the unexpected byte:

-    _ => return Err(ed::Error::UnexpectedByte(55)),
+    _ => return Err(ed::Error::UnexpectedByte(aggregate_data_byte)),

Line range hint 275-316: Avoid code duplication in encoding_cost and encoding_length methods

The logic for calculating encoding sizes in the encoding_cost and encoding_length methods is duplicated. Consider extracting this logic into a helper function to reduce duplication and potential inconsistencies.

Create a helper function calculate_encoding_length and use it in both methods to ensure consistency.

grovedb/src/estimated_costs/average_case_costs.rs (2)

112-112: Remove unused parameter _replacing_tree_type

The parameter _replacing_tree_type in average_case_merk_replace_tree_v0 is prefixed with an underscore to indicate it's unused. Since this parameter is not used in the function, consider removing it entirely to clean up the code.


Line range hint 186-208: Clarify parameter roles in average_case_merk_insert_tree

The function now accepts both tree_type and in_parent_tree_type parameters. Ensure that their roles are clearly documented and that they are used consistently throughout the function to avoid confusion.

Consider adding documentation comments explaining the purpose of each parameter.

grovedb/src/batch/estimated_costs/average_case_costs.rs (1)

142-142: Update documentation for AverageCaseTreeCacheKnownPaths

The cached_merks field now stores TreeType instead of IsSumTree. Update the struct's documentation to reflect this change for clarity.

merk/src/estimated_costs/average_case_costs.rs (1)

Line range hint 247-253: Clarify the Relationship Between TreeType and NodeType

Both TreeType and NodeType enums have similar variants and are introduced in close proximity. To enhance maintainability and readability, consider adding documentation that explains the distinction and relationship between these two enums. This will help future developers understand their specific purposes and how they interact within the codebase.

merk/src/tree/mod.rs (1)

97-101: Add Documentation for value_as_owned_with_feature Method

The new method value_as_owned_with_feature is a valuable addition that returns both the value and its associated feature type. Including a documentation comment for this method would improve code readability and assist other developers in understanding its purpose and usage.

merk/src/merk/mod.rs (1)

247-253: Clarify the Distinction Between TreeType and NodeType

The introduction of both TreeType and NodeType enums with overlapping variants may cause confusion. Adding clear documentation to explain their individual roles and how they differ would enhance code clarity and assist in future maintenance.

grovedb/src/lib.rs (1)

244-262: Improve code readability by using structs instead of type aliases

The introduction of type aliases VerificationIssues and OpenedMerkForReplication enhances readability but may not fully convey the meaning of the data structures. Consider defining structs for these complex types to provide clearer semantics and facilitate future extensions.

Suggested change: Define structs for complex types

Define VerificationIssues and OpenedMerkForReplication as structs:

pub struct VerificationIssues {
    pub issues: HashMap<Vec<Vec<u8>>, (CryptoHash, CryptoHash, CryptoHash)>,
}

pub struct OpenedMerkForReplication<'tx> {
    pub merk: Merk<PrefixedRocksDbImmediateStorageContext<'tx>>,
    pub root_key: Option<Vec<u8>>,
    pub tree_type: TreeType,
}
grovedb-version/src/version/merk_versions.rs (1)

8-11: Consider documenting version compatibility.

While the implementation looks good, it would be helpful to add documentation explaining:

  • The purpose of add_average_case_merk_propagate
  • Version compatibility considerations
  • Migration path for existing data
grovedb-version/src/version/mod.rs (1)

20-24: Verify error message clarity.

The error message "expected to have a platform version" could be more specific about what went wrong.

Consider:

-            .expect("expected to have a platform version")
+            .expect("GROVE_VERSIONS array must not be empty")
merk/src/merk/source.rs (1)

25-26: Consider adding type documentation.

While the transition to TreeType is good, it would be helpful to document the implications of different tree types in this context.

Also applies to: 32-34

merk/src/lib.rs (1)

Line range hint 1-87: Documentation needs for breaking changes.

This PR introduces several breaking changes and new concepts. Consider:

  1. Adding migration guides for users upgrading from previous versions
  2. Documenting the new tree types and their use cases
  3. Updating any relevant API documentation
grovedb/src/operations/get/worst_case.rs (1)

24-24: LGTM! Consider adding documentation for the TreeType parameter.

The change from boolean flags to TreeType enum improves type safety and code clarity. However, the function documentation could be enhanced to describe the valid tree types and their implications.

Add parameter documentation like this:

 /// Worst case cost for has raw
+///
+/// # Parameters
+///
+/// * `in_parent_tree_type` - The type of the parent tree (e.g., NormalTree, SumTree, BigSumTree)

Also applies to: 41-41, 52-52, 69-69, 81-81, 98-100

merk/src/error.rs (1)

120-125: LGTM! Consider adding examples to error documentation.

The new error variants improve error handling for tree type-related issues. Adding examples would help users understand when these errors occur.

Add examples like this:

 /// Big sum tree under normal sum tree error
+///
+/// # Example
+///
+/// This error occurs when attempting to create a big sum tree as a child of a normal sum tree,
+/// which is not allowed in the tree hierarchy.
 #[error("big sum tree under normal sum tree error {0}")]
 BigSumTreeUnderNormalSumTree(String),

 /// Unknown tree type error
+///
+/// # Example
+///
+/// This error occurs when encountering an unrecognized tree type value,
+/// possibly due to data corruption or version mismatch.
 #[error("unknown tree type {0}")]
 UnknownTreeType(String),
grovedb/src/element/constructor.rs (1)

138-140: Consider improving maintainability and documentation.

Suggestions for improvement:

  1. Extract the default value (0) into a constant to avoid magic numbers
  2. Add documentation explaining:
    • The difference between big sum trees and regular sum trees
    • When to use each type
    • The significance of the default value

Also applies to: 144-149, 153-159, 163-165, 169-174, 178-184

grovedb-version/src/version/v1.rs (1)

187-191: Add documentation for the versioning scheme.

While the addition of MerkAverageCaseCostsVersions is good for feature versioning, please add documentation explaining:

  • The purpose of the versioning scheme
  • What changes in behavior can be expected between versions
  • Why the initial version is set to 0
merk/src/estimated_costs/mod.rs (1)

30-42: Add documentation explaining the cost values.

Please add documentation explaining:

  1. Why BIG_SUM_VALUE_EXTRA_COST is set to 16 (compared to SUM_VALUE_EXTRA_COST = 9)
  2. The rationale behind these specific cost values
grovedb-version/src/version/v2.rs (1)

147-147: Document version increments.

Two operations have their versions incremented to 1:

  1. average_case_merk_replace_tree
  2. add_average_case_merk_propagate

Please add comments explaining the rationale for these specific version increments.

Also applies to: 189-189

merk/src/test_utils/mod.rs (1)

43-43: LGTM! Consider adding documentation for the TreeType enum.

The replacement of boolean flags with the TreeType enum improves type safety and extensibility. This change makes the code more maintainable by providing a clear enumeration of possible tree types.

Consider adding documentation to explain the different tree types and their use cases, especially for the new TreeType::NormalTree. This would help developers understand when to use each type.

Also applies to: 142-142, 324-324, 344-344

grovedb/src/visualize.rs (1)

104-119: Consider reducing code duplication in visualization logic.

The visualization logic for BigSumTree and CountTree is identical to SumTree except for the prefix string. This pattern could be refactored to reduce duplication.

Consider extracting the common visualization logic into a helper function:

fn visualize_tree_with_value<W: Write>(
    drawer: &mut Drawer<W>,
    prefix: &str,
    root_key: &Option<Vec<u8>>,
    value: u64,
    flags: &Option<Vec<u8>>,
) -> Result<()> {
    drawer.write(prefix.as_bytes())?;
    drawer = root_key.as_deref().visualize(drawer)?;
    drawer.write(format!(" {value}").as_bytes())?;

    if let Some(f) = flags {
        if !f.is_empty() {
            drawer = f.visualize(drawer)?;
        }
    }
    Ok(())
}

Then use it in the match arms:

Element::SumTree(root_key, value, flags) => {
    visualize_tree_with_value(drawer, "sum_tree: ", root_key, value, flags)?
}
Element::BigSumTree(root_key, value, flags) => {
    visualize_tree_with_value(drawer, "big_sum_tree: ", root_key, value, flags)?
}
Element::CountTree(root_key, value, flags) => {
    visualize_tree_with_value(drawer, "count_tree: ", root_key, value, flags)?
}
storage/src/storage.rs (1)

94-101: Consistent API extension for subtree prefix operations

Good addition of new methods for handling subtree prefixes. The implementation:

  • Maintains consistent patterns with existing methods
  • Provides clear documentation
  • Follows the same transaction handling patterns

Consider adding examples in the documentation to show usage patterns.

/// Example:
/// ```
/// # use grovedb_storage::{Storage, SubtreePrefix};
/// # let db = /* ... */;
/// let prefix: SubtreePrefix = [0; 32];
/// let storage_context = db.get_storage_context_by_subtree_prefix(prefix, None);
/// ```

Also applies to: 113-121, 132-139

🛑 Comments failed to post (11)
tutorials/Cargo.toml (1)

17-17: 💡 Codebase verification

Update blake3 to latest version 1.5.5

The current version (1.5.1) is outdated. The latest stable version is 1.5.5. While the dependency is used in the codebase (in tutorials/src/bin/replication.rs for SubtreePrefix type definition), its usage is minimal, only utilizing the OUT_LEN constant.

🔗 Analysis chain

Verify blake3 dependency version and usage

Please ensure:

  1. That this is the latest stable version of blake3
  2. That the dependency is actually used in the tutorials codebase

Let's verify both points:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check latest version of blake3
cargo search blake3 --limit 1

# Find actual usage of blake3 in the tutorials directory
rg "blake3::" tutorials/
rg "use.*blake3" tutorials/

Length of output: 270


Script:

#!/bin/bash
# Check latest version from crates.io API
curl -s https://crates.io/api/v1/crates/blake3 | jq -r '.crate.max_version'

# Find more detailed usage of blake3
rg -l "blake3" tutorials/
ast-grep --pattern 'blake3::$_' tutorials/
ast-grep --pattern 'use blake3' tutorials/

Length of output: 342

grovedb/src/element/mod.rs (1)

123-130: ⚠️ Potential issue

Update Serialization Logic for New 'Element' Variants

With the addition of BigSumTree and CountTree to the Element enum, ensure that serialization and deserialization logic correctly handles these new variants. Failing to update serialization could lead to data corruption or compatibility issues.

tutorials/src/bin/replication.rs (2)

104-104: 🛠️ Refactor suggestion

Improve Error Handling: Avoid using unwrap on sync_db_demo result

Using .unwrap() can cause a panic if sync_db_demo returns an error. Consider handling the error gracefully to enhance the application's robustness.

Apply this diff to handle potential errors:

-sync_db_demo(&db_checkpoint_0, &db_destination, &grove_version).unwrap();
+if let Err(e) = sync_db_demo(&db_checkpoint_0, &db_destination, &grove_version) {
+    eprintln!("Synchronization failed: {:?}", e);
+    // Handle the error appropriately
+}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    if let Err(e) = sync_db_demo(&db_checkpoint_0, &db_destination, &grove_version) {
        eprintln!("Synchronization failed: {:?}", e);
        // Handle the error appropriately
    }

266-266: 🛠️ Refactor suggestion

Improve Error Handling: Replace expect with proper error handling

Using expect on commit_session may cause a panic if an error occurs. It's better to handle the error to prevent unexpected application termination.

Apply this diff to handle the error:

-        target_db.commit_session(session).expect("failed to commit session");
+        if let Err(e) = target_db.commit_session(session) {
+            eprintln!("Failed to commit session: {:?}", e);
+            // Handle the error appropriately
+        }

Committable suggestion skipped: line range outside the PR's diff.

grovedb/src/element/helpers.rs (1)

230-235: 🛠️ Refactor suggestion

Clarification: Handle all possible TreeType variants in get_feature_type

The method get_feature_type now matches on TreeType, but does not explicitly handle all possible variants. Ensure that all variants are accounted for to prevent potential pattern-matching omissions in the future.

Consider adding a default case or explicitly matching all variants:

match parent_tree_type {
    TreeType::NormalTree => Ok(BasicMerkNode),
    TreeType::SumTree => Ok(SummedMerkNode(self.sum_value_or_default())),
    TreeType::BigSumTree => Ok(BigSummedMerkNode(self.big_sum_value_or_default())),
    TreeType::CountTree => Ok(CountedMerkNode(1)),
    // Add this to handle any future variants
    _ => Err(Error::UnsupportedTreeType),
}
grovedb/src/replication/state_sync_session.rs (1)

245-248: 🛠️ Refactor suggestion

⚠️ Potential issue

Avoid using unsafe code when obtaining transaction_ref

The use of unsafe blocks to obtain a reference to transaction bypasses Rust's safety guarantees and may lead to undefined behavior. Please refactor the code to avoid unsafe code, ensuring that lifetime and borrowing rules are correctly followed. This will enhance memory safety and maintainability.

Consider passing a reference to the transaction directly where needed or restructuring the code to eliminate the need for unsafe casts.

Also applies to: 455-458

merk/src/tree/link.rs (1)

166-166: 🛠️ Refactor suggestion

Use snake_case for method names in Rust

The method name aggregateData should be renamed to aggregate_data to adhere to Rust's naming conventions.

Apply this diff to correct the method name:

-pub const fn aggregateData(&self) -> AggregateData {
+pub const fn aggregate_data(&self) -> AggregateData {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    pub const fn aggregate_data(&self) -> AggregateData {
merk/src/merk/restore.rs (1)

457-459: ⚠️ Potential issue

Potential inversion of logic in verify method call

The verify method now uses the condition self.merk.tree_type == TreeType::NormalTree, whereas previously it used self.merk.is_sum_tree. This might invert the intended logic. If is_sum_tree was true for sum trees, the new condition checks for normal trees instead. Confirm whether the condition should be self.merk.tree_type != TreeType::NormalTree to maintain the original logic.

Apply this diff to correct the logic if necessary:

-                .verify(self.merk.tree_type == TreeType::NormalTree, grove_version)
+                .verify(self.merk.tree_type != TreeType::NormalTree, grove_version)

Committable suggestion skipped: line range outside the PR's diff.

grovedb/src/batch/mod.rs (1)

2172-2177: 🛠️ Refactor suggestion

Ensure correct TreeType is specified when initializing Merks.

In the methods open_batch_transactional_merk_at_path and open_batch_merk_at_path, TreeType::NormalTree is used when initializing new Merk instances. Verify whether this is appropriate in all contexts or if the TreeType should be determined based on the specific tree being operated on (e.g., SumTree, BigSumTree, CountTree). Using the appropriate TreeType ensures that the Merks are correctly configured for their intended operations.

Also applies to: 2220-2225, 2265-2266

grovedb/src/batch/single_sum_item_deletion_cost_tests.rs (1)

50-50: 💡 Codebase verification

The review comment is valid - additional test coverage is needed

The codebase currently lacks tests comparing deletion costs across different tree types. While there are tests for NormalTree and some SumTree operations, there's no comprehensive test coverage comparing deletion costs between NormalTree, SumTree, and BigSumTree. The suggested test addition would improve coverage and ensure consistent behavior.

🔗 Analysis chain

Consider adding test cases for SumTree and BigSumTree deletion costs.

While the changes to use TreeType::NormalTree are correct, we should verify deletion costs for other tree types.

Consider adding test cases like this:

+    #[test]
+    fn test_batch_deletion_costs_for_different_tree_types() {
+        let grove_version = GroveVersion::latest();
+        let db = make_empty_grovedb();
+
+        // Test deletion costs for each tree type
+        for tree_type in [TreeType::NormalTree, TreeType::SumTree, TreeType::BigSumTree] {
+            let ops = vec![QualifiedGroveDbOp::delete_tree_op(
+                vec![],
+                b"key1".to_vec(),
+                tree_type,
+            )];
+            // TODO: Add assertions for expected costs based on tree type
+        }
+    }

Also applies to: 157-157

🏁 Scripts executed

The following scripts were executed for the analysis:

Error: No shell scripts found


Script:

#!/bin/bash
# Search for test files containing TreeType and deletion related tests
rg -l "TreeType::(SumTree|BigSumTree)" --type rust | grep "_test"

# Look for existing deletion cost tests with different tree types
rg "test.*deletion.*cost.*TreeType" --type rust -A 5

# Search for all TreeType enum usages in tests
ast-grep --pattern 'TreeType::$_' | grep "_test"

Length of output: 169


Script:

#!/bin/bash
# Search for TreeType usage in any rust files
rg "TreeType::" --type rust -l

# Search for test modules containing deletion or batch operations
rg "mod tests" --type rust -A 10 | grep -B 2 -A 10 "batch\|deletion"

# Look specifically in the batch directory for test files
fd ".*test.*\.rs$" "grovedb/src/batch"

Length of output: 11358


Script:

#!/bin/bash
# Check content of the deletion cost test files
echo "=== single_sum_item_deletion_cost_tests.rs ==="
rg "fn test_" grovedb/src/batch/single_sum_item_deletion_cost_tests.rs -B 1 -A 10

echo -e "\n=== single_deletion_cost_tests.rs ==="
rg "TreeType::" grovedb/src/batch/single_deletion_cost_tests.rs -B 1 -A 5

Length of output: 2731

node-grove/src/converter.rs (1)

41-42: ⚠️ Potential issue

Fix inconsistent string representation for CountTree.

The CountTree variant incorrectly returns "big_sum_tree". This should be "count_tree" for consistency with the variant name and to maintain a clear distinction between different tree types.

Apply this diff to fix the string representation:

-        Element::CountTree(..) => "big_sum_tree".to_string(),
+        Element::CountTree(..) => "count_tree".to_string(),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        Element::BigSumTree(..) => "big_sum_tree".to_string(),
        Element::CountTree(..) => "count_tree".to_string(),

Copy link
Collaborator

@fominok fominok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great job, and I really like that sum tree is no longer an edge case with special treatment but a part of a more general subsystem of aggregates

one thing I prefer not to have is that error message appending closure in our public API

grovedb/src/element/get.rs Outdated Show resolved Hide resolved
grovedb/src/element/get.rs Outdated Show resolved Hide resolved
@@ -48,6 +51,51 @@ pub struct SubtreePath<'b, B> {
pub(crate) ref_variant: SubtreePathInner<'b, B>,
}

fn hex_to_ascii(hex_value: &[u8]) -> String {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming is misleading, because it attempts to covert utf-8 bytes (like b"hello") rather than hex ("68656C6C6F" for same "hello")

costs/src/lib.rs Show resolved Hide resolved
* fix compilation errors

* add error context
@fominok fominok merged commit d29da25 into develop Jan 16, 2025
8 checks passed
@QuantumExplorer QuantumExplorer deleted the feat/bigSumTrees branch January 16, 2025 11:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants