feat: bidirectional references (non-batched operations) #345

Open
wants to merge 29 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view

New file: `adr/atomicity.md`

# Addressing Atomicity

## Level 1: RocksDB Transactions

In GroveDB, almost no operation -- if any at all -- can be executed as a single atomic
operation in RocksDB, the underlying storage used by GroveDB. As long as parallel access
to GroveDB is allowed, there is no guarantee that data will remain consistent across the
multiple operations required at the RocksDB level. Partially, we address this issue using
RocksDB batches, which will be discussed in more detail in the next section. However,
these batches do not address data fetches that may occur while the final RocksDB batch is
being constructed. The data fetched at one step of the operation may be inconsistent with
the data fetched later, as background updates may have occurred in the meantime.

To demonstrate the problem, consider a scenario where `c` is the only key under the
subtree `[a,b]`. One actor updates this key with a new value while another actor performs
an insertion at a different location:

```
Actor 1:                                    Actor 2:
- load subtree [a,b] with root node c
- under subtree [a,b] key c insert value x
  (we're not going into much detail here,
  as it was done in one batch and we care
  only about what happens to c)
                                            - insert empty subtree into [a,b] under key d
                                            - under subtree [a,b,d] key e insert value y
                                            - under subtree [a,b] key d insert new root
                                              hash and root key of subtree [a,b,d]
- compute root [a,b] hash as hash of joined
  hashes of c and d *WE HAVE OLD C*
- under subtree [a] key b insert new root
  hash and root key of subtree [a,b]
- under subtree [] key a insert new root
  hash and root key of subtree [a]
```

... not to mention what happens to the ancestors' hashes.

__Solution__: all operations shall be performed via RocksDB transactions.

While this is straightforward for modifications, queries and `get` operations also require
transactions, as in general they cannot be represented by a single RocksDB operation
either. Although `get` may be an exception when no references are involved, data still
needs to be loaded first, and isolation might be required. Therefore, transactions should
be provided from the start.

Since transaction arguments have been optional since the first release, we now start a
transaction internally if none is provided. To facilitate this, `crate::utils::TxRef` was
introduced.

`TxRef` wraps a transaction provided by the user if there is one; otherwise it starts a
new one. The rest of the GroveDB internals are unaware of the transaction's origin and use
whatever `TxRef` provides to them via its `TxRef::as_ref` method.

If the transaction was started internally, it shall be committed internally as well. For
that purpose `TxRef::commit_local` is used: it commits the transaction if it is indeed
"local", or is a no-op if the transaction was passed in by the user, leaving the decision
of what to do with it to the user.
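As a rough illustration, the pattern can be sketched like this (a minimal standalone
sketch with stand-in types; GroveDB's actual `TxRef` wraps real RocksDB transactions):

```rust
/// Stand-in for a RocksDB transaction (illustrative only).
struct Transaction {
    committed: bool,
}

impl Transaction {
    fn new() -> Self {
        Transaction { committed: false }
    }

    fn commit(self) -> Result<(), String> {
        // a real implementation would persist pending writes here
        Ok(())
    }
}

/// Either borrows a user-supplied transaction or owns a local one.
enum TxRef<'a> {
    Borrowed(&'a Transaction),
    Owned(Transaction),
}

impl<'a> TxRef<'a> {
    /// Start a local transaction only if the user did not provide one.
    fn new(user_tx: Option<&'a Transaction>) -> Self {
        match user_tx {
            Some(tx) => TxRef::Borrowed(tx),
            None => TxRef::Owned(Transaction::new()),
        }
    }

    /// The rest of the code only ever sees `&Transaction`,
    /// unaware of where it came from.
    fn as_ref(&self) -> &Transaction {
        match self {
            TxRef::Borrowed(tx) => tx,
            TxRef::Owned(tx) => tx,
        }
    }

    /// Commit only if the transaction was started locally;
    /// a user-supplied transaction is left untouched.
    fn commit_local(self) -> Result<(), String> {
        match self {
            // user-provided: leave the commit/rollback decision to the user
            TxRef::Borrowed(_) => Ok(()),
            // started internally: commit internally as well
            TxRef::Owned(tx) => tx.commit(),
        }
    }
}
```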

## Level 2: RocksDB Batches

_Not to be confused with GroveDB batches!_

In general, if an operation fails, it doesn't necessarily mean that the entire transaction
should be aborted, unless it has reached an inconsistent state. At least, this is not the
desired behavior in GroveDB as used in Dash Platform: a transaction should live for the
duration of a block, with operations happening seamlessly -- even those that may fail.

As stated before, an operation that changes the state of GroveDB consists of many RocksDB
operations. However, we do not apply them directly to the provided transaction. Instead,
we aggregate them into a RocksDB batch, which is applied to the transaction all at once
at the end of the GroveDB operation. This approach allows for failure without aborting
the entire transaction, as it will only abort the batch, leaving the transaction state
untouched.

To apply the `StorageBatch` with these deferred operations onto a running transaction,
`Storage::commit_multi_context_match` is used, where the main implementation of `Storage`
in our case is `RocksDbStorage`.
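The deferred-batch idea can be sketched as follows (stand-in types; the real
`StorageBatch` and transaction come from GroveDB's storage layer and RocksDB):

```rust
use std::collections::BTreeMap;

/// A single low-level write, queued instead of applied immediately.
enum Op {
    Put(Vec<u8>, Vec<u8>),
    Delete(Vec<u8>),
}

/// Collects low-level operations during one GroveDB operation.
#[derive(Default)]
struct StorageBatch {
    ops: Vec<Op>,
}

impl StorageBatch {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.ops.push(Op::Put(key.to_vec(), value.to_vec()));
    }
    fn delete(&mut self, key: &[u8]) {
        self.ops.push(Op::Delete(key.to_vec()));
    }
}

/// Stand-in for a transaction: a simple key-value map.
#[derive(Default)]
struct Transaction {
    data: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl Transaction {
    /// Apply the whole batch at once. If the GroveDB operation failed
    /// earlier, the batch is simply dropped and the transaction state
    /// stays untouched.
    fn apply(&mut self, batch: StorageBatch) {
        for op in batch.ops {
            match op {
                Op::Put(k, v) => {
                    self.data.insert(k, v);
                }
                Op::Delete(k) => {
                    self.data.remove(&k);
                }
            }
        }
    }
}
```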

## Level 3: GroveDB Batches

While RocksDB batches are an implementation detail, GroveDB batches are part of the public
API, on par with regular operations provided by GroveDB. When several updates to GroveDB
need to be performed atomically from a user perspective, without sacrificing a transaction
in case of failure, GroveDB batches are used.

The main takeaways are:

- There is always a transaction, whether provided externally or started internally.
- Modifications are always applied through a single RocksDB batch.
- Calling `insert*`/`delete*` results in one RocksDB batch being applied.
- Applying a GroveDB batch full of `insert*`/`delete*` operations also results in one
  RocksDB batch, likely just a larger one.

New file: `adr/bidirectional_references.md`

# Bidirectional references

GroveDB has supported references since its first release; however, the consistency between
references and the data they refer to is only guaranteed at the moment they are inserted.
Subsequent updates to the data do not propagate to the references pointing to it, which
can lead to diverged hashes or references pointing to deleted items.

If the lack of consistency between references and data becomes a problem for a part of the
application using GroveDB, that part can choose to use bidirectional references instead.

For this purpose, several new `Element` variants were introduced:

```rust
pub enum Element {
...
/// A reference to an object by its path
BidirectionalReference(BidirectionalReference),
/// An ordinary value that has a backwards reference
ItemWithBackwardsReferences(Vec<u8>, Option<ElementFlags>),
/// Signed integer value that can be totaled in a sum tree and has a
/// backwards reference
SumItemWithBackwardsReferences(SumValue, Option<ElementFlags>),
}

pub struct BidirectionalReference {
pub forward_reference_path: ReferencePathType,
pub backward_reference_slot: SlotIdx,
pub cascade_on_update: CascadeOnUpdate,
pub max_hop: MaxReferenceHop,
pub flags: Option<ElementFlags>,
}
```

These items are counterparts of existing ones: items, sum items, and regular references.
A regular item with ordinary references does not propagate updates back to the reference
chain origin. When such behavior is required, a different type of element should be used.
Moreover, these types are incompatible, which will be discussed in the "Rules" section.

Additionally, a new flag was added to `InsertOptions`, `DeleteOptions`, and `ClearOptions`
called `propagate_backward_references`. Since propagation incurs a cost, starting with the
checks required to determine whether it should be performed, bidirectional references are
optional and must be explicitly enabled.

Even when a user inserts something unrelated to the bidirectional references feature,
a check must still be performed to determine whether the insertion overwrites an item
with backward references. If it does, this could trigger a cascade deletion, or fail with
an error if cascade deletion is not allowed in the bidirectional reference's parameters.
However, propagation must be enabled from the start for this check to take place at all.
Fetching the previous item on every modification introduces additional overhead, which
would be unfair to applications that do not use this feature or to database sections that
do not require it. To address this, the flag was introduced.
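A sketch of how such an opt-in check might look (all names here are hypothetical, not
GroveDB's actual API):

```rust
// Hypothetical, simplified versions of the options and element types.
struct InsertOptions {
    propagate_backward_references: bool,
}

enum Element {
    Item(Vec<u8>),
    ItemWithBackwardsReferences(Vec<u8>),
}

/// Only when the flag is set do we pay the cost of fetching the previous
/// element to see whether the overwrite must trigger backward-reference
/// post-processing (hash propagation or cascade deletion).
fn needs_post_processing(
    opts: &InsertOptions,
    fetch_previous: impl FnOnce() -> Option<Element>,
) -> bool {
    if !opts.propagate_backward_references {
        return false; // no fetch, no overhead
    }
    matches!(
        fetch_previous(),
        Some(Element::ItemWithBackwardsReferences(_))
    )
}
```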

## Rules

Next, we’ll go over the rules and limitations for using bidirectional references.

Note that for the rules to apply, the `propagate_backward_references` flag needs to be
set.

An 'element with backward references' refers to `ItemWithBackwardsReferences`,
`SumItemWithBackwardsReferences`, and `BidirectionalReference`, as all these types have
a list of backward references associated with them.

- __Only elements with backward references can be targets of bidirectional references.__
Trying to create a bidirectional reference to a regular item will result in an error. And
just like regular references, bidirectional references cannot point to subtrees.
- __A (Sum)Item with backward references can be referenced by up to 32 bidirectional
references.__ This limit exists due to implementation constraints and to keep worst-case
costs predictable; without a limit, estimating these costs would not be possible.
- __A bidirectional reference can itself be referenced by another bidirectional reference,
but by no more than one.__ This limitation was introduced for the same reason as before:
to keep propagation costs predictable. By restricting chains to one reference per
bidirectional reference, we ensure that an item with up to 32 bidirectional references
(each chain containing no more than 10 links) can be traced without branching into more
paths, allowing us to predict and manage the worst-case update costs.
- __If an element with backward references is updated with another element with backward
references, hash propagation happens.__ All bidirectional references across all chains
shall update their hashes using the updated item's new hash. If the updated element is
itself a new bidirectional reference, it will first follow the chain forward to obtain the
value hash used for propagation.
- __If an element can no longer be targeted (for example, it is updated to an element
without backward references support or deleted entirely), a cascade deletion of
bidirectional references occurs.__ This requires the `cascade_on_update` setting to be
enabled for each affected bidirectional reference. If it is not, an error is raised,
preventing the operation from completing successfully.

## Implementation

_Work in progress: Support for bidirectional references in `apply_batch` is not yet
implemented._

Bidirectional references are optional for each call to GroveDB's public API, and a flag is
used to enable their functionality for that specific call. Essentially, when the flag is
present, it modifies the regular execution process in two ways:

1. Modifications (both writes and deletions) will fetch the data being updated.
2. If the fetched item is an element with backward references, control is passed to the
`bidirectional_references` module in the GroveDB root for post-processing. For insertion
of a bidirectional reference itself, this happens regardless of whether the flag is set.

Quite a lot happens behind this "post-processing," and we'll go into the details shortly.

### Meta Storage

Bidirectional references do not alter the state of the elements they point to, as that
could unintentionally trigger a cascade of propagations. Since backward references are
not stored directly with the element's data, the meta column family is used to store them
instead.

Meta storage follows the same scheme as regular storage, using prefixes. By employing
prefixes, we achieve a local meta storage for each Merk. This prefix is extended with a
"namespace" to separate the backward references domain from any other possible usage of
meta storage, and the element's key is appended.

Under the key made by that concatenation, a 32-bit integer is stored, representing a bit
vector. Each set bit corresponds to a backward reference stored under the same prefix,
with the slot index appended to form another key; that key holds the actual backward
reference data. When inserting or changing a bidirectional reference, which alters an
element's backward references list, the integer (bitvec) is updated to set or unset a
slot, and the value under the per-slot key is updated without affecting other slots,
keeping the layout deterministic.
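The slot bookkeeping described above can be sketched like this (key layout and names are
illustrative; the real encoding lives in GroveDB's meta storage code):

```rust
/// At most 32 backward-reference slots per element, matching the u32 bitvec.
const MAX_SLOTS: u32 = 32;

/// Find the first free slot and mark it occupied.
/// Returns None when all 32 slots are taken.
fn acquire_slot(bits: &mut u32) -> Option<u32> {
    let free = bits.trailing_ones(); // index of the lowest unset bit
    if free >= MAX_SLOTS {
        return None;
    }
    *bits |= 1 << free;
    Some(free)
}

/// Release a slot when its bidirectional reference is removed.
fn release_slot(bits: &mut u32, slot: u32) {
    *bits &= !(1 << slot);
}

/// Meta key for the backward reference stored in a given slot:
/// prefix ++ "refs" ++ element key ++ slot index (encoding is illustrative).
fn slot_key(prefix: &[u8], element_key: &[u8], slot: u32) -> Vec<u8> {
    let mut key = prefix.to_vec();
    key.extend_from_slice(b"refs");
    key.extend_from_slice(element_key);
    key.push(slot as u8);
    key
}
```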

The backward reference is defined as:

```rust
pub(crate) struct BackwardReference {
pub(crate) inverted_reference: ReferencePathType,
pub(crate) cascade_on_update: bool,
}
```

For example, the data for a subtree `[a, b]` with key `c`, which contains an
`ItemWithBackwardsReferences` and is referenced by two bidirectional references from `[d]`
with keys `e` and `f`, could look like this:

```
* [a,b] prefix = ba1337ab
* [d] prefix = ee322322

Data:
ba1337abc : TreeNode { .. Element::ItemWithBackwardsReferences(..)} // approx
ee322322e : TreeNode { .. Element::BidirectionalReference(/* reference path [a,b,c] */) }
ee322322f : TreeNode { .. Element::BidirectionalReference(/* reference path [a,b,c] */) }

Meta:
ba1337abrefsc : b00000000000000000000000000000011
ba1337abrefsc0 : BackwardReference(/* reference path [d,e] */)
ba1337abrefsc1 : BackwardReference(/* reference path [d,f] */)
```

### Propagation

Prerequisite reading: [Merk cache](./merk_cache.md).

Deletion or an update of an element with backward references triggers a cascade hash
update or a deletion, both of which alter the state of affected subtrees, leading to
regular hash propagation to ancestor subtrees up to the GroveDB root. In short, operations
with the required flag enabled can trigger updates across several subtrees simultaneously.

Thus, there are two ongoing propagations:

1. Backward references chain hash propagation / cascade deletion.
2. Regular hash propagation of subtrees.

It is possible that a reference propagation impacts a subtree that is also affected by
regular propagation from one of its descendants, and this is difficult to predict. Since
these propagations happen at different steps, they could open the same Merk multiple
times, causing issues. To manage this, caching becomes mandatory. This led to the
introduction of `MerkCache`, which has become a crucial component for handling
bidirectional references.