CAP: 0025
Title: Remove Bucket Shadowing
Author: Marta Lokhava
Status: Final
Created: 2019-07-02
Discussion: https://groups.google.com/d/topic/stellar-dev/JteacMugsMQ/discussion
Protocol version: 12
This proposal makes simplifications to the bucket list data structure in stellar-core, such that the new bucket merge logic results in different buckets, than ones produced by previous versions.
The main simplification is removal of shadows
, which are older versions of the bucket list.
Shadows are used to avoid storing all updates to the same ledger entry, and keeping only the most
recent one instead.
Bucket list is a log-structured merge (LSM) tree used to store the complete state of stellar-core.
As new entries are added, bucket list merges its existing buckets to produce new buckets, which
incorporate the new data. Additionally, bucket list stores its previous versions, called shadows.
The purpose of shadows is to handle high volume of changes to the same ledger entries (i.e., entries
of LIVEENTRY
type). That is, if a LIVEENTRY
entry exists at a higher (younger) level, another
LIVEENTRY
at a lower (older) level is elided during a merge. This proposal aims to remove shadows,
by showing that they:
- Have a high storage cost.
- Significantly increase bucket merge latency.
- Are only valuable for a niche range of scenarios. Additionally, their usefulness degrades at lower levels.
The changes proposed will not affect bucket semantics. It will merely change how buckets are merged, producing different outputs compared to previous protocol versions.
This change was motivated by the performance analysis done on shadows. In the current protocol, for level i, shadows consist of all levels above level i - 1. Every level, starting with level 2, stores such a list of shadows. During bucket merge, aside from reading the contents of the two buckets being merged, all shadows are iterated through to check whether an elision of a particular ledger entry is needed. This slows down the merge latency significantly. The assessment of current public network traffic shows that merge times for lowest-level buckets complete approximately 3 times faster without shadows than with shadows.
Additionally, shadows create a significant storage burden. On one hand, they prevent constantly churning ledger entries from propagating all the way down to lowest levels, avoiding redundant entries. On the other hand, the data from the public network suggests that the overhead of such spilling is not particularly significant. For example, the resulting buckets without shadows were at most twice bigger than ones that used shadows. However, the total size of shadows stored for lower levels still outweighed the overhead of larger resulting buckets.
Lastly, there are a few very specific cases when a particular level can be fully shadowed. That is, for shadows to be the most useful at level i, it must be assumed that nothing above level i is shadowed. This also means that storing shadows for all previous levels is wasteful, since nothing is shadowed. At lower levels, shadows are less useful, since the bucket list incurs the overhead of unnecessary storage of shadows at higher levels. Moreover, the argument is based on an assumption that all the network traffic follows a specific, unlikely pattern: it updates a certain number of same ledger entries continuously.
The ledger protocol number will be increased, to version 12. This is a breaking change. The bucket list will produce buckets differing from buckets of previous protocols. This difference will affect the bucket list hash, which is used for consensus.
Bucket semantics remain unchanged. The only difference is potentially more frequent
occurrences of LIVEENTRY
types of bucket entries across multiple buckets for the same ledger entry.
Under protocol 12, new style bucket merges will not consider shadows, so less bucket entries will be elided. In order to perform a new style merge, nodes will compare versions of the inputs. Specifically, a new style merge will be performed if the input snap is of ledger version 12 or higher. This implies that nodes will keep performing old style merges with shadows until input snaps are of the right version.
To avoid delays while adding the new data to the bucket list, old buckets are merged in the background as early as possible. Some of those merges are completed long before they are needed. After the protocol 12 upgrade, all existing in-progress or completed (but not yet needed) merges will remain relevant. The implementation will use these merges at appropriate ledgers, then, depending on the input versions for each level, will start old style or new style merges. Eventually, all bucket list levels will be performing new style merges. Because new style merges are started gradually, letting merges of older versions finish, nodes may upgrade at any ledger without delays.
Starting protocol 12, nodes will not publish inputs or outputs to new style merges. With shadows removed, inputs to bucket merges are completely reconstructible from the bucket list itself. Note, inputs to merges remain stable and derivable from the current bucket list until the future bucket is promoted into the bucket list. Due to this property, it is possible to re-start merges at any ledger between the initial merge trigger and the completed merge promotion.
The initial bucket list design did not have a comprehensive analysis of shadows, showing that it is not a general optimization, but rather an improvement to a very specific ledger traffic scenario.
Additionally, other known implementations of LSM trees suggest that per-level compaction is sufficient. For example, LevelDB gives an overview of its compactions:
https://github.com/google/leveldb/blob/master/doc/impl.md#compactions
This is a breaking change, even though there are no semantic changes to the bucket list. The updated merge mechanism can be regarded as an “implementation detail”. Because of that, older versions will not error when processing new buckets. However, they will produce a different ledger hash, preventing them from reaching consensus, and making progress.
The new versions of stellar-core will be able to process old and new buckets, and pick the appropriate merge technique based on the version number in the ledger header.
Test cases were included in the implementation.