-
Notifications
You must be signed in to change notification settings - Fork 189
[Chunk Data Pack Pruner] Add block view index #6933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6933 +/- ##
==========================================
- Coverage 41.28% 41.27% -0.02%
==========================================
Files 2160 2160
Lines 189388 189459 +71
==========================================
+ Hits 78192 78201 +9
- Misses 104677 104736 +59
- Partials 6519 6522 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
||
viewCache: newCache(collector, metrics.ResourceCertifiedView, | ||
withLimit[uint64, flow.Identifier](4*flow.DefaultTransactionExpiry), | ||
withRetrieve(retrieveView)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache is only filled by retrieval op.
// certified blocks are the blocks that have received QC. | ||
// | ||
// TODO: this method is not available until next spork (mainnet27) or a migration that builds the index. | ||
// ByView(view uint64) (*flow.Header, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will uncomment in next spork.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I very much agree with implementing a ByView
indexing and retrieval. Though, I am a bit uncomfortable with the assumptions and undocumented dependencies that the proposed change would introduce:
- the
QuorumCertificates
populates the ByView index without ever seeing a block, while theHeaders
just assumes that if that index exist, a corresponding block has been previously stored. Nowhere doesQuorumCertificates
document that only QCs for known blocks are to be stored. QuorumCertificates
andHeaders
are low-level storage abstractions, but they make very specific assumptions how the higher level code (specifically theMutator
) is supposed to use them:Line 35 in 92ac8d1
// while storing qc, we also index the block by view in the same transaction. - Just hypothetically, if somebody has the great idea to persist QCs on consensus nodes once they construct them (in case the node crashes) and maybe even do that for QCs pertaining to unknown blocks, the assumption made in
QuorumCertificates
would be void. Its probably not the best example, but it is an example how fragile those assumptions can be.
- Just hypothetically, if somebody has the great idea to persist QCs on consensus nodes once they construct them (in case the node crashes) and maybe even do that for QCs pertaining to unknown blocks, the assumption made in
So in summary, I am to very comfortable with changing the meaning of QuorumCertificates
from "storing quorum certificates" to "storing QuorumCertificates only for known blocks". Essentially, we are introducing additional dependencies and assumptions between QuorumCertificates
, Headers
and Mutator
. While in each individual case, the coupling is small, over time such couplings accumulate across the code base and make the code very difficult to maintain and to guarantee correctness. A major focus of our work is managing complexity and guaranteeing correctness, if that requires to write a bit more boilerplate code - I think that would be a good tradeoff.
Some thoughts / suggestions:
-
since
Headers
retrieves the blocks by view and assumes that for an indexed view a block exists, it should also populate the by-view index in my opinion (keeping the code that deals with the same thing in one place). -
The
Mutator
is the module that has the authority to declare blocks as finalized, sealed, etc ... I think the authority to declare a block as certified also falls within that category (certainly I would discourage the give the storage layer the authority to declare a block certified). I would suggest to make that explicit by having theMutator
call a dedicated methodIndexCertifiedBlockByView(flow.Identifier, uint64)
that is provided by the storage abstraction (e.g.Headers
). -
In the way I see it
Blocks
is an extension ofHeaders
.- Therefore, also
Blocks
could offer theByView
method Blocks
could also offer theDeclareCertifiedTx
method and delegate that toIndexCertifiedBlockByView
(?) To me, that would seem very consistent.
- Therefore, also
-
method
IndexCertifiedBlockByView
would need to be called by theMutator
in the following two places:
e6cb39d
to
98f6014
Compare
state/protocol/badger/mutator.go
Outdated
@@ -365,6 +365,12 @@ func (m *FollowerState) headerExtend(ctx context.Context, candidate *flow.Block, | |||
return fmt.Errorf("could not store incorporated qc: %w", err) | |||
} | |||
} else { | |||
// parent is a block that has been received and certified by a QC. | |||
err := transaction.WithTx(operation.SkipDuplicates(operation.IndexCertifiedBlockByView(parent.View, qc.BlockID)))(tx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AlexHentschel I was using the db operation directly, for 2 reasons:
- I wanted to add a IndexCertifiedBlockByView method to headers, and simply call m.headers.IndexCertifiedBlockByView(parent) here. But the storage.Headers interface is mostly for read methods, we didn't expose methods like IndexFinalizedBlockByView because we didn't want this method being misused.
- For the same reason I also didn't add IndexCertifiedBlockByView method to Blocks, and also, the parent block body doesn't exist in this context, passing a block to blocks.IndexCertifiedBlockByView would require us to make additional query to the parent block body.
regarding your suggestion for file
yes, I would appreciate that :-) |
Co-authored-by: Alexander Hentschel <alex.hentschel@flowfoundation.org>
9953c43
to
f383dcb
Compare
Fixed cd0ec0c |
f383dcb
to
cd0ec0c
Compare
…additions of blocks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Leo for the great PR. Appreciate your patience with the review time 🙈.
I have created a documentation PR on aspects that I wondered about while reviewing your PR ... mostly forward-facing thoughts on where the mutator
code would need to change in the future when moving to pebble. I think the only place where my PR is relevant for your immediate changes is my argument that we don't need to Skip Duplicates
(my comments here and here).
I am happy if you merge your PR first, and then we iterate over my documentation - don't feel obligated to include that as part of your PR .... though, don't hesitate to merge mine it into your branch if you feel it would help Jordan with his review 🤷.
// ByView returns the block with the given view. It is only available for certified blocks. | ||
// Certified blocks are the blocks that have received QC. Hotstuff guarantees that for each view, | ||
// at most one block is certified. Hence, the return value of `ByView` is guaranteed to be unique | ||
// oven for non-finalized blocks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// oven for non-finalized blocks. | |
// even for non-finalized blocks. | |
// Expected errors during normal operations: | |
// - `storage.ErrNotFound` if no certified block is known at given view. |
@@ -23,6 +23,12 @@ type Blocks interface { | |||
// for finalized blocks. | |||
ByHeight(height uint64) (*flow.Block, error) | |||
|
|||
// ByView returns the block with the given view. It is only available for certified blocks. | |||
// certified blocks are the blocks that have received QC. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// certified blocks are the blocks that have received QC. | |
// certified blocks are the blocks that have received QC. Hotstuff guarantees that for each view, | |
// at most one block is certified. Hence, the return value of `ByView` is guaranteed to be unique | |
// even for non-finalized blocks. | |
// Expected errors during normal operations: | |
// - `storage.ErrNotFound` if no certified block is known at given view. |
err := headers.Store(block.Header) | ||
require.NoError(t, err) | ||
|
||
// verify storing the block doesn't not index the view automatically |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// verify storing the block doesn't not index the view automatically | |
// Verify storing the block does not index the view automatically. It would not be safe to | |
// do that, since a byzantine leader might produce multiple proposals for the same view. |
_, err = headers.BlockIDByView(block.Header.View) | ||
require.ErrorIs(t, err, storage.ErrNotFound) | ||
|
||
// index block by view |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// index block by view | |
// Though, HotStuff guarantees that for each view at most one block is certified. Hence, we can index | |
// a block by its view _only_ once the block is certified. When a block is certified is decided by the | |
// higher-level consensus logic, which then triggers the indexing of the block by its view via a | |
// dedicated method call: |
@@ -48,6 +49,14 @@ func NewHeaders(collector module.CacheMetrics, db *badger.DB) *Headers { | |||
} | |||
} | |||
|
|||
retrieveView := func(view uint64) func(tx *badger.Txn) (flow.Identifier, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[entirely optional] wondering if it would increase clarity of the code if we renamed:
retrieveView
toretrieveByView
retrieveHeight
toretrieveByHeight
retrieve
toretrieveByID
@@ -110,6 +123,19 @@ func (h *Headers) ByHeight(height uint64) (*flow.Header, error) { | |||
return h.retrieveTx(blockID)(tx) | |||
} | |||
|
|||
// ByView returns block header for the given view. It is only available for certified blocks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// ByView returns block header for the given view. It is only available for certified blocks. | |
// ByView returns block header for the given view. It is only available for certified blocks. | |
// Certified blocks are the blocks that have received QC. Hotstuff guarantees that for each view, | |
// at most one block is certified. Hence, the return value of `ByView` is guaranteed to be unique | |
// even for non-finalized blocks. | |
// Expected errors during normal operations: | |
// - `storage.ErrNotFound` if no certified block is known at given view. | |
// | |
// Note: this method is not available until next spork or a migration that builds the index. |
// IndexCertifiedBlockByView indexes the view of a block. | ||
// HotStuff guarantees that there is at most one certified block. Caution: this does not hold for | ||
// uncertified proposals, as byzantine actors might produce multiple proposals for the same block. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// IndexCertifiedBlockByView indexes the view of a block. | |
// HotStuff guarantees that there is at most one certified block. Caution: this does not hold for | |
// uncertified proposals, as byzantine actors might produce multiple proposals for the same block. | |
// IndexCertifiedBlockByView indexes a block by its view. | |
// HotStuff guarantees that there is at most one certified block per view. Caution: this does not hold for | |
// uncertified proposals, as a byzantine leader might produce multiple proposals for the same view. |
@@ -83,6 +83,17 @@ func (b *Blocks) ByHeight(height uint64) (*flow.Block, error) { | |||
return b.retrieveTx(blockID)(tx) | |||
} | |||
|
|||
func (b *Blocks) ByView(view uint64) (*flow.Block, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func (b *Blocks) ByView(view uint64) (*flow.Block, error) { | |
// ByView returns the block with the given view. It is only available for certified blocks. | |
// certified blocks are the blocks that have received QC. Hotstuff guarantees that for each view, | |
// at most one block is certified. Hence, the return value of `ByView` is guaranteed to be unique | |
// even for non-finalized blocks. | |
// Expected errors during normal operations: | |
// - `storage.ErrNotFound` if no certified block is known at given view. | |
func (b *Blocks) ByView(view uint64) (*flow.Block, error) { |
In the Headers
implementation, you included the following note:
flow-go/storage/badger/headers.go
Line 127 in 9953c43
// Note: this method is not available until next spork or a migration that builds the index. |
Not sure if it would be beneficial to also add here?
state/protocol/badger/mutator.go
Outdated
@@ -365,6 +365,12 @@ func (m *FollowerState) headerExtend(ctx context.Context, candidate *flow.Block, | |||
return fmt.Errorf("could not store incorporated qc: %w", err) | |||
} | |||
} else { | |||
// parent is a block that has been received and certified by a QC. | |||
err := transaction.WithTx(operation.SkipDuplicates(operation.IndexCertifiedBlockByView(parent.View, qc.BlockID)))(tx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need to skip duplicates here. The reasoning is in detail documented in my follow-up PR #7101
err := transaction.WithTx(operation.SkipDuplicates(operation.IndexCertifiedBlockByView(parent.View, qc.BlockID)))(tx) | |
err := transaction.WithTx(operation.IndexCertifiedBlockByView(parent.View, qc.BlockID))(tx) |
state/protocol/badger/mutator.go
Outdated
@@ -389,6 +395,13 @@ func (m *FollowerState) headerExtend(ctx context.Context, candidate *flow.Block, | |||
if err != nil { | |||
return fmt.Errorf("could not store certifying qc: %w", err) | |||
} | |||
|
|||
// candidate is a block that has been received and certified by a QC | |||
err := transaction.WithTx(operation.SkipDuplicates(operation.IndexCertifiedBlockByView(candidate.Header.View, blockID)))(tx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need to skip duplicates here. The reasoning is in detail documented in my follow-up PR #7101
err := transaction.WithTx(operation.SkipDuplicates(operation.IndexCertifiedBlockByView(candidate.Header.View, blockID)))(tx) | |
err := transaction.WithTx(operation.IndexCertifiedBlockByView(candidate.Header.View, blockID))(tx) |
…estions documentation for `mutator` wrt concurrent additions of blocks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. There are a few documentation changes I didn't understand, but I think those are from @AlexHentschel's PR.
@@ -135,6 +135,9 @@ func NewFullConsensusState( | |||
// CAUTION: | |||
// - This function expects that `certifyingQC ` has been validated. (otherwise, the state will be corrupted) | |||
// - The parent block must already have been ingested. | |||
// - Attempts to extend the state with the _same block concurrently_ are not allowed. | |||
// (will not corrupt the state, but may lead to an exception) | |||
// Orphaned blocks are excepted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Orphaned blocks are excepted.
I'm not clear on what this means. Excepted as in the above caution doesn't apply? Or do they trigger an exception? @AlexHentschel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it means "above caution does not apply", I don't think it is accurate. If it means "triggers an exception" I think we should say that instead.
// Therefore, they should be identified | ||
// additional are just a no-op if they are identified as such early enough by the `checkBlockAlreadyProcessed` - though if the same extension is added concurrently, we might return a | ||
// [storage.ErrAlreadyExists] error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Therefore, they should be identified
additional are just a no-op if they are identified as such early enough by the `checkBlockAlreadyProcessed [...]
I'm not understanding this sentence... 😅 @AlexHentschel
// parent has already been ingested. Otherwise, an exception is returned. | ||
// - Attempts to extend the state with the _same block concurrently_ are not allowed. | ||
// (will not corrupt the state, but may lead to an exception) | ||
// Orphaned blocks are excepted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here - it isn't clear what "orphaned blocks are excepted" means.
@@ -53,8 +53,17 @@ type FollowerState interface { | |||
// CAUTION: | |||
// - This function expects that `qc` has been validated. (otherwise, the state will be corrupted) | |||
// - The parent block must already be stored. | |||
// - Attempts to extend the state with the _same block concurrently_ are not allowed. | |||
// (will not corrupt the state, but may lead to an exception) | |||
// Orphaned blocks are excepted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also here
// parent has already been ingested. Otherwise, an exception is returned. | ||
// - Attempts to extend the state with the _same block concurrently_ are not allowed. | ||
// (will not corrupt the state, but may lead to an exception) | ||
// Orphaned blocks are excepted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here
storage/badger/blocks.go
Outdated
header, err := b.headers.ByView(view) | ||
if err != nil { | ||
return nil, err | ||
} | ||
|
||
tx := b.db.NewTransaction(false) | ||
defer tx.Discard() | ||
return b.retrieveTx(header.ID())(tx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
header, err := b.headers.ByView(view) | |
if err != nil { | |
return nil, err | |
} | |
tx := b.db.NewTransaction(false) | |
defer tx.Discard() | |
return b.retrieveTx(header.ID())(tx) | |
blockID, err := b.headers.BlockIDByView(view) | |
if err != nil { | |
return nil, err | |
} | |
tx := b.db.NewTransaction(false) | |
defer tx.Discard() | |
return b.retrieveTx(blockID)(tx) |
Use the underlying index directly to save a encode/hash and potentially an extra database read.
storage/badger/headers.go
Outdated
} | ||
|
||
return h | ||
} | ||
|
||
// StoreTx allows us to store a new header, as part of a DB transaction, while still going through |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// StoreTx allows us to store a new header, as part of a DB transaction, while still going through | |
// storeTx allows us to store a new header, as part of a DB transaction, while still going through |
This PR adds an index for lookup block by view. The index is only added to certified block which has received a QC.
This index allows us to iterate through each view, and prune data indexed by the block ID.
Note: The method is not available until either next spork or a migration to build the index is completed.