Control block forging through NodeKernel #3800
Conversation
FYI: I only reviewed the "Added setBlockForging to NodeKernel" commit so far.
Thank you so much for all the heavy lifting you've done here in the Consensus code! Your diff is really clean, and the effort that took is much appreciated 🙏.

I am Requesting Changes only because of the amount of copy-paste code I'm seeing; a lot of "parameter projection" code was duplicated between the new `protocolInfo*` and `blockForging*` functions, which have been split apart from the old `protocolInfo*` functions (whose single definition was able to share the parameter projections between the two halves that are separated in the new code). See the bigger conversation about it below.
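For illustration, the kind of sharing being asked for might look like this (all names here are hypothetical, invented for the sketch, not the PR's actual API):

```haskell
-- Hypothetical: one record holding everything both halves project out of
-- the raw protocol arguments, defined exactly once.
data ProjectedParams blk = ProjectedParams
  { ppConfig      :: ProtocolConfig blk
  , ppCredentials :: [ForgingCredentials blk]
  }

projectParams :: ProtocolArgs blk -> ProjectedParams blk
projectParams = undefined  -- the single, shared projection

protocolInfoX :: ProtocolArgs blk -> ProtocolInfo blk
protocolInfoX = infoFrom . projectParams
  where infoFrom = undefined  -- consumes 'ppConfig'

blockForgingX :: Monad m => ProtocolArgs blk -> m [BlockForging m blk]
blockForgingX = forgingFrom . projectParams
  where forgingFrom = undefined  -- consumes 'ppCredentials'
```

With this shape, the two new entry points cannot drift apart, because neither re-implements the projections.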
I am getting some test failures:
Ah ha! Hydra built green with my typo fixup! 🙌

I'm mentally gassed at this point in my day. I'll do a last pass tomorrow during/after our call.
That is right. Some tests do it, but not the implementation.

Block forging is removed from `ProtocolInfo`, and can be controlled by using the `NodeKernel` field: `setProtocolForging :: [BlockForging m blk] -> m ()`.
We make sure that when a block is added to the ChainDB, its transactions will be removed from the mempool. `addBlockAsync` is a lightweight non-blocking operation, but the finalizer is blocking (`blockProcessed` will block until the block has been added to the ChainDB). Hence we need to use `uninterruptibleMask_` to make it safe in the presence of asynchronous exceptions.
When the block forger thread adds a new block, the adding thread might be killed by an async exception. If that happens, the block forger will get 'Nothing' when `blockProcessed` returns, and it can exit.
* ouroboros-consensus-test
* ouroboros-consensus-cardano-tools
`addBlock_` is used by `initNodeKernel` when calling the `initChainDB` callback from `NodeKernelArgs`.
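The masking pattern described above can be sketched as follows (simplified: the real `AddBlockPromise` carries more than is shown here, and these signatures are assumptions based on the identifiers mentioned in this PR):

```haskell
-- Hedged sketch of the synchronous wrapper described above.
-- 'addBlockAsync' only enqueues the block (cheap, non-blocking);
-- 'blockProcessed' blocks until the ChainDB has processed it.
-- 'uninterruptibleMask_' keeps an async exception from separating the
-- enqueue from the wait on its result.
addBlockSync ::
     IOLike m
  => ChainDB m blk
  -> InvalidBlockPunishment m
  -> blk
  -> m (Maybe (Point blk))   -- ^ 'Nothing' if the ChainDB shut down first
addBlockSync chainDB punish blk = uninterruptibleMask_ $ do
    promise <- addBlockAsync chainDB punish blk
    atomically $ blockProcessed promise
```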
```diff
@@ -437,7 +439,7 @@ addBlockWaitWrittenToDisk chainDB punish blk = do

 -- | Add a block synchronously: wait until the block has been processed (see
 -- 'blockProcessed'). The new tip of the ChainDB is returned.
-addBlock :: IOLike m => ChainDB m blk -> InvalidBlockPunishment m -> blk -> m (Point blk)
+addBlock :: IOLike m => ChainDB m blk -> InvalidBlockPunishment m -> blk -> m (Maybe (Point blk))
```
Looking more closely at `nodeInitChainDB @ByronBlock` --- which is ultimately the only interesting transitive use of `API.addBlock` (see previous message) --- we see it's merely adding the slot 0 Epoch Boundary Block. There's no way to recover if that fails.

So: the only real use of `addBlock` in the system has no expectation of failure and no useful way to recover if it did fail.
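Given that observation, the single real caller could simply treat `Nothing` as fatal. A hedged sketch (the exception type is invented here for illustration):

```haskell
-- Invented for illustration: turn the 'Maybe' from 'addBlock' into an
-- exception, since the one real caller cannot recover anyway.
data ChainDbClosedWhileAdding = ChainDbClosedWhileAdding
  deriving (Show)

instance Exception ChainDbClosedWhileAdding

addBlockOrThrow ::
     IOLike m
  => ChainDB m blk -> InvalidBlockPunishment m -> blk -> m (Point blk)
addBlockOrThrow chainDB punish blk =
    addBlock chainDB punish blk
      >>= maybe (throwIO ChainDbClosedWhileAdding) pure
```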
Some style comments, and a new problematic observation regarding concurrency :(
```diff
@@ -492,6 +494,14 @@ addBlockToAdd tracer (BlocksToAdd queue) punish blk = do
 getBlockToAdd :: IOLike m => BlocksToAdd m blk -> m (BlockToAdd m blk)
 getBlockToAdd (BlocksToAdd queue) = atomically $ readTBQueue queue

+-- | Flush the 'BlocksToAdd' queue and notify the waiting threads.
+closeBlocksToAdd :: IOLike m => BlocksToAdd m blk -> STM m ()
```
Bah; this doesn't seem enough. We have a race now, don't we?

- Some threads are adding tasks to the queue; these are the BlockFetch clients and the `NodeKernel.hs` forge.
- One thread is popping one task from the queue at a time.

If the popping thread is killed mid-pop, then it notifies the unlucky owner of the task that got interrupted. It also flushes the queue, similarly notifying all other task owners. And then the popping thread will terminate, since `bracket*` re-raises the exception.

But there's no guarantee the popping thread will be the last to terminate. So even when it's gone, other threads may be adding to the queue.

The first option that comes to mind:

- ~~Complicate the queue by adding state indicating whether it's "open" or "closed", and have the `addBlockRunner` close the queue when it flushes it.~~ Hmm... the `ChainDB` already has an open/closed state; maybe the `addBlockRunner` dying is either reason enough to fully close the ChainDB, or can only actually happen when the ChainDB is closed/closing, or something like that?
- ~~But now `addBlockAsync` will also be partial, since you can't add to a closed queue.~~ Now `addBlockAsync :: ... -> m (AddBlockPromise m blk)` would create a degenerate "promise" (immediately filled with `False` and `Nothing`) when asked to add a block to a closed queue.
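The "open/closed queue" option could be sketched like this (all names invented for illustration; the real `BlocksToAdd`/`BlockToAdd` types differ). The key point is that the closed flag is read in the same STM transaction as the enqueue, so no writer can race the flushing thread:

```haskell
-- Invented sketch of a closeable queue of blocks-to-add.
data BlockToAdd m blk = BlockToAdd
  { blockToAdd   :: blk
  , varProcessed :: StrictTMVar m (Maybe (Point blk))  -- filled when processed
  }

data BlocksToAdd m blk = BlocksToAdd
  { btaQueue  :: TBQueue m (BlockToAdd m blk)
  , btaClosed :: StrictTVar m Bool
  }

-- Returns False when the queue is already closed; the caller then hands
-- out a degenerate promise instead of enqueueing.
tryEnqueueBlock :: IOLike m => BlocksToAdd m blk -> BlockToAdd m blk -> STM m Bool
tryEnqueueBlock bta task = do
    closed <- readTVar (btaClosed bta)
    if closed
      then False <$ putTMVar (varProcessed task) Nothing
      else True  <$ writeTBQueue (btaQueue bta) task

-- Mark the queue closed, then drain it, notifying every waiting task owner.
closeBlocksToAdd :: IOLike m => BlocksToAdd m blk -> STM m ()
closeBlocksToAdd bta = do
    writeTVar (btaClosed bta) True
    drain
  where
    drain = do
      mTask <- tryReadTBQueue (btaQueue bta)
      case mTask of
        Nothing   -> pure ()
        Just task -> putTMVar (varProcessed task) Nothing >> drain
```

Because closing and enqueueing both touch `btaClosed` inside STM, a writer either lands its task before the close (and gets notified by the drain) or observes the closed flag and fails fast.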
I didn't really grasp the problem here. Isn't the point of STM to be atomic? I don't think we'll get an async exception mid-pop; what can happen is the thread getting interrupted while blocked waiting to pop. And if that's the case, the cleanup handler won't even run.
@nfrisby is right, we need to make sure no new block can be accepted by the db after `closeBlocksToAdd` is called by `addBlockRunner`. @bolt12 it's not about the thread itself, it's about all the other concurrent writes to the `ChainDB` that are done by all block-fetch clients.

@nfrisby The exception will propagate and eventually `ChainDB.closeDB` will be called. It seems to me that we don't have access to the `TVar` which holds `ChainDbState` in the context of `addBlockRunner`, so keeping the state of the queue might be an easier option to implement (as in the suggestion you struck out).
@nfrisby I am taking over this PR and I had some comments; could you spare some time to reply so I feel confident in making the needed changes?

Note that I also need to rebase (eek!) this very old PR, but I figure addressing the issues first will be better.
- Added `setBlockForging` to `NodeKernel`, which must be used to set / control block forging of the consensus layer.
- We removed the `pInfoBlockForging` record field from the `ProtocolInfo` type.
Worth mentioning that it got extracted rather than just plain removed.
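A hedged usage sketch of the new interface (the `NodeKernel` type parameters and the exact field shape are assumptions based on the changelog entry above, not confirmed signatures):

```haskell
-- Assuming 'setBlockForging' is a field of the kernel with type
-- '[BlockForging m blk] -> m ()', as the changelog entry describes.
startForging :: NodeKernel m addrNTN addrNTC blk -> [BlockForging m blk] -> m ()
startForging kernel forgers = setBlockForging kernel forgers

-- Passing the empty list would presumably disable forging again.
stopForging :: NodeKernel m addrNTN addrNTC blk -> m ()
stopForging kernel = setBlockForging kernel []
```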
```diff
@@ -437,7 +439,7 @@ addBlockWaitWrittenToDisk chainDB punish blk = do

 -- | Add a block synchronously: wait until the block has been processed (see
 -- 'blockProcessed'). The new tip of the ChainDB is returned.
-addBlock :: IOLike m => ChainDB m blk -> InvalidBlockPunishment m -> blk -> m (Point blk)
+addBlock :: IOLike m => ChainDB m blk -> InvalidBlockPunishment m -> blk -> m (Maybe (Point blk))
```
I also think it is better to be explicit about the error in the type if possible, especially in this case where, at the call site, it is not obvious why one might get `Nothing`. So just to make sure I got it right: propagate a Maybe-isomorphic type up to `Init.addBlock` and have that function throw an exception, is that so?
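One way to make the failure explicit at call sites, as suggested here, would be a dedicated result type instead of the bare `Maybe` (names invented for illustration):

```haskell
-- Invented sketch: isomorphic to 'Maybe (Point blk)', but the call site
-- can see *why* there is no point.
data AddBlockResult blk
  = BlockAdded (Point blk)  -- ^ the new tip of the ChainDB
  | ChainDbClosed           -- ^ the queue was flushed during shutdown

addBlockResultToMaybe :: AddBlockResult blk -> Maybe (Point blk)
addBlockResultToMaybe (BlockAdded pt) = Just pt
addBlockResultToMaybe ChainDbClosed   = Nothing
```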
Closed in favor of IntersectMBO/ouroboros-consensus#140
This PR supersedes IntersectMBO/ouroboros-network#3800 and regards issue IntersectMBO/ouroboros-network#3159. I mostly just "rebased" the old `ouroboros-network` branch on top of this new repo. Please look at the discussions in the old PR for more details.

Co-authored-by: Marcin Szamotulski <coot@coot.me> @coot
Address #3159 on the `ouroboros-consensus` side.

Checklist

- `interface-CHANGELOG.md`
- `interface-CHANGELOG.md`