
Control block forging through NodeKernel #3800

Closed
wants to merge 16 commits into from

Conversation

@coot coot commented Jun 8, 2022

Address #3159 on the ouroboros-consensus side.

  • Fixed a typo
  • Added setBlockForging to NodeKernel (see the sketch after this list)
  • Updated ouroboros-consensus-test
  • Updated ouroboros-consensus-mock
  • Updated ouroboros-consensus-mock-test
  • Updated ouroboros-consensus-byron
  • Updated ouroboros-consensus-byron-test
  • Updated ouroboros-consensus-shelley
  • Updated ouroboros-consensus-shelley-test
  • Updated ouroboros-consensus-cardano library
  • Updated ouroboros-consensus-cardano:db-analyzer
  • Updated ouroboros-consensus-cardano-test
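Below is a minimal sketch (not the actual ouroboros-consensus API) of how a caller might drive forging through the new NodeKernel field. The NodeKernel and BlockForging types here are heavily simplified stand-ins; only the setBlockForging name is taken from this PR, and the idea that installing an empty list disables forging is an assumption made for illustration.

```haskell
module ForgingControlSketch where

import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Simplified stand-in; the real 'BlockForging m blk' record carries forging
-- credentials, a forge function, and more.
newtype BlockForging = BlockForging { forgeLabel :: String }

-- Cut-down 'NodeKernel': after this PR block forging is no longer baked into
-- 'ProtocolInfo' but installed (and replaceable) at runtime.
data NodeKernel = NodeKernel
  { setBlockForging :: [BlockForging] -> IO ()
  , currentForging  :: IORef [BlockForging]
  }

mkNodeKernel :: IO NodeKernel
mkNodeKernel = do
  ref <- newIORef []
  pure NodeKernel { setBlockForging = writeIORef ref
                  , currentForging  = ref }

main :: IO ()
main = do
  kernel <- mkNodeKernel
  -- Enable forging dynamically, e.g. once credentials become available ...
  setBlockForging kernel [BlockForging "example-credentials"]
  -- ... and disable it again by installing an empty list (assumed semantics).
  setBlockForging kernel []
  readIORef (currentForging kernel) >>= print . map forgeLabel
```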

Checklist

  • Branch
    • Commit sequence broadly makes sense
    • Commits have useful messages
    • New tests are added if needed and existing tests are updated
    • If this branch changes Consensus and has any consequences for downstream repositories or end users, said changes must be documented in interface-CHANGELOG.md
    • If this branch changes Network and has any consequences for downstream repositories or end users, said changes must be documented in interface-CHANGELOG.md
    • If serialization changes, user-facing consequences (e.g. replay from genesis) are confirmed to be intentional.
  • Pull Request
    • Self-reviewed the diff
    • Useful pull request description at least containing the following information:
      • What does this PR change?
      • Why these changes were needed?
      • How does this affect downstream repositories and/or end-users?
      • Which ticket does this PR close (if any)? If it does, is it linked?
    • Reviewer requested

@coot coot added the consensus issues related to ouroboros-consensus label Jun 8, 2022
nfrisby previously requested changes Jun 8, 2022
@nfrisby nfrisby left a comment


FYI: I only reviewed the Added setBlockForging to NodeKernel commit so far.

@coot coot force-pushed the coot/dynamic-block-forrging branch 2 times, most recently from 5455841 to b9343a8 on June 8, 2022 19:26
@coot coot requested a review from nfrisby June 8, 2022 19:26
@coot coot force-pushed the coot/dynamic-block-forrging branch 4 times, most recently from 37a4cbf to 7fcff5c on June 9, 2022 13:50
@nfrisby nfrisby dismissed their stale review June 9, 2022 16:38

Marcin fixed the main concern

@coot coot force-pushed the coot/dynamic-block-forrging branch 2 times, most recently from a98a108 to 15da26e on June 10, 2022 07:04
@coot coot force-pushed the coot/dynamic-block-forrging branch from 15da26e to d327246 on June 13, 2022 17:11
@nfrisby nfrisby left a comment


Thank you so much for all the heavy-lifting you've done here in the Consensus code! Your diff is really clean, and the effort that took is much appreciated 🙏.

I am Requesting Changes only because of the amount of copy-paste code I'm seeing: a lot of "parameter projection" code is duplicated between the new protocolInfo* and blockForging* functions, which were split apart from the old protocolInfo* functions (whose single definition could share those parameter projections between the two halves that are now separated). See the bigger Conversation about it below.
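To illustrate the concern with hypothetical names (none of these functions exist in the codebase): once protocolInfo* and blockForging* are separate top-level functions, each has to re-project the same fields from the shared parameters.

```haskell
module ProjectionDuplicationSketch where

-- Hypothetical parameter record standing in for the shared protocol arguments.
data ExampleParams = ExampleParams
  { epSecurityParam :: Int
  , epCredentials   :: [String]
  , epSlotLength    :: Double
  }

-- After the split, the static protocol setup projects some fields ...
protocolInfoExample :: ExampleParams -> (Int, Double)
protocolInfoExample params =
  ( epSecurityParam params      -- projection, copy #1
  , epSlotLength    params )

-- ... and the forging setup repeats the same projections, whereas the old
-- single 'protocolInfo*' definition could share them between both halves.
blockForgingExample :: ExampleParams -> [(String, Int)]
blockForgingExample params =
  [ (cred, epSecurityParam params)   -- projection, copy #2
  | cred <- epCredentials params
  ]
```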

@coot coot force-pushed the coot/dynamic-block-forrging branch from d327246 to cfbd8ff on June 14, 2022 13:19
coot commented Jun 14, 2022

I am getting some test failures:

ouroboros-consensus-byron-test

byron
  Byron
    simple convergence: FAIL (295.37s)
      *** Failed! Falsified (after 4 tests):
      TestSetup {setupEBBs = NoEBBs, setupK = SecurityParam 1, setupTestConfig = TestConfig {initSeed = Seed 4918591009071197805, nodeTopology = NodeTopology (fromList [(CoreNodeId 0,fromList []),(CoreNodeId 1,fromList [CoreNodeId 0]),(CoreNodeId 2,fromList [CoreNodeId 0,CoreNodeId 1]),(CoreNodeId 3,fromList [CoreNodeId 1])]), numCoreNodes = NumCoreNodes 4, numSlots = NumSlots 67}, setupNodeJoinPlan = NodeJoinPlan (fromList [(CoreNodeId 0,SlotNo 1),(CoreNodeId 1,SlotNo 1),(CoreNodeId 2,SlotNo 1),(CoreNodeId 3,SlotNo 35)]), setupNodeRestarts = NodeRestarts (fromList [(SlotNo 21,fromList [(CoreNodeId 1,NodeRestart)]),(SlotNo 25,fromList [(CoreNodeId 1,NodeRestart)]),(SlotNo 28,fromList [(CoreNodeId 0,NodeRestart)]),(SlotNo 29,fromList [(CoreNodeId 1,NodeRestart)]),(SlotNo 33,fromList [(CoreNodeId 1,NodeRestart)]),(SlotNo 34,fromList [(CoreNodeId 2,NodeRestart)]),(SlotNo 35,fromList [(CoreNodeId 3,NodeRekey)]),(SlotNo 37,fromList [(CoreNodeId 1,NodeRestart)]),(SlotNo 44,fromList [(CoreNodeId 0,NodeRestart)]),(SlotNo 45,fromList [(CoreNodeId 1,NodeRestart)]),(SlotNo 55,fromList [(CoreNodeId 3,NodeRestart)]),(SlotNo 56,fromList [(CoreNodeId 0,NodeRestart)]),(SlotNo 62,fromList [(CoreNodeId 2,NodeRestart)]),(SlotNo 63,fromList [(Cor

... 
     
      consensus expected: True
      maxForkLength: 0
      There were unexpected CannotForges: fromList [(SlotNo 55,[PBftCannotForgeInvalidDelegation (KeyHash {unKeyHash = 911965b94fe206522fe8fb1683abf0bd39d9f092e9c9553ddbe35ae0})]),(SlotNo 59,[PBftCannotForgeInvalidDelegation (KeyHash {unKeyHash = 911965b94fe206522fe8fb1683abf0bd39d9f092e9c9553ddbe35ae0})]),(SlotNo 63,[PBftCannotForgeInvalidDelegation (KeyHash {unKeyHash = 911965b94fe206522fe8fb1683abf0bd39d9f092e9c9553ddbe35ae0}),PBftCannotForgeInvalidDelegation (KeyHash {unKeyHash = 911965b94fe206522fe8fb1683abf0bd39d9f092e9c9553ddbe35ae0})])]
      Use --quickcheck-replay=254433 to reproduce.

It's interesting that in slot 55 the node was not able to produce a block, and that's exactly the slot at which node 3 was scheduled to restart (if I interpret the TestSetup correctly).

The full ouroboros-consensus-byron-test log.

ouroboros-consensus-cardano-test

And also another one in ouroboros-consensus-cardano-test:

          Exception thrown while showing test case:
            Assertion failed
            CallStack (from HasCallStack):
              assert, called at src/Cardano/Crypto/KES/Mock.hs:98:9 in cardano-crypto-class-2.0.0-49b4213a40d8c0265da39810113c14e195218a76babd0ce512365b752abc1e6e:Cardano.Crypto.KES.Mock
              signKES, called at src/Cardano/Crypto/KES/Class.hs:355:40 in cardano-crypto-class-2.0.0-49b4213a40d8c0265da39810113c14e195218a76babd0ce512365b752abc1e6e:Cardano.Crypto.KES.Class

          Use --quickcheck-replay=900463 to reproduce.
          Use -p '/SerialiseDisk.roundtrip Header/' to rerun this test only.

The last one is because of this assertion failure.

The full ouroboros-consensus-cardano-test log.
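For readers unfamiliar with that failure mode, here is a purely illustrative sketch (hypothetical names, not the cardano-crypto-class API) of the kind of period check a mock KES signing function asserts: signing at an evolution the key has not been evolved to trips the assert instead of failing gracefully.

```haskell
module MockKesAssertSketch where

import Control.Exception (assert)

-- Hypothetical mock KES signing key: it is only valid for one period.
newtype MockSignKey = MockSignKey { keyPeriod :: Word }

-- Illustrative signKES-like function: with assertions enabled (no -O, or
-- -fno-ignore-asserts), a period mismatch aborts instead of returning an error.
signAtPeriod :: Word -> String -> MockSignKey -> String
signAtPeriod period payload key =
  assert (period == keyPeriod key) $
    "signed@" ++ show period ++ ": " ++ payload

main :: IO ()
main = do
  let key = MockSignKey 3
  putStrLn (signAtPeriod 3 "header" key)  -- fine
  putStrLn (signAtPeriod 5 "header" key)  -- "Assertion failed", as in the log above
```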

@coot coot force-pushed the coot/dynamic-block-forrging branch 2 times, most recently from 0748406 to 34c4e95 on June 16, 2022 19:58
@coot coot changed the title Control block forging throught NodeKernel Control block forging through NodeKernel Jun 22, 2022
nfrisby commented Jun 29, 2022

Ah ha! Hydra built green with my typo fixup! 🙌

@nfrisby nfrisby dismissed their stale review June 29, 2022 00:13

Marcin fixed my concern

nfrisby commented Jun 29, 2022

I'm mentally gassed at this point in my day. I'll do a last pass tomorrow during/after our call.

@coot coot force-pushed the coot/dynamic-block-forrging branch 2 times, most recently from ac457b5 to 998242c on November 9, 2022 20:58
nfrisby commented Nov 15, 2022

But I suspect we never re-open ChainDB, is this right?

That is right. Some tests do it, but not the implementation.

Block forging is removed from ProtocolInfo, and can be controlled using the
`NodeKernel` field: `setProtocolForging :: [BlockForging m blk] -> m ()`.
We make sure that when a block is added to the ChainDB, its transactions
will be removed from the mempool.   The 'addBlockAsync' is a lightweight
non-blocking operation, but the finalizer is blocking (`blockProcessed`
will block until the block has been added to the ChainDB).  Hence we need to
use `uninterruptibleMask_` to make it safe in the presence of asynchronous
exceptions.
When the block forger thread adds a new block, the adding thread might
be killed by an async exception.  If that happens, the block forger will
get 'Nothing' when `blockProcessed` returns, and it can exit.
* ouroboros-consensus-test
* ouroboros-consensus-cardano-tools
`addBlock_` is used by `initNodeKernel` when calling the `initChainDB`
callback from `NodeKernelArgs`.
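A minimal sketch of the add-block pattern those commit messages describe, using simplified stand-ins for the ChainDB types: the enqueue is cheap and non-blocking, the wait on blockProcessed blocks, and the blocking part runs under uninterruptibleMask_ so an asynchronous exception cannot separate the enqueue from the wait; a forger that receives Nothing knows its block was abandoned and can exit.

```haskell
module AddBlockSketch where

import Control.Concurrent.STM
import Control.Exception (uninterruptibleMask_)

-- Simplified stand-in for the promise returned by 'addBlockAsync'.
newtype AddBlockPromise blk = AddBlockPromise
  { blockProcessed :: STM (Maybe blk)  -- 'Nothing': the ChainDB abandoned the block
  }

-- Non-blocking enqueue (stand-in for the real 'addBlockAsync').
addBlockAsync :: TBQueue (blk, TMVar (Maybe blk)) -> blk -> IO (AddBlockPromise blk)
addBlockAsync queue blk = do
  result <- newEmptyTMVarIO
  atomically $ writeTBQueue queue (blk, result)
  pure AddBlockPromise { blockProcessed = readTMVar result }

-- The pattern from the commit message: enqueueing is cheap, but the finalizer
-- that waits for the result (and would then flush the mempool) blocks, so it
-- runs under 'uninterruptibleMask_' to stay safe against async exceptions.
addBlockSync :: TBQueue (blk, TMVar (Maybe blk)) -> blk -> IO (Maybe blk)
addBlockSync queue blk = do
  promise <- addBlockAsync queue blk
  uninterruptibleMask_ $ do
    mBlk <- atomically (blockProcessed promise)
    -- here the real code would remove the block's transactions from the mempool
    pure mBlk
```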
@coot coot force-pushed the coot/dynamic-block-forrging branch from 998242c to c102d99 on November 21, 2022 08:18
@@ -437,7 +439,7 @@ addBlockWaitWrittenToDisk chainDB punish blk = do

-- | Add a block synchronously: wait until the block has been processed (see
-- 'blockProcessed'). The new tip of the ChainDB is returned.
addBlock :: IOLike m => ChainDB m blk -> InvalidBlockPunishment m -> blk -> m (Point blk)
addBlock :: IOLike m => ChainDB m blk -> InvalidBlockPunishment m -> blk -> m (Maybe (Point blk))
Contributor

Looking more closely at nodeInitChainDB @ByronBlock --- which is ultimately the only interesting transitive use of API.addBlock (see previous message) --- we see it's merely adding the slot 0 Epoch Boundary Block.

https://github.com/input-output-hk/ouroboros-network/blob/72863b0fc78abdc2b8e29f0dda96c06da3dd11d0/ouroboros-consensus-byron/src/Ouroboros/Consensus/Byron/Node.hs#L273-L282

There's no way to recover, if that fails.


So: the only real use of addBlock in the system has no expectation of failure and no useful way to recover if it did fail.
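Given that observation, a hedged sketch (simplified types, hypothetical names) of what that one caller can do with the new Maybe result: since there is no sensible recovery, Nothing is simply turned into a fatal error.

```haskell
module InitChainDBSketch where

-- Simplified stand-ins: after the signature change above, 'addBlock' reports
-- 'Nothing' when the block was never processed (e.g. the queue was closed).
newtype Point = Point String deriving Show

newtype ChainDB = ChainDB { addBlock :: String -> IO (Maybe Point) }

-- Sketch of a nodeInitChainDB-style caller: it only adds the slot-0 EBB, so a
-- 'Nothing' can only be treated as fatal.
initChainDBWithEBB :: ChainDB -> IO ()
initChainDBWithEBB chainDB = do
  result <- addBlock chainDB "slot-0 epoch boundary block"
  case result of
    Just tip -> putStrLn ("ChainDB initialised, tip: " ++ show tip)
    Nothing  -> fail "initChainDB: adding the genesis EBB was abandoned"
```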

@nfrisby nfrisby left a comment


Some style and a new problematic observation regarding concurrency :(

@@ -492,6 +494,14 @@ addBlockToAdd tracer (BlocksToAdd queue) punish blk = do
getBlockToAdd :: IOLike m => BlocksToAdd m blk -> m (BlockToAdd m blk)
getBlockToAdd (BlocksToAdd queue) = atomically $ readTBQueue queue

-- | Flush the 'BlocksToAdd' queue and notify the waiting threads.
--
closeBlocksToAdd :: IOLike m => BlocksToAdd m blk -> STM m ()
Contributor

Bah; this doesn't seem enough. We have a race now, don't we?

  • Some threads are adding tasks to the queue; these are BlockFetch clients and the NodeKernel.hs forge.
  • One thread is popping one task from the queue at a time

If the popping thread is killed mid-pop, then it notifies the unlucky owner of the task that got interrupted. And it also flushes the queue, similarly notifying all other task owners. And then the popping thread will terminate, since bracket* re-raises the exception.

But there's no guarantee the popping thread will be the last to terminate. So even when it's gone, other threads may be adding to the queue.

The first option that comes to mind (sketched after this list):

  • Complicate the queue by adding state indicating whether it's "open" or "closed" and have the addBlockRunner close the queue when it flushes it. Hmm... the ChainDB already has an open/closed state; maybe the addBlockRunner dying is either reason enough to fully close the ChainDB or can only actually happen when the ChainDB is closed/closing or something like that?

  • But now addBlockAsync will also be partial, since you can't add to a closed queue. Now addBlockAsync :: ... -> m (AddBlockPromise m blk) would create a degenerate "promise" (immediately filled with False and Nothing) when asked to add a block to a closed queue.
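A minimal sketch of that first option, with hypothetical names (the real BlocksToAdd is different): the queue carries an explicit open/closed flag, closing flushes it and notifies waiting task owners, and addBlockAsync hands out a degenerate, already-resolved promise once the queue is closed.

```haskell
module BlocksToAddSketch where

import Control.Concurrent.STM

-- Hypothetical queue with an explicit open/closed state.
data BlocksToAdd blk = BlocksToAdd
  { bqQueue :: TBQueue (blk, TMVar (Maybe blk))
  , bqOpen  :: TVar Bool
  }

-- Closing marks the queue closed, then flushes it and notifies every waiting
-- task owner, so late writers can no longer sneak tasks past the runner.
closeBlocksToAdd :: BlocksToAdd blk -> STM ()
closeBlocksToAdd q = do
    writeTVar (bqOpen q) False
    flush
  where
    flush = do
      next <- tryReadTBQueue (bqQueue q)
      case next of
        Nothing           -> pure ()
        Just (_, promise) -> putTMVar promise Nothing >> flush

-- 'addBlockAsync' stays total: against a closed queue it returns a degenerate
-- promise that is already filled with 'Nothing'.
addBlockAsync :: BlocksToAdd blk -> blk -> STM (TMVar (Maybe blk))
addBlockAsync q blk = do
  promise <- newEmptyTMVar
  open    <- readTVar (bqOpen q)
  if open
    then writeTBQueue (bqQueue q) (blk, promise)
    else putTMVar promise Nothing
  pure promise
```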

Contributor

I didn't really grasp the problem here. Isn't the point of STM to be atomic? I don't think we'll get an async exception mid-pop; what can happen is the thread getting interrupted while blocked waiting to pop, and in that case the cleanup handler won't even run.

Contributor Author

@nfrisby is right, we need to make sure no new block can be accepted by the db after closeBlocksToAdd is called by addBlockRunner. @bolt12 it's not about the thread itself, it's about all the other concurrent writes to the ChainDB that are done by all block-fetch clients.

@nfrisby The exception will propagate and eventually ChainDB.closeDB will be called. It seems to me that we don't have access to the TVar which holds ChainDbState in the context of addBlockRunner, so keeping the state of the queue might be the easier option to implement (as in the suggestion you struck out).

@coot coot marked this pull request as draft May 18, 2023 07:12
@bolt12 bolt12 left a comment


@nfrisby I am taking over this PR and I had some comments, could you spare some time to reply so I feel confident in making the changes needed?

Notice that I also need to rebase (eek!) this very old PR, but I figure addressing the issues first will be better


- Added `setBlockForging` to `NodeKernel` which must be used to set / control
block forging of the consensus layer.
- We removed the `pInfoBlockForging` record field from the `ProtocolInfo` type.
Contributor

Worth mentioning that it got extracted rather than just plain removed
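A schematic before/after of that extraction; only pInfoBlockForging is the real field name mentioned above, everything else (field names, types) is simplified for illustration.

```haskell
{-# LANGUAGE KindSignatures #-}
module ProtocolInfoSketch where

import Data.Kind (Type)

-- Simplified stand-in for the real 'BlockForging m blk' record.
data BlockForging (m :: Type -> Type) blk = BlockForging

-- Before: forging travelled inside 'ProtocolInfo'.
data ProtocolInfoBefore m blk = ProtocolInfoBefore
  { pInfoConfigBefore :: String               -- placeholder for the static config
  , pInfoBlockForging :: [BlockForging m blk]
  }

-- After: the field is extracted rather than dropped; the forging list is built
-- separately and installed at runtime via 'setBlockForging' on the NodeKernel.
newtype ProtocolInfoAfter = ProtocolInfoAfter
  { pInfoConfigAfter :: String
  }
```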

@@ -437,7 +439,7 @@ addBlockWaitWrittenToDisk chainDB punish blk = do

-- | Add a block synchronously: wait until the block has been processed (see
-- 'blockProcessed'). The new tip of the ChainDB is returned.
addBlock :: IOLike m => ChainDB m blk -> InvalidBlockPunishment m -> blk -> m (Point blk)
addBlock :: IOLike m => ChainDB m blk -> InvalidBlockPunishment m -> blk -> m (Maybe (Point blk))
Contributor

I also think it is better to be explicit about the error in the type if possible, especially in this case where, at the call site, it is not obvious why one might get Nothing. So just to make sure I got it right: propagate a Maybe-isomorphic type up to Init.addBlock and have that function throw an exception, is that right?
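For concreteness, a hedged sketch (hypothetical names, not existing code) of what "being explicit about the error in the type" could look like: a small result type instead of Maybe, propagated up to the Init.addBlock wrapper, which turns the failure case into an exception.

```haskell
module AddBlockResultSketch where

import Control.Exception (Exception, throwIO)

newtype Point = Point String deriving Show

-- Hypothetical replacement for 'Maybe (Point blk)': the failure constructor
-- says why no point is returned.
data AddBlockResult
  = BlockAdded Point
  | AddBlockQueueClosed      -- the ChainDB stopped processing additions
  deriving Show

data AddBlockException = AddBlockQueueClosedException
  deriving Show

instance Exception AddBlockException

-- An Init.addBlock-style wrapper keeps a simple return type by turning the
-- explicit failure case into an exception at the outermost layer.
initAddBlock :: IO AddBlockResult -> IO Point
initAddBlock lowLevelAdd = do
  result <- lowLevelAdd
  case result of
    BlockAdded point    -> pure point
    AddBlockQueueClosed -> throwIO AddBlockQueueClosedException
```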


bolt12 commented Jun 7, 2023

Closed in favor of IntersectMBO/ouroboros-consensus#140

@bolt12 bolt12 closed this Jun 7, 2023
github-merge-queue bot pushed a commit to IntersectMBO/ouroboros-consensus that referenced this pull request Jul 3, 2023
This PR supersedes
IntersectMBO/ouroboros-network#3800 and
regards issue
IntersectMBO/ouroboros-network#3159.

I mostly just "rebased" the old `ouroboros-network` branch on top of
this new repo. Please look at the discussions in the old PR for more
details.

This PR is co-authored-by: Marcin Szamotulski <coot@coot.me> @coot
Labels
consensus issues related to ouroboros-consensus

Successfully merging this pull request may close these issues.

Enable block production dynamically
3 participants