
Enable block production dynamically #3159

Closed · Tracked by #3887
coot opened this issue May 17, 2021 · 22 comments · Fixed by IntersectMBO/ouroboros-consensus#140 or IntersectMBO/cardano-cli#43
Labels: consensus (issues related to ouroboros-consensus)

@coot
Contributor

coot commented May 17, 2021

For redundancy reasons SPOs run backup block production nodes. Currently,
with firewall rules, they can prevent relays from connecting to the backup
node, and thus prevent duplicate blocks from being diffused. This will no
longer work with p2p nodes, as relays will be able to reuse the inbound
connection from the block producer. For this reason we would like a block
producing node to be able to start without producing blocks, and to begin
producing them when it receives a SIGHUP signal.

This feature should be implemented against p2p-master branches (both in
ouroboros-network and cardano-node).

@nfrisby
Contributor

nfrisby commented May 27, 2021

This sounds reasonable. Though the choice of SIGHUP seems a bit odd to me; maybe a USR* instead? (Disclaimer: I'm unaware of any conventions here, just "hangup" sounds wrong here.)

Also, is sending it a signal that much more convenient than restarting the backup non-BP node with a slightly altered config file/argument that enables BP? IE is it sufficiently more convenient to justify the additional complexity in the implementation? Signals are usually a pain. (Slippery slope though: will we also want to be able to similarly stop the block producer, eg once the primary one is back online?)

Edit: by "signals are usually a pain" I meant to properly receive and handle, not to send.

@nfrisby nfrisby added the consensus issues related to ouroboros-consensus label May 27, 2021
@AndrewWestberg
Contributor

Signals are very simple to send. SIGHUP is used in many other processes to mean "reload your config". For example, prometheus does this.

kill -SIGHUP <pid>

@karknu
Contributor

karknu commented May 27, 2021

As Andrew mentions, it is common for unix daemons to support reloading their config files upon receiving SIGHUP. For p2p we've added support for reloading the topology file. What we're after here is reloading a config file, not simply switching block production between enabled and disabled upon receiving the signal.

@nfrisby
Contributor

nfrisby commented May 27, 2021

Thanks @AndrewWestberg @karknu for elaborating.

What we're after here is reloading a config file, not simply switching block production between enabled and disabled upon receiving the signal.

@karknu @coot Would you update this Issue title and description according to this clarification ^^^? That's quite a bit more general than the current wording, if I understand correctly. Thanks.


My immediate thought regarding "have the Consensus layer reload its config upon SIGHUP" is that a lot of our config is captured in a lot of closures, not (just) held in mutvars. Depending on which exact "config" we're thinking of reloading, this task could range from relatively simple to invasive. E.g. Karl mentioned that the Network layer only reloads the topology file -- maybe the necessary portion of the Consensus config for our immediate needs is indeed only stored in a mutvar; I'd have to get some more details and track that down. If the necessary config is currently captured in closures, then the shortest path might be the sledgehammer option of essentially restarting the necessary components of the node kernel (eg the block-producing threads). The less-sledgehammery option would be to migrate the config from closures to mutvars. Navigating those options depends on the details of what exactly we need to reload.

Either option should probably involve a new tracer event.

Also: Consensus doesn't read config files; cardano-node does that before invoking our layer. So the ultimate PR here will probably be adding an updateFooConfigToNewValue method to our interface, so that node can catch SIGHUP, reparse the file, and then invoke that new method.
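
A minimal sketch of what that could look like on the cardano-node side, assuming a hypothetical updateFooConfigToNewValue hook exposed by the consensus interface (neither the hook, FooConfig, nor the parser below is an existing API):

import Control.Monad (void)
import System.Posix.Signals (Handler (Catch), installHandler, sigHUP)

-- Hypothetical config type and parser, for illustration only:
data FooConfig = FooConfig

parseFooConfig :: FilePath -> IO FooConfig
parseFooConfig _ = pure FooConfig

-- On SIGHUP: reparse the file, then push the new value into the consensus layer.
installConfigReload :: FilePath -> (FooConfig -> IO ()) -> IO ()
installConfigReload path updateFooConfigToNewValue =
    void $ installHandler sigHUP (Catch reload) Nothing
  where
    reload = parseFooConfig path >>= updateFooConfigToNewValue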


I'm thinking out loud a bit, in the above, based on what I think this Issue is about. If what I discussed there seems unexpected, please let me know and we can talk about it at an upcoming Consensus Planning meeting (ie Tuesdays) to get on the same page.

If we do seem to already be on the same page, then maybe you could suggest here which exact config needs to be reloaded -- that would help me get a better sense of what this PR might look like.

Thanks.

@coot
Contributor Author

coot commented May 28, 2021

Our minimal requirement is to be able to start/stop block production on a signal (start is a must, stop is nice to have). It could be implemented as a mutable flag if reloading the whole configuration is too big a change. I am quite sure SPOs would be happy to be able to reload the whole configuration this way.

Also; Consensus doesn't read config files: cardano-node does that before invoking our layer. So the ultimate PR here will probably be adding an updateFooConfigToNewValue method to our interface, so that node can catch SIGHUP, reparse the file, and then invoke that new method.

What we do is: we put the topology information in a TVar Topology in cardano-node, which then passes an STM Topology action down to diffusion: the node is the only writer of this TVar, and some network components are its only readers.
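
A rough sketch of that ownership split, with simplified stand-in types rather than the actual cardano-node/ouroboros-network signatures:

import Control.Concurrent.STM (STM, atomically, newTVarIO, readTVar, writeTVar)

data Topology = Topology  -- stand-in for the real topology type

main :: IO ()
main = do
  -- cardano-node owns the TVar and is its sole writer.
  topoVar <- newTVarIO Topology
  -- Diffusion only ever receives the read action, never the TVar itself.
  runDiffusion (readTVar topoVar)
  -- On SIGHUP the node re-parses the topology file and writes the result:
  atomically (writeTVar topoVar Topology)

runDiffusion :: STM Topology -> IO ()
runDiffusion readTopology = do
  _topo <- atomically readTopology  -- readers always see the latest value
  pure ()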

@nfrisby
Contributor

nfrisby commented Jun 15, 2021

@coot @karknu We're planning to prioritize Genesis work over this. This should be simpler, and they both unblock the same things, at a coarse level.

So we probably won't touch this until some time in July. Does that sound viable for you?

@coot
Contributor Author

coot commented Jun 15, 2021

That's ok.

@nfrisby
Contributor

nfrisby commented Dec 7, 2021

This snippet is the existing logic that determines when the node attempts to forge a block. In particular, the knownSlotWatcher means that we attempt every time the current slot changes (if it's known -- which ~means we're sufficiently synced with mainnet).

https://github.com/input-output-hk/ouroboros-network/blob/7a56e888428d76ba5598641aac201b0cd7f5121e/ouroboros-consensus/src/Ouroboros/Consensus/NodeKernel.hs#L406-L418

@coot Would it be sufficient to put aside the reparsing of the config file and simply have each leadership check (ie the go above) abort immediately if a mutable flag, toggled by receiving SIGHUP, is set to false? Would the node additionally need a way to specify whether the flag starts in the off or on position?
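
A minimal sketch of that guard, assuming a TVar Bool flag alongside the existing leadership check (the flag and its wiring are hypothetical, not the code linked above):

import Control.Concurrent.STM (TVar, atomically, modifyTVar', readTVarIO)
import Control.Monad (void, when)
import System.Posix.Signals (Handler (Catch), installHandler, sigHUP)

-- SIGHUP toggles the flag; the node could pick its initial value at startup.
installForgeToggle :: TVar Bool -> IO ()
installForgeToggle flag =
  void $ installHandler sigHUP (Catch (atomically (modifyTVar' flag not))) Nothing

-- Run the leadership check only while the flag is on; abort immediately otherwise.
guardedLeadershipCheck :: TVar Bool -> IO () -> IO ()
guardedLeadershipCheck flag leadershipCheck = do
  enabled <- readTVarIO flag
  when enabled leadershipCheck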

@coot
Contributor Author

coot commented Dec 8, 2021

@nfrisby I think we have three options on the table:

  • re-read the whole configuration (on SIGHUP), and make it available to the rest of the application
  • re-read the configuration (on SIGHUP), but only extract things that we are interested in (right now a single boolean flag)
  • have a controlling mini-protocol which allows one to read / change specific values

The first option is very intrusive, and its scope is probably too large to achieve in a reasonable time frame.
The second option would fit nicely with what you suggest, and it would have a nicer interface, where the value is read from the configuration file rather than being a switch whose state is hidden.
I think I prefer the second option over the third one in terms of UX for SPOs, and it's also simpler to implement.

@nfrisby
Contributor

nfrisby commented Dec 9, 2021

I think I prefer the second option over the third one in terms of UX for SPOs, and it's also simpler to implement.

I don't anticipate it being simpler to implement. The main reason is that we have at least some closures that contain configuration data. We'd need to track all those down and replace them with STM m Config actions.

The third option seems very simple to implement: add some new state and a related config option (command-line alone would likely be fine, I expect) with a very narrow intended use.

I think the main downside of the third option is that if we eventually do support "reconfigure on SIGHUP", then this flag (and its eg command-line option) would have to be deprecated.

@coot
Contributor Author

coot commented Dec 9, 2021

I don't anticipate it being simpler to implement. The main reason is that we have at least some closures that contain configuration data. We'd need to track all those down and replace them with STM m Config actions.

You don't need STM m Config, just STM m ReconfigurableConfig, where ReconfigurableConfig would contain only a small part of the real configuration; for now it could be just a single boolean value. The point is that on SIGHUP, the node would parse the config file, extract ReconfigurableConfig, and make it available in the stm transaction.
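
A sketch of the extraction step, borrowing the DisableLeaderCheck field name that comes up later in this thread (the type and parser here are illustrative, not the actual cardano-node schema):

{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (FromJSON (..), eitherDecodeFileStrict', withObject, (.!=), (.:?))

-- Only the reconfigurable slice of the config; all other fields are ignored.
newtype ReconfigurableConfig = ReconfigurableConfig
  { blockProductionEnabled :: Bool }

instance FromJSON ReconfigurableConfig where
  parseJSON = withObject "ReconfigurableConfig" $ \o ->
    ReconfigurableConfig . not <$> (o .:? "DisableLeaderCheck" .!= False)

-- On SIGHUP: re-parse the full config file, keep only this slice, and write
-- it to the TVar that the rest of the node reads via an STM action.
readReconfigurableConfig :: FilePath -> IO (Either String ReconfigurableConfig)
readReconfigurableConfig = eitherDecodeFileStrict'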

The main downsides of the third option, as I see them, are that:

  • the on / off switch becomes error prone for the operator: the state is virtual, lives only in the cardano-node process, and one cannot double check the value; sending SIGHUP will change the state every time.
  • the config file is not the only source of configuration.

While what I propose:

  • keeps the simplicity of the third option: one exposes only tiny bits of the config file.
  • SIGHUP is idempotent: one can double check that the config file contains the right value, and send SIGHUP to make sure the in-memory value is synced with the config value.
  • allows gradually expanding ReconfigurableConfig, until all sensible parts of the configuration are reconfigurable. We could extend it with other network parts; other teams (consensus :)) could follow at their own pace.

The downside is:

  • for the users it might be confusing which pieces are reconfigurable on SIGHUP, and which aren't. Good logging should help.
  • anything else?

@nfrisby
Contributor

nfrisby commented Dec 9, 2021

@coot I like it! I wasn't considering the new separate config. Way better than my hidden state idea.

Heads up: @Jimbo4350 does the plan that Marcin outlined above ^^^ make sense to you? Does Node Team have any other config like this that we'd prefer to be able to change on the fly (without restarting the process)? I'm asking just to find overlap with existing tech debt/upcoming goals, etc.

@coot
Contributor Author

coot commented Dec 9, 2021

btw, we're using SIGHUP to reload p2p topology configuration.

@deepfire are you planning to do reconfiguration of the logging system in a similar way? What's your take on the above plan?

@gitmachtl

gitmachtl commented Dec 11, 2021

Would it be possible to have an entry in the main config.json that refers to a second file that is also loaded and merged with the main one on a SIGHUP signal? And a flag to disable block production on a node via a config entry in the main or the linked config file? In that case it would be really easy to change either the small extra config file on the fly, or to change the linked config file entry in the main config.json to point to, for example, a bp_enable.json config file, which could also contain a reference to the node keys instead of passing them via the startup parameters, and a bp_disable.json which contains a parameter to disable block production (should be on by default if keys are present). That way we could not only enable/disable block production but also change the provided keys on the fly if needed.

something like:

{ ...
   "AdditionalConfigFile": "enable_bp.json",
or
   "AdditionalConfigFile": "disable_bp.json",
... }

for the main config

and in the referenced enable_bp.json like:

{
   "DisableLeaderCheck": false,
   "VrfKeyFile": "mynode.vrf.skey",
   "KesKeyFile": "mynode.kes.skey",
   "OperationalCertificateFile": "mynode.node.opcert"
}

or disable_bp.json like:

{
   "DisableLeaderCheck": true,
   "VrfKeyFile": "mynode.vrf.skey",
   "KesKeyFile": "mynode.kes.skey",
   "OperationalCertificateFile": "mynode.node.opcert"
}

Merging these together would let the configuration take place in the single main config file or in an additional one.
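
A rough sketch of such a shallow merge using aeson (AdditionalConfigFile and the override semantics are this proposal's idea, not an existing cardano-node feature):

import qualified Data.Aeson as A
import qualified Data.Aeson.KeyMap as KM  -- aeson >= 2.0

-- Keys present in the additional file override those in the main config.
mergeConfigs :: A.Value -> A.Value -> A.Value
mergeConfigs (A.Object mainCfg) (A.Object extraCfg) =
  A.Object (KM.union extraCfg mainCfg)  -- union is left-biased, so extra wins
mergeConfigs mainCfg _ = mainCfg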

And of course some flag, checkable via the metrics, to verify the currently set block production state of the running node.

Or name the AdditionalConfigFile a ReconfigurableConfigFile for better visibility, allowing config updates only within this extra file; it would need a fixed name that cannot be changed after the initial node start.

... just some ideas

@AndrewWestberg
Contributor

@gitmachtl We already have something like this which is the bulk credentials file. It's used mostly in setting up test nodes, but could be modified to add the extra flag for DisableLeaderCheck.

@gitmachtl

@gitmachtl We already have something like this which is the bulk credentials file. It's used mostly in setting up test nodes, but could be modified to add the extra flag for DisableLeaderCheck.

You're right, I think I used it once in the past but totally forgot about it. Yep, maybe we could bring it together with some extra flags.

@AndrewWestberg
Contributor

It might just need to be renamed to "--credentials-file" and modified to make it more user friendly. Right now, it's very tricky to set up, with the array of arrays and the order needing to be very specific: 1) opcert, 2) vrf, 3) KES.

Ideally a better format would be to just use exactly what you have above except wrap it in an array so it could still be used for bulk credentials in test environments. Then there would be exactly one way to configure credentials for a node instead of two.

[
  {
     "DisableLeaderCheck": true,
     "VrfKeyFile": "mynode.vrf.skey",
     "KesKeyFile": "mynode.kes.skey",
     "OperationalCertificateFile": "mynode.node.opcert"
  }
]

@Jimbo4350
Contributor

Jimbo4350 commented Dec 14, 2021

Does Node Team have any other config like this that we'd prefer to be able to change on the fly (without restarting the process)?

Not to my knowledge

re-read the configuration (on SIGHUP), but only extract things that we are interested in

This sounds reasonable

@dcoutts
Contributor

dcoutts commented Apr 5, 2022

I think a good way to do this is to remove the forging credentials from the consensus config. So then forging credentials have nothing to do with the consensus layer's notion of init/startup. Then we'd add a consensus layer API to be able to dynamically provide forging credentials and start block forging, and indeed to remove & stop forging too. Initially this API would only be used at node startup time by the node top level code, but then it could also be exposed to be able to set/unset it at runtime (triggered by SIGHUP or something else). Ultimately we'll have an IPC mechanism to set the forging credentials.
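
A rough sketch of the API shape this describes (the record and field names below are hypothetical; the actual interface landed later via IntersectMBO/ouroboros-consensus#140):

-- Stand-in for the real record of forging credentials (opcert, KES, VRF, ...).
data BlockForging m blk = BlockForging

-- Handles the consensus layer would expose, usable both at node startup and
-- later at runtime (triggered by SIGHUP, or eventually an IPC command):
data ForgingApi m blk = ForgingApi
  { startBlockForging :: BlockForging m blk -> m ()  -- install credentials, begin forging
  , stopBlockForging  :: m ()                        -- remove credentials, stop forging
  }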

@coot coot changed the title Enable block production on SIGHUP Enable block production dynamically Apr 13, 2022
@coot
Contributor Author

coot commented Apr 13, 2022

I renamed the ticket, since it might use a different mechanism than SIGHUP. From our perspective it's just an implementation detail.

@gitmachtl

gitmachtl commented Apr 13, 2022

Would it be possible to have two signals? One that enables block production and one that disables it? Startup defaults to enabled. Or via an HTTP call on the Prometheus interface?

@coot
Contributor Author

coot commented Apr 13, 2022

We understand that signals are not ideal for this scenario, and eventually the cardano-node team will provide an IPC mechanism to load / unload keys into process memory; although initially this might be done with a SIGHUP while the key is loaded into process memory (through configuration, as it's done now).

@coot coot mentioned this issue Jul 11, 2022
6 tasks
@bolt12 bolt12 self-assigned this Jun 12, 2023
github-merge-queue bot pushed a commit to IntersectMBO/ouroboros-consensus that referenced this issue Jul 3, 2023
This PR supersedes
IntersectMBO/ouroboros-network#3800 and
regards issue
IntersectMBO/ouroboros-network#3159.

I mostly just "rebased" the old `ouroboros-network` branch on top of
this new repo. Please look at the discussions in the old PR for more
details.

This PR is co-authored-by: Marcin Szamotulski <coot@coot.me> @coot