Enable block production dynamically #3159
This sounds reasonable. Still, is sending it a signal that much more convenient than restarting the backup non-BP node with a slightly altered config file/argument that enables BP? I.e. is it sufficiently more convenient to justify the additional complexity in the implementation? Signals are usually a pain. (Slippery slope, though: will we also want to be able to similarly stop the block producer, e.g. once the primary one is back online?)

Edit: by "signals are usually a pain" I meant to properly receive and handle, not to send.
Signals are very simple to send. SIGHUP is used in many other processes to mean "reload your config". For example, Prometheus does this.

As Andrew mentions, it is common for unix daemons to support reloading their config files upon receiving SIGHUP.
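Sending a SIGHUP from the shell is indeed trivial. A quick demonstration, using a throwaway `sleep` process in place of a real node (a daemon that installs a handler would reload instead of terminating):

```shell
# Start a throwaway background process and send it SIGHUP, the same way an
# operator would signal a running daemon. SIGHUP's default action terminates
# the process, so the exit status reveals the delivered signal.
sleep 60 &
pid=$!
kill -HUP "$pid"
status=0
wait "$pid" || status=$?
echo "exit status: $status"   # 129 = 128 + 1 (SIGHUP is signal number 1)
```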
Thanks @AndrewWestberg @karknu for elaborating.
@karknu @coot Would you update this Issue title and description according to this clarification ^^^? That's quite a bit more general than the current wording, if I understand correctly. Thanks.

My immediate thought regarding "have the Consensus layer reload its config upon SIGHUP" is that a lot of our config is captured in a lot of closures, not (just) held in mutvars. Depending on which exact "config" we're thinking of reloading, this task could range from relatively simple to invasive. E.g. Karl mentioned that the Network Layer only reloads the topology file -- maybe the necessary portion of the Consensus config for our immediate needs is indeed only stored in a mutvar; I'd have to get some more details and track that down.

If the necessary config is currently captured in closures, then the shortest path might be the sledgehammer option of essentially restarting the necessary components of the node kernel (e.g. the block-producing threads). The less-sledgehammery option would be to migrate the config from closures to mutvars. Navigating those options depends on the details of what exactly we need to reload. Either option should probably involve a new tracer event. Also: Consensus doesn't read config files.

I'm thinking out loud a bit, in the above, based on what I think this Issue is about. If what I discussed there seems unexpected, please let me know and we can talk about it at an upcoming Consensus Planning meeting (i.e. Tuesdays) to get on the same page. If we do seem to already be on the same page, then maybe you could suggest here which exact config needs to be reloaded -- that would help me get a better sense of what this PR might look like. Thanks.
Our minimal requirement is to be able to start/stop block production on a signal (start is a must, stop is nice to have). It could be implemented as a mutable flag if reloading the whole configuration is too big a change. I am quite sure SPOs would be happy to be able to reload the whole configuration this way.
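A mutable flag of this sort could be sketched roughly as follows. This is illustrative only, not the actual cardano-node code; the name `forgingEnabled` and the loop structure are made up:

```haskell
import Control.Concurrent.STM (atomically, newTVarIO, writeTVar)
import System.Posix.Signals (Handler (Catch), installHandler, sigHUP)

main :: IO ()
main = do
  -- Block production starts disabled; receiving SIGHUP flips the flag.
  forgingEnabled <- newTVarIO False
  _ <- installHandler sigHUP
         (Catch (atomically (writeTVar forgingEnabled True)))
         Nothing
  -- The forging loop would consult the flag before each leadership check,
  -- e.g.:
  --   enabled <- atomically (readTVar forgingEnabled)
  --   when enabled runLeadershipCheck
  pure ()
```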
What we do is: we put topology information in

That's ok.
This snippet is the existing logic that determines when the node attempts to forge a block.

@coot Would it be sufficient to put aside the reparsing of the config file and simply immediately abort each leadership check while block production is disabled?
@nfrisby I think we have three options on the table:

The first option is very intrusive, and its scope is probably too large to achieve in a reasonable time frame.

I don't anticipate it being simpler to implement. The main reason is that we have at least some closures that contain configuration data. We'd need to track all those down and replace them with mutvars.

The third option seems very simple to implement: add some new state and a related config option (command-line alone would likely be fine, I expect) with a very narrow intended use. I think the main downside of the third option is that it becomes redundant if we eventually do support "reconfigure on SIGHUP".

The main downsides of the third option, as I see it, are that:

While what I propose:

The downside is:
@coot I like it! I wasn't considering the new separate config. Way better than my hidden state idea. Heads up: @Jimbo4350 does the plan that Marcin outlined above ^^^ make sense to you? Does Node Team have any other config like this that we'd prefer to be able to change on the fly (without restarting the process)? I'm asking just to find overlap with existing tech debt/upcoming goals, etc. |
@deepfire are you planning to do reconfiguration of the logging system in a similar way? What's your take on the above plan?
would it be possible to have an entry in the main config.json that refers to a second file that is also loaded and merged with the main one on a SIGHUP signal? and a flag to disable block production on a node via a config entry in the main or the linked config file?

in that case it would be really easy to change either the small extra config file on the fly, or to change the linked config file entry in the main config.json to point to, for example, a bp_enable.json config file -- which could also contain a reference to the node keys instead of passing them in via the starting parameters -- and a bp_disable.json which contains a parameter to disable block production (should be on by default if keys are present). in that way we could not only enable/disable block production, we could also change the provided keys on the fly if needed.

something like the following for the referenced file (with a corresponding entry in the main config):

```json
{
  "DisableLeaderCheck": false,
  "VrfKeyFile": "mynode.vrf.skey",
  "KesKeyFile": "mynode.kes.skey",
  "OperationalCertificateFile": "mynode.node.opcert"
}
```

or

```json
{
  "DisableLeaderCheck": true,
  "VrfKeyFile": "mynode.vrf.skey",
  "KesKeyFile": "mynode.kes.skey",
  "OperationalCertificateFile": "mynode.node.opcert"
}
```

merging these together so the configuration can take place in the single main config file or in an additional one. and of course some flag that is checkable via the metrics to verify the currently set block production state of the running node. or name the ... just some ideas
@gitmachtl We already have something like this, which is the bulk credentials file. It's used mostly in setting up test nodes, but it could be modified to add the extra DisableLeaderCheck flag.
you're right, i think i have used it once in the past but totally forgot about it. yep, maybe we could bring it together with some extra flags.
It might need to just be renamed to "--credentials-file" and modified to make it more user friendly. Right now it's very tricky to set up, with the array of arrays and the order needing to be very specific: 1) opcert, 2) vrf, 3) KES. Ideally, a better format would be to use exactly what you have above, except wrapped in an array so it could still be used for bulk credentials in test environments. Then there would be exactly one way to configure credentials for a node instead of two.
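Combining the two ideas, such an array-wrapped credentials file might look something like this. The field names are copied from the snippet earlier in the thread; the exact schema is a sketch, not the existing bulk-credentials format:

```json
[
  {
    "DisableLeaderCheck": false,
    "VrfKeyFile": "mynode.vrf.skey",
    "KesKeyFile": "mynode.kes.skey",
    "OperationalCertificateFile": "mynode.node.opcert"
  }
]
```

A test environment could list several such objects in the array, while a production node would list exactly one.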
Not to my knowledge
This sounds reasonable
I think a good way to do this is to remove the forging credentials from the consensus config, so that forging credentials have nothing to do with the consensus layer's notion of init/startup. Then we'd add a consensus-layer API to dynamically provide forging credentials and start block forging, and indeed to remove them and stop forging too. Initially this API would only be used at node startup time by the node's top-level code, but then it could also be exposed so credentials can be set/unset at runtime (triggered by SIGHUP or something else). Ultimately we'll have an IPC mechanism to set the forging credentials.
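The API described above might have a shape roughly like this. All names here are hypothetical, chosen only to mirror the comment; the real consensus-layer types differ:

```haskell
-- Hypothetical interface: credentials live outside the consensus config
-- and can be installed or removed while the node is running.
data ForgingCredentials = ForgingCredentials
  { opCertFile :: FilePath
  , vrfKeyFile :: FilePath
  , kesKeyFile :: FilePath
  }

data NodeKernel  -- stand-in for the real node kernel handle

-- Install credentials and start the block-forging threads.
startBlockForging :: NodeKernel -> ForgingCredentials -> IO ()
startBlockForging = undefined  -- sketch only

-- Stop forging and discard the credentials.
stopBlockForging :: NodeKernel -> IO ()
stopBlockForging = undefined  -- sketch only
```

At startup the node's top-level code would call `startBlockForging` once if credentials were supplied, and the same entry point could later be driven by a signal handler or an IPC command.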
I renamed the ticket, since it might use a different mechanism than SIGHUP.
Would it be possible to have two signals? One that enables block production and one that disables it? Startup defaults to enabled. Or via an HTTP call on the Prometheus interface?
We understand that signals are not ideal for this scenario, and eventually the IPC mechanism mentioned above will be used instead.
This PR supersedes IntersectMBO/ouroboros-network#3800 and regards issue IntersectMBO/ouroboros-network#3159. I mostly just "rebased" the old `ouroboros-network` branch on top of this new repo. Please look at the discussions in the old PR for more details. This PR is co-authored-by: Marcin Szamotulski <coot@coot.me> @coot
For redundancy reasons, SPOs run backup block-producing nodes.
Currently, with firewall rules, they can prevent relays from connecting to
the backup node, and thus prevent duplicate blocks from being diffused. This
will no longer work with p2p nodes, as relays will be able to reuse the
inbound connection from the block producer. For this reason we would like a
block-producing node to be able to start without producing blocks, and to
start producing when it receives a SIGHUP signal.
This feature should be implemented against p2p-master branches (both in ouroboros-network and cardano-node).