feat: horizontal scalability #759
Conversation
```diff
@@ -1046,6 +1047,7 @@ def _create_activation_instance(self):
             self._set_activation_status(ActivationStatus.PENDING, msg)
             raise exceptions.MaxRunningProcessesError
         args = {
+            "worker": os.environ["HOSTNAME"],
```
@benthomasson If there are 5 activation workers running on a node, is this specifying a single worker or all workers on that node? Is this a node/host name as opposed to a worker name?
This could be any value. I am just using hostname right now.
There will be multiple workers within a node. This is our current setup, and we want it for concurrency and redundancy.
This could be a node name, worker name, or worker group name.
Multiple workers on a node would work fine. They would monitor the rulebooks that they started.
If a worker is lost, we need to hand off monitoring to another worker; that is not yet done.
```diff
@@ -317,10 +317,6 @@ def _get_secret_key() -> str:

 RQ_STARTUP_JOBS = []
-RQ_PERIODIC_JOBS = [
-    {
```
@benthomasson If this gets removed, how would the periodic updates happen? The activations run as detached processes and have no affinity to a worker; they have affinity to the node where they are running. Any worker on that node can get the logs/status from an activation and take actions like stop/restart, etc.
There are actions that are not yet implemented in this PR.
Monitor processes run on the nodes instead of the RQ workers. A monitor process can monitor any number of activations.
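As a rough sketch of that monitor loop, assuming each instance record carries the "worker" tag from the diff above (`fetch_instances` and `update_instance` are placeholders for the real persistence layer, not eda-server APIs):

```python
import os
import socket
import time

# The tag this node writes into each activation it starts; HOSTNAME is set
# by the container runtime, with the OS hostname as a local fallback.
LOCAL_WORKER = os.environ.get("HOSTNAME", socket.gethostname())

def monitor_local_activations(fetch_instances, update_instance, interval=30.0):
    """Periodically refresh only the rulebook processes this node started.

    fetch_instances() -> iterable of dicts carrying at least a "worker" key;
    update_instance(instance) reads local logs/status and persists them.
    """
    while True:
        for instance in fetch_instances():
            # Processes tagged by other workers are skipped; they are
            # monitored on the node that has local access to them.
            if instance["worker"] != LOCAL_WORKER:
                continue
            update_instance(instance)
        time.sleep(interval)
```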
Closed in favor of #701
This PR allows for running rulebooks on multiple worker nodes.
It does this by tagging each rulebook process with the worker that started it; the monitor process then runs periodically and updates only the rulebook processes it has local access to.
This PR works with any number of worker nodes, and worker nodes can be added and removed dynamically.
I am still working on the scenario of when a worker node is removed while a rulebook process is running.
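One possible shape for that hand-off, sketched here purely as an illustration (the heartbeat mapping and `reassign_orphans` are hypothetical; nothing like this exists in the PR yet):

```python
import time

HEARTBEAT_TIMEOUT = 90.0  # seconds of silence before a worker counts as lost

def reassign_orphans(instances, heartbeats, local_worker, now=None):
    """Adopt instances whose owning worker has stopped heartbeating.

    instances:  iterable of dicts carrying a "worker" key
    heartbeats: mapping of worker name -> last heartbeat timestamp (epoch)
    Returns the instances this call re-tagged for local monitoring.
    """
    now = time.time() if now is None else now
    adopted = []
    for instance in instances:
        last_seen = heartbeats.get(instance["worker"], 0.0)
        if now - last_seen > HEARTBEAT_TIMEOUT:
            instance["worker"] = local_worker  # hand off monitoring here
            adopted.append(instance)
    return adopted
```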