feat: horizontal scalability #759

benthomasson · 2024-03-19T12:52:01Z

This PR allows for running rulebooks on multiple worker nodes.

It does this by tagging each rulebook process with the worker that started it. The monitor process then runs periodically and updates only the rulebook processes that it has local access to.
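As a rough illustration of the tagging idea, here is a minimal sketch. It is not the code in this PR: `RulebookProcess`, `current_worker`, and `local_processes` are hypothetical names, and the fallback value for `HOSTNAME` is an assumption (the diff reads `os.environ["HOSTNAME"]` directly).

```python
import os
from dataclasses import dataclass


@dataclass
class RulebookProcess:
    """Hypothetical stand-in for the rulebook process record."""

    id: int
    worker: str  # tag written when the process is created
    status: str = "running"


def current_worker() -> str:
    # The PR uses os.environ["HOSTNAME"]; the "localhost" fallback is an assumption.
    return os.environ.get("HOSTNAME", "localhost")


def local_processes(processes, worker):
    """Return only the processes tagged with the given worker."""
    return [p for p in processes if p.worker == worker]


processes = [
    RulebookProcess(1, "node-a"),
    RulebookProcess(2, "node-b"),
    RulebookProcess(3, "node-a"),
]
mine = local_processes(processes, "node-a")
print([p.id for p in mine])  # prints [1, 3]
```

A monitor on `node-a` would then update only processes 1 and 3 and leave process 2 to the monitor on `node-b`.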

This PR works with any number of worker nodes, and worker nodes can be added and removed dynamically.

I am still working on the scenario where a worker node is removed while a rulebook process is running.

mkanoor · 2024-03-19T13:14:03Z

src/aap_eda/services/activation/manager.py

@@ -1046,6 +1047,7 @@ def _create_activation_instance(self):
            self._set_activation_status(ActivationStatus.PENDING, msg)
            raise exceptions.MaxRunningProcessesError
        args = {
+            "worker": os.environ["HOSTNAME"],


@benthomasson If there are 5 activation workers running on a node, is this specifying a single worker or all workers on that node? Is this a node/host name rather than a worker name?

This could be any value. I am just using hostname right now.

There will be multiple workers within a node. This is our current setup and we want it for concurrency and redundancy.

This could be a node name, worker name, or worker group name.

Multiple workers on a node would work fine. They would monitor the rulebooks that they started.

If a worker is lost, we need to hand off monitoring to another worker; that is not yet done.
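One possible shape for that hand-off, sketched under stated assumptions: workers record a heartbeat timestamp, and a worker whose heartbeat is older than a timeout is treated as lost. Nothing here comes from the PR; `find_lost_workers`, `reassign`, the heartbeat mapping, and the 30-second timeout are all hypothetical.

```python
from datetime import datetime, timedelta


def find_lost_workers(heartbeats, now, timeout):
    """Return the workers whose last heartbeat is older than `timeout`."""
    return sorted(w for w, seen in heartbeats.items() if now - seen > timeout)


def reassign(assignments, lost, new_worker):
    """Return a copy of the process-id -> worker map with lost workers replaced."""
    return {
        pid: (new_worker if worker in lost else worker)
        for pid, worker in assignments.items()
    }


now = datetime(2024, 3, 19, 13, 0, 0)
heartbeats = {
    "node-a": now - timedelta(seconds=5),    # recent heartbeat
    "node-b": now - timedelta(seconds=120),  # stale heartbeat
}
lost = find_lost_workers(heartbeats, now, timedelta(seconds=30))
assignments = reassign({1: "node-a", 2: "node-b"}, lost, "node-a")
```

Note that this only re-tags the record. Because rulebook processes are local to the node that started them, a real hand-off would probably also need to restart the process on the new worker.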

mkanoor · 2024-03-19T13:16:52Z

src/aap_eda/settings/default.py

@@ -317,10 +317,6 @@ def _get_secret_key() -> str:

 RQ_STARTUP_JOBS = []
 RQ_PERIODIC_JOBS = [
-    {


@benthomasson If this gets removed, how would the periodic updates happen? The activations run as detached processes and have no affinity to a worker; they have affinity to the node where they are running. Any worker on that node can get the logs/status from an activation and take actions like stop/restart, etc.

There are actions that are not yet implemented in this PR.

Monitor processes run instead of the RQ workers on the nodes. A monitor process can monitor any number of activations.
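To make the idea concrete, a monitor loop of this kind could look roughly like the sketch below. This is not the `monitor_forever` function named in the commit list, whose body is not shown here; `monitor`, `check_status`, and the `max_cycles` parameter are hypothetical and exist only so the example terminates.

```python
import time


def monitor(activations, check_status, interval=0.0, max_cycles=None):
    """Poll every activation once per cycle and record its latest status.

    Runs until `max_cycles` cycles have completed, or forever if it is None.
    """
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        for activation in activations:
            activation["status"] = check_status(activation)
        cycle += 1
        time.sleep(interval)
    return cycle


activations = [
    {"id": 1, "status": "starting"},
    {"id": 2, "status": "starting"},
]
# Stand-in status check; a real one would inspect the local rulebook process.
cycles = monitor(activations, lambda a: "running", max_cycles=2)
```

A single loop iterates over however many activations it is given, which is the sense in which one monitor process can cover any number of them.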

benthomasson · 2024-04-16T14:55:45Z

Closed in favor of #701

benthomasson requested a review from a team as a code owner March 19, 2024 12:52

benthomasson marked this pull request as draft March 19, 2024 12:52

benthomasson force-pushed the horizontal_scaling branch 3 times, most recently from e8eebe3 to 0b00da2 Compare March 19, 2024 13:03

feat: horizontal scalability

9be76f9

benthomasson force-pushed the horizontal_scaling branch from 0b00da2 to 9be76f9 Compare March 19, 2024 13:04

mkanoor reviewed Mar 19, 2024


mkanoor mentioned this pull request Mar 19, 2024

Enable podman multinode #701

Merged

benthomasson added 9 commits March 20, 2024 13:31

Remove stop, delete, restart from activation queue

c7a5560

Move system_restart_activation to orchestrator

6f30f30

Add process_requests

41f672a

Update monitor_forever

e005841

Move process_requests into monitor_forever

67c2c3f

fix

40afead

Remove process_requests

895a050

Add find lost activations

deadfcc

Use enums

bec461f

benthomasson closed this Apr 16, 2024

