Minimal level-triggered controller #200

Merged: 10 commits, Oct 9, 2023

Conversation

@squaremo (Contributor) commented Oct 4, 2023

This does the preliminary work requested in #194, so that the implementation of level-triggered pipelines can make progress in several (or at least two) directions. After this is merged, we have a controller that checks the status of its targets, so the promotion algorithm can be implemented (#195).

This does just (about) enough to build the promotion algorithm on top, and thereby get to a usable controller. I made simplifications to get here reasonably quickly:

  • only HelmRelease objects are supported
  • only targets in the local cluster (those without a .clusterRef) are supported

Removing those limitations is the work of #196.

I put the new implementation behind a feature flag, so we can work on it without having long-running branches. For many of the subsequent tasks in the umbrella issue #193 it will be enough to work in controllers/leveltriggered/ and write tests there. When we want to work on GUI or other things needing a running controller process, the flag --enable-level-triggered can be given to the binary to start the new controller code.
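A sketch of how that dispatch might look in main.go (a minimal sketch only: the reconciler constructors are elided, and the names here are illustrative rather than taken from the actual code):

```go
package main

import (
	"flag"
	"os"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	var enableLevelTriggered bool
	flag.BoolVar(&enableLevelTriggered, "enable-level-triggered", false,
		"run the experimental level-triggered controller instead of the original implementation")
	flag.Parse()

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		os.Exit(1)
	}

	if enableLevelTriggered {
		// register the reconciler from controllers/leveltriggered, e.g.
		// leveltriggered.NewPipelineReconciler(...).SetupWithManager(mgr)
	} else {
		// register the original, notification-driven reconciler as before
	}

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		os.Exit(1)
	}
}
```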

Fixes #194. With reference to this criterion: "objects that aren't mentioned in a pipeline don't get watched" -- this implementation limits support to a specific type and to the local cluster, and uses the convenience builder API in controller-runtime, so it avoids creating informers and watchers dynamically. The expanded implementation (#196) will need to pay careful attention to this, though.

@yitsushi (Contributor) left a comment:

Looks good to me; my only concern is the module placement of the new controller.

When/if we drop the original logic, make sure we move the controller under controllers/.

Or even better, move it now and rename it to controllers.NewLevelTriggeringPipelineReconciler. It's a common code structure to store controllers under controllers/, and it bothers me to see the new one under a different module. The controllers directory is plural.

(Resolved review thread on api/v1alpha1/pipeline_types.go.)
@squaremo squaremo force-pushed the minimal-leveltriggered-controller branch from 42812af to f734e82 Compare October 5, 2023 16:54
@squaremo (Contributor, Author) commented Oct 5, 2023

> When/if we drop the original logic, make sure we move the controller under controllers/.
>
> Or even better, move it now and rename it to controllers.NewLevelTriggeringPipelineReconciler. It's a common code structure to store controllers under controllers/, and it bothers me to see the new one under a different module. The controllers directory is plural.

We can move the code when the old one is fully deprecated, yep.

The forked controller went in its own package mainly because the tests rely on a TestMain which sets the controller running, and it is easier and cleaner if the two controller implementations have their own TestMain.

If the implementations being in controllers/ and a subdirectory of controllers/ is the objectionable bit, I could move the old one to, e.g., controllers/legacy. It's pretty common to see subdirectories for different controllers that run in the same manager; this is a different situation, but the pattern seems about right.

@luizbafilho (Contributor) left a comment:

This looks great as a basis for making progress. Anything else, we can deal with later.

We want to react to clusters changing, especially if they are going
from an unready state to a ready state. This fixes some small but
egregious problems with how that works:
 - the helper funcs had misleading names
 - the key, though an arbitrary string, was misleading
 - the index lookup did some unnecessary work in compiling a list of
   deep copies of query results

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
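A sketch of the shape of that indexing, after the fixes described in the commit above (the key string is arbitrary but descriptive, and the helper names and import path are illustrative rather than the exact code):

```go
package leveltriggered

import (
	"context"
	"fmt"

	"sigs.k8s.io/controller-runtime/pkg/client"

	"github.com/weaveworks/pipeline-controller/api/v1alpha1"
)

// The index key is an arbitrary string; naming it after what is indexed
// avoids the misleading-key problem described above.
const clusterIndexKey = ".spec.environments.targets.clusterRef"

// indexClusterRefs returns one index entry per cluster a pipeline refers
// to, so pipelines can be found again when a cluster object changes.
func indexClusterRefs(obj client.Object) []string {
	p, ok := obj.(*v1alpha1.Pipeline)
	if !ok {
		return nil
	}
	var keys []string
	for _, env := range p.Spec.Environments {
		for _, target := range env.Targets {
			if ref := target.ClusterRef; ref != nil {
				keys = append(keys, fmt.Sprintf("%s/%s", ref.Namespace, ref.Name))
			}
		}
	}
	return keys
}

// pipelinesUsingCluster queries the index; the items in the result can be
// used as-is, rather than being compiled into a list of deep copies.
func pipelinesUsingCluster(ctx context.Context, c client.Client, key string) ([]v1alpha1.Pipeline, error) {
	var list v1alpha1.PipelineList
	if err := c.List(ctx, &list, client.MatchingFields{clusterIndexKey: key}); err != nil {
		return nil, err
	}
	return list.Items, nil
}
```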
As it stood, the controller logic would send an event with
reason="SetStatusConditionError" when the cluster was not found and
setting the status _succeeded_. This commit rearranges this logic so
that an event with reason=GetClusterError is sent if there's any
problem resolving the clusterRef, and reason=SetStatusConditionError
is only sent if the status condition could not be set.

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
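In outline, the rearranged logic of the commit above looks something like this (a sketch: the receiver type is a minimal stand-in, the helper name is mine, and the GitopsCluster import path is assumed):

```go
package leveltriggered

import (
	"context"

	clusterctrlv1alpha1 "github.com/weaveworks/cluster-controller/api/v1alpha1"
	corev1 "k8s.io/api/core/v1"
	apimeta "k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/tools/record"
	"sigs.k8s.io/controller-runtime/pkg/client"

	"github.com/weaveworks/pipeline-controller/api/v1alpha1"
)

// reconciler is a minimal stand-in for the real reconciler type.
type reconciler struct {
	client.Client
	recorder record.EventRecorder
}

// checkCluster sketches the corrected ordering: GetClusterError is sent
// only when resolving the clusterRef fails, and SetStatusConditionError
// only when writing the status condition itself fails.
func (r *reconciler) checkCluster(ctx context.Context, pipeline *v1alpha1.Pipeline, ref types.NamespacedName) error {
	var cluster clusterctrlv1alpha1.GitopsCluster
	if err := r.Get(ctx, ref, &cluster); err != nil {
		r.recorder.Event(pipeline, corev1.EventTypeWarning, "GetClusterError", err.Error())
		return err
	}

	apimeta.SetStatusCondition(&pipeline.Status.Conditions, metav1.Condition{
		Type:   "Ready",
		Status: metav1.ConditionTrue,
		Reason: "ReconciliationSucceeded",
	})
	if err := r.Status().Update(ctx, pipeline); err != nil {
		r.recorder.Event(pipeline, corev1.EventTypeWarning, "SetStatusConditionError", err.Error())
		return err
	}
	return nil
}
```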
This adds checks to the pipeline test to make sure that a pipeline
will be re-examined when a cluster it refers to 1. is created and
2. reaches a ready state. This relies on the indexing working, since
the test timeout is much shorter than the default requeue time (after
which the pipeline would be re-examined anyway).

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
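The added checks have roughly this shape (a sketch assuming Gomega; the helper name and timings are mine). Because the test timeout is much shorter than the requeue interval, this only passes if the cluster index and watch trigger a re-reconcile:

```go
package leveltriggered

import (
	"context"
	"testing"

	. "github.com/onsi/gomega"
	apimeta "k8s.io/apimachinery/pkg/api/meta"
	"sigs.k8s.io/controller-runtime/pkg/client"

	"github.com/weaveworks/pipeline-controller/api/v1alpha1"
)

// checkPipelineBecomesReady polls until the pipeline's Ready condition
// becomes true, well within the default requeue time.
func checkPipelineBecomesReady(t *testing.T, ctx context.Context, c client.Client, name client.ObjectKey) {
	g := NewWithT(t)
	g.Eventually(func() bool {
		var p v1alpha1.Pipeline
		if err := c.Get(ctx, name, &p); err != nil {
			return false
		}
		return apimeta.IsStatusConditionTrue(p.Status.Conditions, "Ready")
	}, "5s", "0.1s").Should(BeTrue())
}
```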
I want the new behaviour to be guarded by a feature flag, and rather
than sprinkling conditionals through the controller code, I'd prefer
to have a completely separate set of code and dispatch to it in
main.go.

A reasonable starting point is the pretty minimal existing controller
code. So: copy the controller code into a new package and adjust as
needed to get

    go test -v ./controllers/leveltriggered

to succeed.

This commit also makes the test main wait for the manager to shut
down. This is minorly superstitious on my part: my experience is that
failure to shut managers down cleanly can cause problems with
subsequent test runs.

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
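A minimal sketch of such a TestMain, assuming envtest (reconciler registration is elided, and the CRD path is illustrative):

```go
package leveltriggered

import (
	"context"
	"os"
	"path/filepath"
	"testing"

	"k8s.io/apimachinery/pkg/runtime"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/envtest"

	"github.com/weaveworks/pipeline-controller/api/v1alpha1"
)

var k8sClient client.Client

func TestMain(m *testing.M) {
	testEnv := &envtest.Environment{
		CRDDirectoryPaths: []string{filepath.Join("..", "..", "config", "crd", "bases")},
	}
	cfg, err := testEnv.Start()
	if err != nil {
		panic(err)
	}

	scheme := runtime.NewScheme()
	_ = clientgoscheme.AddToScheme(scheme)
	_ = v1alpha1.AddToScheme(scheme)

	mgr, err := ctrl.NewManager(cfg, ctrl.Options{Scheme: scheme})
	if err != nil {
		panic(err)
	}
	// ... register the level-triggered reconciler with mgr here ...
	k8sClient = mgr.GetClient()

	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go func() {
		defer close(done)
		if err := mgr.Start(ctx); err != nil {
			panic(err)
		}
	}()

	code := m.Run()

	// Shut the manager down and wait for it to exit, so nothing leaks
	// into subsequent test runs.
	cancel()
	<-done
	_ = testEnv.Stop()
	os.Exit(code)
}
```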
The level-triggered promotion algorithm will need to know the status
of each target in the pipeline, so it can make a decision on if and
where a promotion needs to be triggered. To be explicit about how it
makes that decision, that controller should record the status it's
seen for each target. This will help with testing, and makes the
Pipeline object self-contained -- you should be able to see from the
status where the pipeline is up to.

This commit introduces a couple more types into the API to represent
the status of a target. Each one gets a full name (the type,
namespace, name, and cluster if it's in another cluster).

When the controller goes through the environments, it fetches the
targets' apps. For the sake of not making too many decisions at once,
I've implemented only the case of HelmRelease objects in the local
cluster. Once there are tests demonstrating it works, it can be
generalised to other app types (the API accepts _any_ API group,
version, and kind).

The test involving a GitOpsCluster is skipped, because we would now
expect to fetch the application object from the remote cluster it
represents, rather than just examine the cluster object itself.

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
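The new status types might look roughly like this (illustrative only; the actual field names and types are defined in api/v1alpha1/pipeline_types.go):

```go
package v1alpha1

// TargetStatus fully names a target's app object (type, namespace, name,
// and cluster when it is in another cluster) and records the status the
// controller last saw for it.
type TargetStatus struct {
	// ClusterRef names the cluster the target is in; empty means the
	// local cluster.
	ClusterRef *CrossNamespaceClusterRef `json:"clusterRef,omitempty"`
	// Kind, Namespace and Name identify the app object in that cluster.
	Kind      string `json:"kind"`
	Namespace string `json:"namespace"`
	Name      string `json:"name"`
	// Ready indicates whether the app object was last seen to be ready.
	Ready bool `json:"ready"`
	// Error records any problem encountered reading the app's status.
	Error string `json:"error,omitempty"`
}

// CrossNamespaceClusterRef names a cluster, possibly in another namespace.
type CrossNamespaceClusterRef struct {
	Namespace string `json:"namespace,omitempty"`
	Name      string `json:"name"`
}
```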
To date, the controller didn't have much to do other than validate
that it could see the cluster objects mentioned in a pipeline object
(which wasn't itself necessary, since taking actions relied on the
information given in webhooks). Its idea of "Ready" was that it could
see all the clusters, and they were themselves ready. If any cluster
was not present, it could exit early as unready.

Now, however, we want to actually go and fetch the status of the
targets mentioned in the pipeline. "Ready" must have a different
meaning: it means the controller can reach all the targets, and
thereby determine the state of play for the pipeline. It could exit
early in principle, but I have preferred to assess all the targets
even if some are unavailable, and give a summary in the ready
condition at the end.

The additional test will fail for now, since it checks that adding a
previously missing app object will result in a ready pipeline, but
there's nothing to trigger the pipeline to be re-examined, yet.

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
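A sketch of that assess-everything-then-summarise shape (the per-target check is abstracted as a function parameter, and names and condition reasons are illustrative):

```go
package leveltriggered

import (
	"context"
	"fmt"
	"strings"

	apimeta "k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	"github.com/weaveworks/pipeline-controller/api/v1alpha1"
)

// assessTargets examines every target even when an earlier one fails,
// then sets the Ready condition to an aggregate of the results.
func assessTargets(ctx context.Context, pipeline *v1alpha1.Pipeline,
	checkTarget func(ctx context.Context, env v1alpha1.Environment, target v1alpha1.Target) error) {
	var problems []string
	for _, env := range pipeline.Spec.Environments {
		for _, target := range env.Targets {
			if err := checkTarget(ctx, env, target); err != nil {
				problems = append(problems, fmt.Sprintf("%s: %v", env.Name, err))
			}
		}
	}
	cond := metav1.Condition{
		Type:   "Ready",
		Status: metav1.ConditionTrue,
		Reason: "ReconciliationSucceeded",
	}
	if len(problems) > 0 {
		cond.Status = metav1.ConditionFalse
		cond.Reason = "TargetsNotReachable"
		cond.Message = strings.Join(problems, "; ")
	}
	apimeta.SetStatusCondition(&pipeline.Status.Conditions, cond)
}
```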
This adds indexing to the level-triggered controller, and watches
`HelmRelease` objects. If a HelmRelease object changes, it's looked up
against the index, and any pipeline using it will be queued to be
reconciled.

This is enough to pass the unit test that checks if a Pipeline becomes
ready if it can see all its targets.

The indexing is general with respect to the application kind, because
it includes the Group and Kind in the index key. But:

 - the watch is for the HelmRelease type specifically; and,
 - it does not account for remote clusters.

The next aim will be to make scaffolding that watches any application,
when it's mentioned in a pipeline. Remote clusters will come later.

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
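Put together, the watch-plus-index wiring of the commit above might look roughly like this (a sketch assuming the controller-runtime v0.15+ Watches signature; the indexer is stubbed and names other than the controller-runtime and HelmRelease APIs are illustrative):

```go
package leveltriggered

import (
	"context"
	"fmt"

	helmv2 "github.com/fluxcd/helm-controller/api/v2beta1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	"github.com/weaveworks/pipeline-controller/api/v1alpha1"
)

// PipelineReconciler is a minimal stand-in for the real reconciler type.
type PipelineReconciler struct {
	client.Client
}

// The index key includes Group and Kind, so the index itself is general
// over application types, even though only HelmRelease is watched so far.
const appIndexKey = ".spec.environments.targets.app"

func appKey(group, kind, namespace, name string) string {
	return fmt.Sprintf("%s/%s/%s/%s", group, kind, namespace, name)
}

// indexTargetApps would extract an appKey per target; elided here.
func indexTargetApps(obj client.Object) []string { return nil }

func (r *PipelineReconciler) SetupWithManager(mgr ctrl.Manager) error {
	if err := mgr.GetFieldIndexer().IndexField(context.Background(),
		&v1alpha1.Pipeline{}, appIndexKey, indexTargetApps); err != nil {
		return err
	}
	return ctrl.NewControllerManagedBy(mgr).
		For(&v1alpha1.Pipeline{}).
		// Watch HelmRelease specifically, for now; other app types and
		// remote clusters come later.
		Watches(&helmv2.HelmRelease{},
			handler.EnqueueRequestsFromMapFunc(r.pipelinesForApp)).
		Complete(r)
}

// pipelinesForApp looks a changed app up in the index and enqueues every
// pipeline that uses it.
func (r *PipelineReconciler) pipelinesForApp(ctx context.Context, obj client.Object) []reconcile.Request {
	key := appKey("helm.toolkit.fluxcd.io", "HelmRelease", obj.GetNamespace(), obj.GetName())
	var list v1alpha1.PipelineList
	if err := r.List(ctx, &list, client.MatchingFields{appIndexKey: key}); err != nil {
		return nil
	}
	reqs := make([]reconcile.Request, len(list.Items))
	for i := range list.Items {
		reqs[i] = reconcile.Request{NamespacedName: client.ObjectKeyFromObject(&list.Items[i])}
	}
	return reqs
}
```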
Eventually, this code will need to be able to query for targets in
remote clusters. To make room for that, I've done a small
rearrangement here such that the client used for a target is obtained
from a procedure, rather than assumed to be the client for the local
cluster. This change is more to check this structure makes sense, than
to alter the behaviour.

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
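A sketch of that seam (names are illustrative):

```go
package leveltriggered

import (
	"context"
	"fmt"

	"sigs.k8s.io/controller-runtime/pkg/client"

	"github.com/weaveworks/pipeline-controller/api/v1alpha1"
)

// getClientForTarget returns the client to use for a given target. For
// now it only ever returns the local client, but having a procedure here
// leaves room to return a per-cluster client later.
func getClientForTarget(ctx context.Context, localClient client.Client, target v1alpha1.Target) (client.Client, error) {
	if target.ClusterRef != nil {
		return nil, fmt.Errorf("remote clusters are not supported yet: %s/%s",
			target.ClusterRef.Namespace, target.ClusterRef.Name)
	}
	return localClient, nil
}
```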
This commit expands the test a little to make sure that when an app
(HelmRelease, here) has its status updated, the pipeline's status is
updated in sympathy.

In making the test pass, this corrects a bug with the implementation
so far: when going through the environments in the controller, the
environment status was not entered into the map of environment
statuses.

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
The v1alpha1 CRD has new fields as part of this change. These are
intended to be backward-compatible, but since they are nonetheless
changed and are included in the chart, the chart version should be
incremented.

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
@squaremo squaremo force-pushed the minimal-leveltriggered-controller branch from f734e82 to c2403e0 Compare October 9, 2023 09:54
@squaremo squaremo merged commit d19807f into main Oct 9, 2023
11 checks passed
@squaremo squaremo deleted the minimal-leveltriggered-controller branch October 9, 2023 10:00
Merging this pull request closed: Prototype the indexing/watching behaviour needed for level-triggered promotions (#194).