Minimal level-triggered controller #200

Merged: 10 commits, Oct 9, 2023

Conversation

@squaremo (Contributor) commented Oct 4, 2023

This does the preliminary work requested in #194, so that the implementation of level-triggered pipelines can make progress in several (or at least two) directions. After this is merged, we have a controller that checks the status of its targets, so the promotion algorithm can be implemented (#195).

This does just (about) enough to build the promotion algorithm on top, and thereby get to a usable controller. I made simplifications to get here reasonably quickly:

  • only HelmRelease objects are supported
  • only targets in the local cluster (those without a .clusterRef) are supported

Removing those limitations is the work of #196.

I put the new implementation behind a feature flag, so we can work on it without having long-running branches. For many of the subsequent tasks in the umbrella issue #193 it will be enough to work in controllers/leveltriggered/ and write tests there. When we want to work on GUI or other things needing a running controller process, the flag --enable-level-triggered can be given to the binary to start the new controller code.
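A sketch of how that dispatch might look in main.go (a minimal sketch only: the reconciler constructors are elided, and the names here are illustrative rather than taken from the actual code):

```go
package main

import (
	"flag"
	"os"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	var enableLevelTriggered bool
	flag.BoolVar(&enableLevelTriggered, "enable-level-triggered", false,
		"run the experimental level-triggered controller instead of the original implementation")
	flag.Parse()

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		os.Exit(1)
	}

	if enableLevelTriggered {
		// register the reconciler from controllers/leveltriggered, e.g.
		// leveltriggered.NewPipelineReconciler(...).SetupWithManager(mgr)
	} else {
		// register the original, notification-driven reconciler as before
	}

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		os.Exit(1)
	}
}
```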

Fixes #194. With reference to this criterion: "objects that aren't mentioned in a pipeline don't get watched" -- this implementation limits support to a specific type and to the local cluster, and uses the convenience builder API in controller-runtime, so it avoids creating informers and watchers dynamically. The expanded implementation (#196) will need to pay careful attention to this, though.

@yitsushi (Contributor) left a comment:

Looks good to me; my only concern is the module placement of the new controller.

When/if we drop the original logic, make sure we move the controller under controllers/.

Or even better, move it now and rename it to controllers.NewLevelTriggeringPipelineReconciler. It's a common code structure to store controllers under controllers/, and it bothers me to see the new one under a different module. The controllers directory is plural.

(Resolved review thread on api/v1alpha1/pipeline_types.go.)
@squaremo squaremo force-pushed the minimal-leveltriggered-controller branch from 42812af to f734e82 Compare October 5, 2023 16:54
@squaremo (Contributor, Author) commented Oct 5, 2023

> When/if we drop the original logic, make sure we move the controller under controllers/.
>
> Or even better, move it now and rename it to controllers.NewLevelTriggeringPipelineReconciler. It's a common code structure to store controllers under controllers/, and it bothers me to see the new one under a different module. The controllers directory is plural.

We can move the code when the old one is fully deprecated, yep.

The forked controller went in its own package mainly because the tests rely on a TestMain which sets the controller running, and it is easier and cleaner if the two controller implementations have their own TestMain.

If the implementations being in controllers/ and a subdirectory of controllers/ is the objectionable bit, I could move the old one to, e.g., controllers/legacy. It's pretty common to see subdirectories for different controllers that run in the same manager; this is a different situation, but the pattern seems about right.

@luizbafilho (Contributor) left a comment:

This looks great as a basis for making progress. Anything else, we can deal with later.

We want to react to clusters changing, especially if they are going
from an unready state to a ready state. This fixes some small but
egregious problems with how that works:
 - the helper funcs had misleading names
 - the key, though an arbitrary string, was misleading
 - the index lookup did some unnecessary work in compiling a list of
   deep copies of query results

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
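A sketch of the shape of that indexing, after the fixes described in the commit above (the key string is arbitrary but descriptive, and the helper names and import path are illustrative rather than the exact code):

```go
package leveltriggered

import (
	"context"
	"fmt"

	"sigs.k8s.io/controller-runtime/pkg/client"

	"github.com/weaveworks/pipeline-controller/api/v1alpha1"
)

// The index key is an arbitrary string; naming it after what is indexed
// avoids the misleading-key problem described above.
const clusterIndexKey = ".spec.environments.targets.clusterRef"

// indexClusterRefs returns one index entry per cluster a pipeline refers
// to, so pipelines can be found again when a cluster object changes.
func indexClusterRefs(obj client.Object) []string {
	p, ok := obj.(*v1alpha1.Pipeline)
	if !ok {
		return nil
	}
	var keys []string
	for _, env := range p.Spec.Environments {
		for _, target := range env.Targets {
			if ref := target.ClusterRef; ref != nil {
				keys = append(keys, fmt.Sprintf("%s/%s", ref.Namespace, ref.Name))
			}
		}
	}
	return keys
}

// pipelinesUsingCluster queries the index; the items in the result can be
// used as-is, rather than being compiled into a list of deep copies.
func pipelinesUsingCluster(ctx context.Context, c client.Client, key string) ([]v1alpha1.Pipeline, error) {
	var list v1alpha1.PipelineList
	if err := c.List(ctx, &list, client.MatchingFields{clusterIndexKey: key}); err != nil {
		return nil, err
	}
	return list.Items, nil
}
```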
As it stood, the controller logic would send an event with
reason="SetStatusConditionError" when the cluster was not found and
setting the status _succeeded_. This commit rearranges this logic so
that an event with reason=GetClusterError is sent if there's any
problem resolving the clusterRef, and reason=SetStatusConditionError
is only sent if the status condition could not be set.

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
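In outline, the rearranged logic of the commit above looks something like this (a sketch: the receiver type is a minimal stand-in, the helper name is mine, and the GitopsCluster import path is assumed):

```go
package leveltriggered

import (
	"context"

	clusterctrlv1alpha1 "github.com/weaveworks/cluster-controller/api/v1alpha1"
	corev1 "k8s.io/api/core/v1"
	apimeta "k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/tools/record"
	"sigs.k8s.io/controller-runtime/pkg/client"

	"github.com/weaveworks/pipeline-controller/api/v1alpha1"
)

// reconciler is a minimal stand-in for the real reconciler type.
type reconciler struct {
	client.Client
	recorder record.EventRecorder
}

// checkCluster sketches the corrected ordering: GetClusterError is sent
// only when resolving the clusterRef fails, and SetStatusConditionError
// only when writing the status condition itself fails.
func (r *reconciler) checkCluster(ctx context.Context, pipeline *v1alpha1.Pipeline, ref types.NamespacedName) error {
	var cluster clusterctrlv1alpha1.GitopsCluster
	if err := r.Get(ctx, ref, &cluster); err != nil {
		r.recorder.Event(pipeline, corev1.EventTypeWarning, "GetClusterError", err.Error())
		return err
	}

	apimeta.SetStatusCondition(&pipeline.Status.Conditions, metav1.Condition{
		Type:   "Ready",
		Status: metav1.ConditionTrue,
		Reason: "ReconciliationSucceeded",
	})
	if err := r.Status().Update(ctx, pipeline); err != nil {
		r.recorder.Event(pipeline, corev1.EventTypeWarning, "SetStatusConditionError", err.Error())
		return err
	}
	return nil
}
```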
This adds checks to the pipeline test to make sure that a pipeline
will be re-examined when a cluster it refers to 1. is created and
2. reaches a ready state. This relies on the indexing working, since
the test timeout is much shorter than the default requeue time (after
which the pipeline would be re-examined anyway).

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
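The added checks have roughly this shape (a sketch assuming Gomega; the helper name and timings are mine). Because the test timeout is much shorter than the requeue interval, this only passes if the cluster index and watch trigger a re-reconcile:

```go
package leveltriggered

import (
	"context"
	"testing"

	. "github.com/onsi/gomega"
	apimeta "k8s.io/apimachinery/pkg/api/meta"
	"sigs.k8s.io/controller-runtime/pkg/client"

	"github.com/weaveworks/pipeline-controller/api/v1alpha1"
)

// checkPipelineBecomesReady polls until the pipeline's Ready condition
// becomes true, well within the default requeue time.
func checkPipelineBecomesReady(t *testing.T, ctx context.Context, c client.Client, name client.ObjectKey) {
	g := NewWithT(t)
	g.Eventually(func() bool {
		var p v1alpha1.Pipeline
		if err := c.Get(ctx, name, &p); err != nil {
			return false
		}
		return apimeta.IsStatusConditionTrue(p.Status.Conditions, "Ready")
	}, "5s", "0.1s").Should(BeTrue())
}
```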
I want the new behaviour to be guarded by a feature flag, and rather
than sprinkling conditionals through the controller code, I'd prefer
to have a completely separate set of code and dispatch to it in
main.go.

A reasonable starting point is the pretty minimal existing controller
code. So: copy the controller code into a new package and adjust as
needed to get

    go test -v ./controllers/leveltriggered

to succeed.

This commit also makes the test main wait for the manager to shut
down. This is minorly superstitious on my part: my experience is that
failure to shut managers down cleanly can cause problems with
subsequent test runs.

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
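A minimal sketch of such a TestMain, assuming envtest (reconciler registration is elided, and the CRD path is illustrative):

```go
package leveltriggered

import (
	"context"
	"os"
	"path/filepath"
	"testing"

	"k8s.io/apimachinery/pkg/runtime"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/envtest"

	"github.com/weaveworks/pipeline-controller/api/v1alpha1"
)

var k8sClient client.Client

func TestMain(m *testing.M) {
	testEnv := &envtest.Environment{
		CRDDirectoryPaths: []string{filepath.Join("..", "..", "config", "crd", "bases")},
	}
	cfg, err := testEnv.Start()
	if err != nil {
		panic(err)
	}

	scheme := runtime.NewScheme()
	_ = clientgoscheme.AddToScheme(scheme)
	_ = v1alpha1.AddToScheme(scheme)

	mgr, err := ctrl.NewManager(cfg, ctrl.Options{Scheme: scheme})
	if err != nil {
		panic(err)
	}
	// ... register the level-triggered reconciler with mgr here ...
	k8sClient = mgr.GetClient()

	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go func() {
		defer close(done)
		if err := mgr.Start(ctx); err != nil {
			panic(err)
		}
	}()

	code := m.Run()

	// Shut the manager down and wait for it to exit, so nothing leaks
	// into subsequent test runs.
	cancel()
	<-done
	_ = testEnv.Stop()
	os.Exit(code)
}
```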
The level-triggered promotion algorithm will need to know the status
of each target in the pipeline, so it can make a decision on if and
where a promotion needs to be triggered. To be explicit about how it
makes that decision, that controller should record the status it's
seen for each target. This will help with testing, and makes the
Pipeline object self-contained -- you should be able to see from the
status where the pipeline is up to.

This commit introduces a couple more types into the API to represent
the status of a target. Each one gets a full name (the type,
namespace, name, and cluster if it's in another cluster).

When the controller goes through the environments, it fetches the
targets' apps. For the sake of not making too many decisions at once,
I've implemented only the case of HelmRelease objects in the local
cluster. Once there are tests demonstrating it works, it can be
generalised to other app types (the API accepts _any_ API group,
version, and kind).

The test involving a GitOpsCluster is skipped, because we would now
expect to fetch the application object from the remote cluster it
represents, rather than just examine the cluster object itself.

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
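The new status types might look roughly like this (illustrative only; the actual field names and types are defined in api/v1alpha1/pipeline_types.go):

```go
package v1alpha1

// TargetStatus fully names a target's app object (type, namespace, name,
// and cluster when it is in another cluster) and records the status the
// controller last saw for it.
type TargetStatus struct {
	// ClusterRef names the cluster the target is in; empty means the
	// local cluster.
	ClusterRef *CrossNamespaceClusterRef `json:"clusterRef,omitempty"`
	// Kind, Namespace and Name identify the app object in that cluster.
	Kind      string `json:"kind"`
	Namespace string `json:"namespace"`
	Name      string `json:"name"`
	// Ready indicates whether the app object was last seen to be ready.
	Ready bool `json:"ready"`
	// Error records any problem encountered reading the app's status.
	Error string `json:"error,omitempty"`
}

// CrossNamespaceClusterRef names a cluster, possibly in another namespace.
type CrossNamespaceClusterRef struct {
	Namespace string `json:"namespace,omitempty"`
	Name      string `json:"name"`
}
```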
To date, the controller didn't have much to do other than validate
that it could see the cluster objects mentioned in a pipeline object
(which wasn't itself necessary, since taking actions relied on the
information given in webhooks). Its idea of "Ready" was that it could
see all the clusters, and they were themselves ready. If any cluster
was not present, it could exit early as unready.

Now, however, we want to actually go and fetch the status of the
targets mentioned in the pipeline. "Ready" must have a different
meaning: it means the controller can reach all the targets, and
thereby determine the state of play for the pipeline. It could exit
early in principle, but I have preferred to assess all the targets
even if some are unavailable, and give a summary in the ready
condition at the end.

The additional test will fail for now, since it checks that adding a
previously missing app object will result in a ready pipeline, but
there's nothing to trigger the pipeline to be re-examined, yet.

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
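A sketch of that assess-everything-then-summarise shape (the per-target check is abstracted as a function parameter, and names and condition reasons are illustrative):

```go
package leveltriggered

import (
	"context"
	"fmt"
	"strings"

	apimeta "k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	"github.com/weaveworks/pipeline-controller/api/v1alpha1"
)

// assessTargets examines every target even when an earlier one fails,
// then sets the Ready condition to an aggregate of the results.
func assessTargets(ctx context.Context, pipeline *v1alpha1.Pipeline,
	checkTarget func(ctx context.Context, env v1alpha1.Environment, target v1alpha1.Target) error) {
	var problems []string
	for _, env := range pipeline.Spec.Environments {
		for _, target := range env.Targets {
			if err := checkTarget(ctx, env, target); err != nil {
				problems = append(problems, fmt.Sprintf("%s: %v", env.Name, err))
			}
		}
	}
	cond := metav1.Condition{
		Type:   "Ready",
		Status: metav1.ConditionTrue,
		Reason: "ReconciliationSucceeded",
	}
	if len(problems) > 0 {
		cond.Status = metav1.ConditionFalse
		cond.Reason = "TargetsNotReachable"
		cond.Message = strings.Join(problems, "; ")
	}
	apimeta.SetStatusCondition(&pipeline.Status.Conditions, cond)
}
```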
This adds indexing to the level-triggered controller, and watches
`HelmRelease` objects. If a HelmRelease object changes, it's looked up
against the index, and any pipeline using it will be queued to be
reconciled.

This is enough to pass the unit test that checks if a Pipeline becomes
ready if it can see all its targets.

The indexing is general with respect to the application kind, because
it includes the Group and Kind in the index key. But:

 - the watch is for the HelmRelease type specifically; and,
 - it does not account for remote clusters.

The next aim will be to make scaffolding that watches any application,
when it's mentioned in a pipeline. Remote clusters will come later.

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
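Put together, the watch-plus-index wiring of the commit above might look roughly like this (a sketch assuming the controller-runtime v0.15+ Watches signature; the indexer is stubbed and names other than the controller-runtime and HelmRelease APIs are illustrative):

```go
package leveltriggered

import (
	"context"
	"fmt"

	helmv2 "github.com/fluxcd/helm-controller/api/v2beta1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	"github.com/weaveworks/pipeline-controller/api/v1alpha1"
)

// PipelineReconciler is a minimal stand-in for the real reconciler type.
type PipelineReconciler struct {
	client.Client
}

// The index key includes Group and Kind, so the index itself is general
// over application types, even though only HelmRelease is watched so far.
const appIndexKey = ".spec.environments.targets.app"

func appKey(group, kind, namespace, name string) string {
	return fmt.Sprintf("%s/%s/%s/%s", group, kind, namespace, name)
}

// indexTargetApps would extract an appKey per target; elided here.
func indexTargetApps(obj client.Object) []string { return nil }

func (r *PipelineReconciler) SetupWithManager(mgr ctrl.Manager) error {
	if err := mgr.GetFieldIndexer().IndexField(context.Background(),
		&v1alpha1.Pipeline{}, appIndexKey, indexTargetApps); err != nil {
		return err
	}
	return ctrl.NewControllerManagedBy(mgr).
		For(&v1alpha1.Pipeline{}).
		// Watch HelmRelease specifically, for now; other app types and
		// remote clusters come later.
		Watches(&helmv2.HelmRelease{},
			handler.EnqueueRequestsFromMapFunc(r.pipelinesForApp)).
		Complete(r)
}

// pipelinesForApp looks a changed app up in the index and enqueues every
// pipeline that uses it.
func (r *PipelineReconciler) pipelinesForApp(ctx context.Context, obj client.Object) []reconcile.Request {
	key := appKey("helm.toolkit.fluxcd.io", "HelmRelease", obj.GetNamespace(), obj.GetName())
	var list v1alpha1.PipelineList
	if err := r.List(ctx, &list, client.MatchingFields{appIndexKey: key}); err != nil {
		return nil
	}
	reqs := make([]reconcile.Request, len(list.Items))
	for i := range list.Items {
		reqs[i] = reconcile.Request{NamespacedName: client.ObjectKeyFromObject(&list.Items[i])}
	}
	return reqs
}
```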
Eventually, this code will need to be able to query for targets in
remote clusters. To make room for that, I've done a small
rearrangement here such that the client used for a target is obtained
from a procedure, rather than assumed to be the client for the local
cluster. This change is more to check this structure makes sense, than
to alter the behaviour.

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
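A sketch of that seam (names are illustrative):

```go
package leveltriggered

import (
	"context"
	"fmt"

	"sigs.k8s.io/controller-runtime/pkg/client"

	"github.com/weaveworks/pipeline-controller/api/v1alpha1"
)

// getClientForTarget returns the client to use for a given target. For
// now it only ever returns the local client, but having a procedure here
// leaves room to return a per-cluster client later.
func getClientForTarget(ctx context.Context, localClient client.Client, target v1alpha1.Target) (client.Client, error) {
	if target.ClusterRef != nil {
		return nil, fmt.Errorf("remote clusters are not supported yet: %s/%s",
			target.ClusterRef.Namespace, target.ClusterRef.Name)
	}
	return localClient, nil
}
```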
This commit expands the test a little to make sure that when an app
(HelmRelease, here) has its status updated, the pipeline's status is
updated in sympathy.

In making the test pass, this corrects a bug with the implementation
so far: when going through the environments in the controller, the
environment status was not entered into the map of environment
statuses.

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
The v1alpha1 CRD has new fields as part of this change. These are
intended to be backward-compatible, but since they are nonetheless
changed and are included in the chart, the chart version should be
incremented.

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
@squaremo squaremo force-pushed the minimal-leveltriggered-controller branch from f734e82 to c2403e0 Compare October 9, 2023 09:54
@squaremo squaremo merged commit d19807f into main Oct 9, 2023
11 checks passed
@squaremo squaremo deleted the minimal-leveltriggered-controller branch October 9, 2023 10:00
Merging this pull request closed: Prototype the indexing/watching behaviour needed for level-triggered promotions (#194).