Watcher infrastructure for remote clusters and arbitrary target kinds #201
Conversation
// having done all that, did we really need it?
c.cachesMu.Lock()
if cacheEntry, cacheFound = c.cachesMap[cacheKey]; !cacheFound {
Is your question about getting the cache entry again? It's not clear why you are doing that.
Here I'm trying to avoid holding the lock while doing I/O. So, it checks whether the new cache is needed, and if so, releases the lock before going off to find cluster secrets with which to construct a client. The lock is reacquired in this line, and since this might be interleaved with some other process constructing caches, the presence of a cache is checked again before the map is written to.
Alternatives are:
1. hold the lock for an indefinite period while tracking down a client config;
2. use an RWLock and "upgrade" from a read lock to a write lock if necessary.

Of these, 2 is pretty close to what's implemented, and might be a useful refinement (it would mean there's no exclusive lock for the "hot" path, where there's already a cache). 1 is dubious because it would mean locking the whole map exclusively while doing network I/O, during which no other work can make progress. I think anything other than those two has more complication for no particular benefit, but I can be persuaded :-)
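For reference, here's a compilable sketch of that check / release / re-check pattern; the struct, field, and helper names are illustrative, not the PR's actual code:

```go
package caches

import (
	"context"
	"sync"
)

type cacheKey struct{ cluster, gvk string } // illustrative key: one cache per {cluster, type}

type cacheEntry struct{} // would hold the cache.Cache, its cancel func, etc.

type clientCaches struct {
	cachesMu  sync.Mutex
	cachesMap map[cacheKey]*cacheEntry
}

// getOrCreate checks for an existing cache under the lock, releases the lock
// while the slow work (fetching cluster secrets, constructing a client) runs,
// then re-checks before writing, in case another goroutine got there first.
func (c *clientCaches) getOrCreate(ctx context.Context, key cacheKey, build func(context.Context) (*cacheEntry, error)) (*cacheEntry, error) {
	c.cachesMu.Lock()
	entry, found := c.cachesMap[key]
	c.cachesMu.Unlock()
	if found {
		return entry, nil // hot path: no I/O, and the lock is held only briefly
	}

	// Slow path: do the I/O without holding the lock.
	newEntry, err := build(ctx)
	if err != nil {
		return nil, err
	}

	// Having done all that, did we really need it?
	c.cachesMu.Lock()
	defer c.cachesMu.Unlock()
	if entry, found = c.cachesMap[key]; found {
		return entry, nil // lost the race; discard (or tear down) the one we built
	}
	c.cachesMap[key] = newEntry
	return newEntry, nil
}
```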
Do you think a clearer comment is needed, @luizbafilho?
That's fine. Those calls can potentially take a long time.
That looks great. There is a lot of new stuff I've never used, but it was good to learn new tricks.
No asks, just minor questions
This is not quite the most recent version of controller-runtime, but it's recent enough, and it's the version compatible with sigs.k8s.io/cluster-api, which I would like to introduce. In newer releases, there are some problems with kubeyaml's openapi package that I don't want to deal with right now.

The changes to patch up are:
- the builder package uses `Watch(client.Object, ...)` rather than `Watch(source.Source, ...)`
- `handler.MapFunc` now takes a `context.Context` as the first argument, so you don't have to use `context.Background`

There's also some change to how Kubernetes events get stringified, which broke a test (in notification_test.go).

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
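As an illustration of the second point in the commit above, here's a rough sketch of the `handler.MapFunc` signature change, assuming controller-runtime v0.15-style APIs; the mapping logic and names are made up for the example:

```go
package example

import (
	"context"

	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// Previously a MapFunc was func(obj client.Object) []reconcile.Request, and any
// API call inside it had to use context.Background(); now the context is
// passed in as the first argument.
var mapTargetToPipeline handler.MapFunc = func(ctx context.Context, obj client.Object) []reconcile.Request {
	// Illustrative only: real code would look the pipeline up in an index.
	return []reconcile.Request{{NamespacedName: types.NamespacedName{
		Namespace: obj.GetNamespace(),
		Name:      obj.GetName(),
	}}}
}

// The func is used with EnqueueRequestsFromMapFunc, as before.
var _ handler.EventHandler = handler.EnqueueRequestsFromMapFunc(mapTargetToPipeline)
```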
The tests make their own `PipelineReconciler` struct, but there's a constructor `NewPipelineReconciler` which could do some initialisation, and that would be missed. So: use the constructor in suite_test.go.

However! This now fails to initialise the eventRecorder, leaving it to default in `SetupWithManager`, and capturing events is part of the tests. I don't want to disturb existing behaviour*, so I've kept this change to the level-triggered controller.

*How does it change? The event recorder is constructed using a notification-controller URL and used _only_ for promotion notifications. The event recorder for things that happen in the reconciler is left to default, which means it won't send those to the notification-controller. In practice it's unlikely to matter, since any alerts set up should filter for the promotion events.

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
If you use the Gomega or *testing.T from the outer scope, passes and fails are not reported against individual tests correctly.

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
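A minimal sketch of the point above, assuming Gomega is used with the standard `testing` package; the test content is made up:

```go
package example

import (
	"testing"

	. "github.com/onsi/gomega"
)

func TestExample(t *testing.T) {
	for _, name := range []string{"first", "second"} {
		t.Run(name, func(t *testing.T) {
			// Bind Gomega to this subtest's *testing.T rather than capturing a
			// g (or t) from the enclosing scope; otherwise a failure here gets
			// reported against the outer test instead of this subtest.
			g := NewWithT(t)
			g.Expect(name).NotTo(BeEmpty())
		})
	}
}
```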
This commit adds the infrastructure for watching and querying arbitrarily-typed pipeline targets, in remote clusters as well as the local cluster.

The basic shape is this: for each target that needs to be examined, the reconciler uses `watchTargetAndGetReader(..., target)`. This procedure encapsulates the detail of making sure there's a cache for the target's cluster and type, and supplies the client.Reader needed for fetching the target object.

A `cache.Cache` is kept for each {cluster, type}. `cache.Cache` is the smallest piece of machinery that can be torn down, because the next layer down, `Informer` objects, can't be removed once created. This is important for being able to stop watching targets when they are no longer targets.

Target object updates will come from all the caches, which come and (in principle) go; but the handler must be statically installed in SetupWithManager(). So, targets are looked up in an index to get the corresponding pipeline (if there is one), and that pipeline is put into a `source.Channel`. The channel source multiplexes the dynamic event handlers into a static pipeline requeue handler.

NB:

* I've put the remote cluster test in its own Test* wrapper, because it needs to start another testenv to be the remote cluster.
* Supporting arbitrary types means using `unstructured.Unstructured` when querying for target objects, and this complicates checking their status. Since the caches are per-type, in theory there could be code for querying known types (HelmRelease and Kustomization), with `Unstructured` as a fallback. So long as the object passed to `watchTargetAndGetReader(...)` is the same one used with `client.Get(...)`, it should all work.
* A cache per {cluster, type} is not the only possible scheme. The watching could be more precise -- meaning fewer spurious events, and narrower permissions needed -- by having a cache per {cluster, namespace, type}, with the trade-off being managing more goroutines, and other overheads. I've chosen the chunkier scheme based on an informed guess that it'll be more efficient for low numbers of clusters and targets.
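For orientation, here is a hedged sketch of the two roles a per-{cluster, type} cache plays -- ensuring a watch and supplying a reader -- using controller-runtime's cache package. The real `watchTargetAndGetReader` is more involved (client construction from cluster secrets, the caches map, and event plumbing), and the function name and parameters below are illustrative:

```go
package example

import (
	"context"

	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// watchTargetAndGetReaderSketch constructs a cache for one cluster, makes sure
// there is an informer (and hence a watch) for the target's type, and returns
// the cache as a client.Reader for fetching the target object.
func watchTargetAndGetReaderSketch(ctx context.Context, cfg *rest.Config, scheme *runtime.Scheme, target client.Object) (client.Reader, error) {
	c, err := cache.New(cfg, cache.Options{Scheme: scheme})
	if err != nil {
		return nil, err
	}
	// In the PR, caches are kept in a map and started via a runner tied to the
	// manager's context; starting one inline here keeps the sketch short.
	go c.Start(ctx) //nolint:errcheck

	// Asking for the informer ensures a watch exists for the target's type...
	if _, err := c.GetInformer(ctx, target); err != nil {
		return nil, err
	}
	// ...and the cache itself satisfies client.Reader, for Get/List of targets.
	return c, nil
}
```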
This makes room for indexing targets for a different purpose -- garbage collection of caches.

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
It's possible that due to pipelines disappearing, or being updated, some caches will no longer be needed. If these are not shut down, the number of caches will only grow, which constitutes a leak of resources (though not necessarily a serious one, since it will max out at `clusters x types`).

To be able to shut down caches that are no longer needed, we need to be able to do a few things:

1. detect when they aren't needed
2. stop them running when not needed
3. stop them when the controller is shutting down

To do the first, I index the cache keys used by each pipeline. The garbage collector regularly checks to see if each cache has entries in the index; and if not, it's not used by any pipeline and can be shut down. To keep track of caches to consider for collection, the GC uses a rate-limiting work queue. When the cache is created, it's put on the queue; and each time it's considered and is still needed, it's requeued with a longer retry, up to about eight minutes. This avoids the question of finding an appropriate event to hook into, with the downside of being a bit eventual.

The second and third things can be arranged by deriving contexts from the manager's context. I have introduced `runner` (in runner.go) which can be Start()ed by the manager and thus gain access to its context, and which can then construct a context for each cache. Each cache gets its own cancel func that can be used to shut it down, but will also be shut down by the manager when it's shutting down itself.

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
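A minimal sketch of the runner idea described above, assuming the controller-runtime manager's Runnable interface; the type and function names are illustrative, not the PR's runner.go:

```go
package example

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
)

// runner is Start()ed by the manager, which hands it the manager's context.
// Contexts for individual caches are derived from that context, so each cache
// can be cancelled on its own, and all of them stop when the manager does.
type runner struct {
	ready chan struct{}
	ctx   context.Context
}

func newRunner() *runner { return &runner{ready: make(chan struct{})} }

// Start satisfies manager.Runnable; it records the context and waits for shutdown.
func (r *runner) Start(ctx context.Context) error {
	r.ctx = ctx
	close(r.ready)
	<-ctx.Done()
	return nil
}

// newCacheContext returns a context derived from the manager's context, plus
// the cancel func the garbage collector can call when the cache is unused.
func (r *runner) newCacheContext() (context.Context, context.CancelFunc) {
	<-r.ready // wait until the manager has started us
	return context.WithCancel(r.ctx)
}

func setupRunner(mgr ctrl.Manager) (*runner, error) {
	r := newRunner()
	return r, mgr.Add(r) // the manager calls Start(ctx) and stops it on shutdown
}
```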
All the bits to do with client caches can go together, since (after being constructed) this machinery only interacts with the reconciler through `watchTargetAndGetReader` and via events. This is mainly a case of relocating the relevant fields and changing some variable names.

Signed-off-by: Michael Bridgen <michael.bridgen@weave.works>
Thanks for reviewing, Luiz ⭐
Description
This PR fills in the gaps deliberately left in #200. That is, it adds support for targets in remote clusters, and targets with arbitrary APIVersion and Kind. After this change, the controller maintains a set of Kubernetes API client caches -- one for each `{cluster, type}` (where type is the group, version, and kind of the target) -- which serve both for querying target status, and for being informed of updates to targets.

Most of the substance of this change is behind `watchTargetAndGetReader`, which is the entry point to using the caches. Its contract is that it will both 1. supply a client which can read the given target; and 2. make sure the target has a watch looking at it, so that when the target changes, the pipeline using it can be re-examined. This involves a few new pieces of machinery; individual commit messages (especially e738f65 and f980554), and comments in the code, explain how these work in detail.
The changes to the reconciliation code are small -- basically:

- use `caches.watchTargetAndGetReader` to obtain a client for the right cluster when examining a target (this replaces `getClusterClient`, which was introduced as a stub in Minimal level-triggered controller #200 to make room for this change);
- use `unstructured.Unstructured` (dynamic client) values rather than API package typed values to represent target objects, so we can target arbitrary types without needing their API packages at compile time;
- check the status of `unstructured.Unstructured` values, which is a bit more laborious but supports any type with the right fields (see the sketch below).

There are a few incidental changes that are a consequence of updating modules so that I can import Cluster API; and extra files in config/testdata/crds, so as to be able to test Kustomization objects and arbitrary types as targets.
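To illustrate why status-checking on `unstructured.Unstructured` is more laborious than with typed objects: conditions have to be dug out of nested maps by hand. This is a hedged example, not the exact code in the PR:

```go
package example

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// readyStatus returns the status of the "Ready" condition, if present, by
// walking the untyped .status.conditions slice.
func readyStatus(obj *unstructured.Unstructured) (string, bool) {
	conditions, found, err := unstructured.NestedSlice(obj.Object, "status", "conditions")
	if err != nil || !found {
		return "", false
	}
	for _, c := range conditions {
		cond, ok := c.(map[string]interface{})
		if !ok {
			continue
		}
		if cond["type"] == "Ready" {
			status, _ := cond["status"].(string)
			return status, true
		}
	}
	return "", false
}
```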
Verifying it works
I added tests that demonstrate watching and querying targets in remote clusters, and targets of arbitrary kinds.
Once this and the promotion algorithm (#203) have been merged, or perhaps when one is rebased on the other, we will be ready to actually try it.
Effect of this PR
Closes #196.
Checklist (from #196)
- Prototype the indexing/watching behaviour needed for level-triggered promotions #194, to cover targets in remote clusters (https://github.com/weaveworks/pipeline-controller/pull/201/files#diff-b4246e2660e154bca730e39f47e67d55b6de0808e1f42749c66706a352337382)