Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure health status for custom resources implementing kstatus #4192

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

erikgb
Copy link
Contributor

@erikgb erikgb commented Feb 6, 2024

What changed?

This PR will compute the kstatus for resource types without existing custom health checks code. This should improve the visualization of custom resources implementing kstatus.

Why was this change made?

We are using many controllers for custom resources in our clusters. For all the operators we are maintaining, we have implemented kstatus, and this seems to be an emerging standard adopted by more and more open-source projects. The changes proposed in this PR will improve the user feedback for kstatus-compatible resources in weave-gitops.

In-tree types shows up quite nicely:

image

And I am hoping the kstatus compute function as a fallback for health will improve the visualization for many custom resources:

image

How was this change implemented?

I attempt to replace all/some of the custom code for health indicator with kstatus compute, but that broke many tests - which could represent a breaking change for some users. I think we could consider switching out some of the custom code (e.g. for Deployment), but I'll leave that for an eventual follow-up PR.

How did you validate the change?

Added new test cases to existing test.

Release notes

N/A: I don't think so.

Documentation Changes

@kingdonb
Copy link

kingdonb commented Feb 9, 2024

Thanks @erikgb for submitting this PR

I see some test failing, I am not a WG release engineer so I'm not sure if that means anything.

It is uncertain when there will be another release, but for anyone that comes by who wants to test this, can you tell us what steps to build the image you took, and how to test it using your build / our own build?

Maybe this is all documented, but I want to make it clear, I don't think there is any maintainer here to merge this and release it. I'm sure you saw the announcement about Weaveworks is closing down. The weave.works website is gone already, it may return in some form, but docs.gitops.weave.works is hopefully on a free hosting with GitHub pages and may persist forever?

I don't know if you have any interest in forking this and changing the name, or does your company offer Flux support - we could mention it on the ecosystem page if you're going to maintain it in the future!

@kingdonb
Copy link

kingdonb commented Feb 9, 2024

If we can merge and release this without a lot of ceremony, I'll be happy to throw the levers. As long as my access holds out, but I'm not sure how we can guarantee that, (or if rebranding will be needed first.)

@kingdonb
Copy link

kingdonb commented Feb 9, 2024

To be clearer about what this does fix - does the HelmRepository of type: oci reflect as ready with this change? That's the main issue people have been reporting since Flux GA.

@erikgb
Copy link
Contributor Author

erikgb commented Feb 9, 2024

@kingdonb Thanks for the review. First to the failing build. I think it fails because some dependencies are outdated. This is the error message from the failing log:

yarn audit --production
yarn audit v1.22.21
┌───────────────┬──────────────────────────────────────────────────────────────┐
│ moderate      │ Follow Redirects improperly handles URLs in the url.parse()  │
│               │ function                                                     │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Package       │ follow-redirects                                             │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Patched in    │ >=1.15.4                                                     │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Dependency of │ http-proxy-middleware                                        │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ Path          │ http-proxy-middleware > http-proxy > follow-redirects        │
├───────────────┼──────────────────────────────────────────────────────────────┤
│ More info     │ https://www.npmjs.com/advisories/10[9](https://github.com/weaveworks/weave-gitops/actions/runs/7807149219/job/21294940962#step:7:10)6353                     │
└───────────────┴──────────────────────────────────────────────────────────────┘
1 vulnerabilities found - Packages audited: [10](https://github.com/weaveworks/weave-gitops/actions/runs/7807149219/job/21294940962#step:7:11)[15](https://github.com/weaveworks/weave-gitops/actions/runs/7807149219/job/21294940962#step:7:16)
Severity: 1 Moderate
Done in 0.63s.
make: *** [Makefile:[19](https://github.com/weaveworks/weave-gitops/actions/runs/7807149219/job/21294940962#step:7:20)1: ui-audit] Error 4
Error: Process completed with exit code 2.

This could be fixed by #4186. But CI on that PR is also failing. So in general I would recommend renovating the project before we merge any functional PRs like this one. But that might be challenging - since maintainers are not available...

@erikgb
Copy link
Contributor Author

erikgb commented Feb 9, 2024

To be clearer about what this does fix - does the HelmRepository of type: oci reflect as ready with this change? That's the main issue people have been reporting since Flux GA.

I think this PR might fix this issue. The change is pretty simple: It just tries to compute the resource health status assuming resources implement kstatus. This is done after the hand-coded health checks already implemented, for selective resource types, are run. So if Flux resources follow the kstatus specification, which I think they do, this PR will represent an improvement in this area.

@kingdonb
Copy link

kingdonb commented Feb 10, 2024

I just looked into #4186, I (may?) have write access to this project but I am not an authorized user to merge PRs to main.

So we may need to fork this, and republish it somewhere else. I don't know if I have the energy to do all that.

@erikgb
Copy link
Contributor Author

erikgb commented Feb 11, 2024

I just looked into #4186, I (may?) have write access to this project but I am not an authorized user to merge PRs to main.

So we may need to fork this, and republish it somewhere else. I don't know if I have the energy to do all that.

Thanks for looking deeper into this, @kingdonb! I don't think we have the resources or skills to fork and continue this project somewhere else. Are you aware of some other good GUI for FluxCD? It's painful to say this, but if there is no GUI to visualize the status of Flux resources in our clusters, we'll probably have to consider migrating to ArgoCD.

@kingdonb
Copy link

kingdonb commented Feb 11, 2024

I am still picking up the pieces after the closing of Weaveworks in my home lab, I depended heavily on Weave GitOps Enterprise and now I am not licensed to use it anymore.

I'll let you know what I come up with in a few weeks. I'm not focused on the GUI, I am focused on keeping the Flux team together and answering this question broadly though.

I heard Gimlet is in active development, and also there's Flamingo. The UI of Flux is the Kubernetes status condition. So any UI which is built to manage CRDs broadly should work with Flux, as long as it handles kstatus well and can elevate negative polarity status conditions to the UI then I think it should be good.

But again, I've been avoiding those UIs and focusing on our own as a user for DX purposes, I am the wrong person to ask about other UIs for Flux. My job depended on using this UI.

Edit: there's also the VSCode GitOps Tools, which I think will continue its development independently. We secured the assets to produce new releases of that one, and we can show we still continued to support it after Weaveworks, in limited maintenance for now while we figure things out. Flamingo or Flux Subsystem for Argo can also say the same, from what I can tell.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants