CP-23051: Change default kube-state-metrics behavior to use Cloudzero subchart #91
lgtm! this will be released as a beta version, right? will it be a minor or patch?
I think a beta version is a good idea. Maybe OpenNet would be willing to give it a go for us.
We also don't want to merge this into develop directly, but into a feature branch off of develop. That way it doesn't reach the main branch until the beta is complete.
Thank you - lgtm
… subchart (#91)
* override KSM name
* enable ksm by default
* make cloudzero ksm undiscoverable
* improve documentation
* option 2 is not the default behavior
* fix indentation
* add line
* add documentation for changing the service port for cloudzero ksm
* disable cloudzero KSM as scrape target
* set default port
* fix endpoint
* use default port
* add release notes
* remove metric exporter documentation
* change beta version
* CP-23051: Change default kube-state-metrics behavior to use Cloudzero subchart (#91)
* override KSM name
* enable ksm by default
* CP-23388: Define Static KubeStateMetrics Target Endpoint (#99)
* add 1.0.2 release doc file
---------
Co-authored-by: bdrennz <146774453+bdrennz@users.noreply.github.com>
* CP-22731: add insights-controller chart (#97)
* CP-22731: include cz-insights-controller as subchart
* increase replicacount for tag server
* CP-22731: add beta testing
* update release process for insights controller
* update release workflow
* make most resources off by default
* update readme
* use global for secret names
* incorporate changes from 0.0.30-beta
* add beta release doc
* use local chart for testing
---------
Co-authored-by: josephbarnett <joe.barnett@cloudzero.com>
* CP-22730: use correct pattern list in config
* CP-22730: update doc check location to match normal release path (#100)
* Update Chart.yaml to version 1.0.0-beta
* use latest insights-controller
* CP-23435: remove duplicate service account name in insights-controller chart (#103)
* CP-23426: use insights-controller service account for init job (#104)
* CP-23465: increase default replica count for insights controller (#106)
* CP-23423: add release doc for 1.0.1-beta release (#107)
* [CP-23425] add default remote write retries (#108)
* CP-23425: set default max retries
* update init job to work with long running scrapes
* increase wait time for scrape endpoint
* default batch size added
* increase wait time for init job
* adjust remote write threshold, add default resource values
* Release 1.0.2-beta (#109)
* CP-23051: Change default kube-state-metrics behavior to use Cloudzero subchart (#91)
* override KSM name
* enable ksm by default
* CP-23388: Define Static KubeStateMetrics Target Endpoint (#99)
* add 1.0.2 release doc file
---------
Co-authored-by: bdrennz <146774453+bdrennz@users.noreply.github.com>
* move release doc to correct location
* Update Chart.yaml to version 1.0.2-beta
* CP-22730: package both charts in beta release (#110)
* CP-22730: fix artficat name (#111)
* CP-22730: fix doc path for github release publish (#112)
* CP-23740 (Feature/1.0.3 beta release): Validate KSM Metrics at Install (#116)
* remove unused metric
* add kubemetrics
* bump chart version for beta
* use dev tag for validator
* fix endpoint var name
* allow github to bump version
* simplify metric logic
* update tag
* use dev tag for chart
* [CP-23429] merge insights-controller into main chart (#117)
* insights-controller added to agent chart
* [CP-23428] add helm chart for creating cert (#118)
* CP-23428: add certificate helm chart
* update with documentation comments
* Update charts/cloudzero-agent/README.md
  Co-authored-by: Becki Lee <becki.lee@cloudzero.com>
* Update charts/cloudzero-agent/README.md
  remove duplicate entry
  Co-authored-by: Becki Lee <becki.lee@cloudzero.com>
* Update charts/cloudzero-agent/README.md
  add period to end of sentence in readme
  Co-authored-by: Becki Lee <becki.lee@cloudzero.com>
* PR suggestion for readme
* update config example
---------
Co-authored-by: Becki Lee <becki.lee@cloudzero.com>
* CP 24028 add insights controller scape config (#120)
  CP-24028: add scrape target for insights container
  CP-22734: Bump insights image release version
  Enhance README for helm repo management
  Add release note for next beta version
  Update release process for customer version numbers in betas
* Update Chart.yaml to version 1.0.0-beta-4
* CP 23892 add healthcheck (#121)
* CP-23892, CP-24009, CP-23959: release note
* add healthcheck support
* bump value of insights controller
* Update Chart.yaml to version 1.0.0-beta-5
* fix beta deploy script
* CP-24118: affinity settings, release notes (#122)
* CP-24118: add pod best effort affinity rule for distributing pod instances accross nodes
* allow override of KSM in configuration
* add next release notes
* bump version of controller and validator
* fix table in release note
* CP-23452 Add recommended installation skills to README (#124)
* CP-24008: forward insights controller app metrics (#125)
* CP-24389: deprecate unused chart (#126)
* CP-20221: Labels and Annotations (#127)
* bump final version of insights controller
* Adding release notes for 1.0.0 release
* Adding cert troubleshooting guide
---------
Co-authored-by: Becki Lee <becki.lee@cloudzero.com>
* publish material for beta-6 (#128)
* update readme, add extra svc names to cloudzero-cert, add cloudzero-cert chart publish (#129)
* [CP-24464] default to create self-signed cert upon chart install (#130)
* default to create self-signed cert upon chart install
* Update charts/cloudzero-agent/docs/releases/1.0.0-beta-7.md
  Co-authored-by: Becki Lee <becki.lee@cloudzero.com>
* Update charts/cloudzero-agent/docs/releases/1.0.0-beta-7.md
  Co-authored-by: Becki Lee <becki.lee@cloudzero.com>
* Update charts/cloudzero-agent/docs/releases/1.0.0-beta-7.md
  Co-authored-by: Becki Lee <becki.lee@cloudzero.com>
* Update charts/cloudzero-agent/README.md
  Co-authored-by: Becki Lee <becki.lee@cloudzero.com>
* Update charts/cloudzero-agent/README.md
  Co-authored-by: Becki Lee <becki.lee@cloudzero.com>
* Update charts/cloudzero-agent/README.md
  Co-authored-by: JB <josephbarnett@users.noreply.github.com>
* Update charts/cloudzero-agent/README.md
  Co-authored-by: JB <josephbarnett@users.noreply.github.com>
---------
Co-authored-by: Becki Lee <becki.lee@cloudzero.com>
Co-authored-by: JB <josephbarnett@users.noreply.github.com>
* enable new metric for insights controller failures (#132)
* CP-24424: change init scrape job to use new -backfill option (#131)
  Previously, the scrape job would use curl to hit a /scrape HTTP endpoint on the webhook server. This was problematic on larger clusters where the operation takes a long time since the HTTP context was getting cancelled before the operation completed. This patch switches to using a new -backfill option on the controller binary, which causes the binary to run the backfiller (née scraper) and exit instead of acting as an HTTPd.
* remove certificate chart from beta workflow (#133)
* Update Chart.yaml to version 1.0.0-beta-7
* add back missing packaging (#134)
* add upgrade command to beta-7 release notes (#135)
* CP-24743: allow all resources to use imagePullSecrets (#136)
* CP-24743: add imagePullSecrets to cert job
* Update Chart.yaml to version 1.0.0-beta-8
* CP-24792: allow more configurable settings, increase default remote write timeout (#137)
* CP-24792: allow more configurable settings, increase default remote write timeout
* CP-24792: add KSM image info for easy identification of images to mirror for private image registries (#139)
* CP-24792: add KSM image info for easy identification of images to mirror to private repos
* add template command for finding images
---------
Co-authored-by: Becki Lee <becki.lee@cloudzero.com>
* CP-24833: template KSM service address using the release name (#140)
* Update Chart.yaml to version 1.0.0-beta-9
* CP-24886: ensure KSM service and KSM target always match (#143)
* CP-24886: ensure ksm svc and target match
* Update NOTES.txt
---------
Co-authored-by: Thomas Evans <teevans@users.noreply.github.com>
* Update Chart.yaml to version 1.0.0-beta-10
* Add server.agentMode boolean configuration option
  This just provides a convenient way to toggle agent mode on/off for debugging, which is valuable since agent mode disables a *lot* of Prometheus functionality which can be very useful for debugging, such as the /graph endpoint.
* Add metric_relabel_configs to insights controller scrape job.
  This should just restrict the metrics to those we're interested in, as defined in values.yaml.
* CP-23129: add Prometheus scrape job to scrape metrics from itself
  I also switched from a hardcoded value it to using `prometheusConfig.scrapeJobs.kubeStateMetrics.scrapeInterval` for the KSM job scrape_interval. This seems to pretty clearly be the intent of the configuration option, but it was not being used. Notably, this increases the interval from 1m to 2m.
* [CP-24912] use image tag and chart name in init job name (#144)
* always use insightsController image reference in init scrape job name
* CP-24655: use backfill instead of scrape for init job that gathers existing state (#145)
* CP-25115: add release notes for 1.0.0-rc1 release (#147)
* CP-24655: add release nodes for RC1
* fix main chart release in rel branch (#151)
* CP-25165: allow user to choose release branch in main chart release (#152)
* CP-25165: checkout given branch (#153)
* CP-25165: checkout given branch in correct order (#154)
* CP-25165: checkout the input branch, not main (#155)
* Basic install success message. (#149)
* Update charts/cloudzero-agent/Chart.yaml
  Co-authored-by: JB <josephbarnett@users.noreply.github.com>
* CP-25270: prepare release/1.0.0 for merging (#158)
* update docs, remove cert-manager references from test, add missing quote
---------
Co-authored-by: josephbarnett <joe.barnett@cloudzero.com>
Co-authored-by: Automated CZ Release <ops@cloudzero.com>
Co-authored-by: bdrennz <146774453+bdrennz@users.noreply.github.com>
Co-authored-by: Becki Lee <becki.lee@cloudzero.com>
Co-authored-by: JB <josephbarnett@users.noreply.github.com>
Co-authored-by: evan-cz <evan.nemerson@cloudzero.com>
Co-authored-by: Thomas Evans <teevans@users.noreply.github.com>
Description
This PR does the following:
Testing
To test this, I installed vanilla Prometheus alongside the Agent in a cluster and checked the Prometheus UI to confirm that the Cloudzero KSM target did not appear among the discovered targets. It did not.
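The manual check above could also be scripted against Prometheus's HTTP API, which lists active scrape targets at `/api/v1/targets`. The following is only a sketch of that idea, not what was actually run for this PR: the helper names, the `http://localhost:9090` address, and the `cloudzero-state-metrics` search string are all assumptions for illustration, and the real service name depends on the release.

```python
import json
import urllib.request


def find_targets(targets_response: dict, needle: str) -> list[str]:
    """Return scrape URLs of active targets whose URL or labels contain `needle`.

    `targets_response` is the parsed JSON body of GET /api/v1/targets.
    """
    matches = []
    active = targets_response.get("data", {}).get("activeTargets", [])
    for target in active:
        scrape_url = target.get("scrapeUrl", "")
        # Search the scrape URL plus both the final and the discovered labels.
        haystack = [scrape_url]
        haystack += [str(v) for v in target.get("labels", {}).values()]
        haystack += [str(v) for v in target.get("discoveredLabels", {}).values()]
        if any(needle in value for value in haystack):
            matches.append(scrape_url)
    return matches


def fetch_targets(base_url: str) -> dict:
    """Fetch and parse the target list from a Prometheus server."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets") as resp:
        return json.load(resp)
```

With the vanilla Prometheus port-forwarded locally, `find_targets(fetch_targets("http://localhost:9090"), "cloudzero-state-metrics")` returning an empty list would correspond to the target not appearing in the UI.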
Checklist