From 8ef472d0ebaf5a3ef24b39643e9c4f5ddacd030f Mon Sep 17 00:00:00 2001
From: Pawel Palucki
Date: Fri, 26 Apr 2024 11:40:11 +0200
Subject: [PATCH] fix old wrong defaults in README

---
 deployment/pcm/README.md | 33 ++++++++++++++-------------------
 1 file changed, 14 insertions(+), 19 deletions(-)

diff --git a/deployment/pcm/README.md b/deployment/pcm/README.md
index fb2aa9c8..59aa4a46 100644
--- a/deployment/pcm/README.md
+++ b/deployment/pcm/README.md
@@ -5,27 +5,22 @@ Helm chart instructions
 ### Features:
 
 - privilege and non-privileged container (value: `privileged`),
-- node-feature-discovery based nodeSelector and nodeAffinity (values: nfd, nfdBaremetalAffinity, nfdRDTAffinity)
 - bare-metal and VM host configurations (files: values-metal.yaml, values-vm.yaml),
 - Ability to deploy multiple releases alongside configured differently to handle different kinds of machines (bare-metal, VM) at the same time,
-- Examples for non-privileged mode using device plugin ("smarter-devices-manager") or using NRI device-injector plugin (TODO) (file: values-smarter-devices-cpu-mem.yaml),
-- Deploy Prometheus operator' PodMonitor (value: `podMonitor`)
-- Integration with NRI balloons policy plugin (value: `nriBalloonsPolicyIntegration`),
 - Controllable set of metrics and method of collection (RDT, uncore), support direct (msr) and indirect (Linux abstractions perf/resctrl) counter accesses (file: values-indirect.yaml)
 - Linux Watchdog handling (controlled with PCM_KEEP_NMI_WATCHDOG, PCM_NO_AWS_WORKAROUND, nmiWatchdogMount values)
 - Deploy to own namespace with "helm install ... **-n pcm --create-namespace**"
-- Local image registry for development (file: values-local-image.yaml),
 
-TODO/Ideas:
-- [ ] Refactor extra features: node-feature-discovery, NRI interegration only as extra values for generic fields (annotations, nodeSelector/nodeAffinity)
-- [ ] Check if energy metrics can be accessible through perf subsystem
-- [ ] GitHub actions for linter/security scanners,
-- [ ] Idea: Change metrics names (follow Prometheus best practices)
-- [ ] Idea: init container to check permission for all required components (devices/CPU)
-- [ ] Implement Helm chart test pods + NOTES
-- [ ] Test liveness/readiness probes
-- [ ] Testing in Cluster Manager Systems like (e.g. Ranger,Gardener) different node types VM(1socket,all sockets), bare-metal
-- [ ] Test in different cloud GCP/Azure/AWS
+#### Integration features:
+
+- node-feature-discovery based nodeSelector and nodeAffinity (values: nfd, nfdBaremetalAffinity, nfdRDTAffinity),
+- Examples for non-privileged mode using device plugin ("smarter-devices-manager") or using NRI device-injector plugin (TODO) (file: values-smarter-devices-cpu-mem.yaml),
+- Integration with NRI balloons policy plugin (value: `nriBalloonsPolicyIntegration`),
+
+#### Debugging features:
+
+- Local image registry for development (file: values-local-image.yaml),
+- Deploy Prometheus operator' PodMonitor (value: `podMonitor`)
 
 ### Getting started
 
@@ -63,11 +58,11 @@ helm upgrade --install pcm . --set privileged=false --set nfd=true --set podMoni
 
 ### Requirements
 
-- Full set of metrics requires metal instance (uncore metrics, RDT, energy, UPI),
+- Full set of metrics requires bare-metal or .metal instance (uncore metrics, RDT, energy, UPI),
 - Core metrics (instructions, cycles are also available) on VM instances,
-- In both case "msr" kernel module has to be loaded in host OS,
-- pod is allowed to be run with privileged capabilities (SYS_ADMIN, SYS_RAWIO) on given namespace,
-- Pod Security Standards allow to run on privileged level,
+- /sys/fs/resctrl has to be mounted on host OS,
+- pod is allowed to be run with privileged capabilities (SYS_ADMIN, SYS_RAWIO) on given namespace in other words: Pod Security Standards allow to run on privileged level,
+
 ```
 pod-security.kubernetes.io/enforce: privileged
 pod-security.kubernetes.io/enforce-version: latest