Commit 7f2c707

address jcfunk comments: interval and extra labels for PodMonitor + refactor readme
1 parent f2747c7 commit 7f2c707

4 files changed

+91
-67
lines changed

deployment/pcm/README.md

Lines changed: 11 additions & 66 deletions
Original file line numberDiff line numberDiff line change
@@ -148,6 +148,9 @@ helm install prometheus prometheus-community/kube-prometheus-stack --set prometh
148148
kubectl get sts prometheus-prometheus-kube-prometheus-prometheus
149149
```
150150

151+
Note: `podMonitorSelectorNilUsesHelmValues` is disabled (set to false) so that the Prometheus operator picks up the PCM PodMonitor even when it is deployed without extra `podMonitorLabels`. Otherwise, PCM needs to be deployed like this:
152+
`helm install pcm . --set podMonitor=true --set podMonitorLabels.release=prometheus` (assuming Prometheus operator was deployed as "prometheus")
153+
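The label flags in the note above can be derived mechanically from the labels that the Prometheus operator selects on. The following is a minimal sketch, not part of the chart: `labels_to_helm_flags` is a hypothetical helper that turns `key=value` pairs into `--set podMonitorLabels.*` flags. It escapes dots in label keys with a backslash because Helm's `--set` syntax otherwise treats a dot as a nesting separator.

```shell
# Hypothetical helper (not part of the PCM chart): convert "key=value" label
# pairs into helm "--set podMonitorLabels.<key>=<value>" flags, one per line.
labels_to_helm_flags() (
  for pair in "$@"; do
    key=${pair%%=*}      # text before the first "="
    value=${pair#*=}     # text after the first "="
    # Escape dots in the key so that helm does not split it into nested keys.
    key=$(printf '%s' "$key" | sed 's/\./\\./g')
    printf '%s\n' "--set podMonitorLabels.${key}=${value}"
  done
)

# Example: prints "--set podMonitorLabels.release=prometheus"
labels_to_helm_flags release=prometheus
```

The printed flags can then be appended to the `helm install pcm . --set podMonitor=true` command shown above.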
151154
#### 5) Deploy PCM helm chart
152155

153156
```
@@ -217,72 +220,14 @@ helm install pcm-metal . -f values-metal.yaml
217220

218221
#### Direct method as non-privileged container (not recommended)
219222

220-
**TODO**: TO BE MOVED TO EXTERNAL FILE/SECTION
221-
222-
**Note** PCM requires access to /dev/cpu device in read writer mode (MSR access) but it is no possible currently to mount devices in Kubernetes pods/containers in vanila Kubernetes. Please read this isses for more information https://github.com/kubernetes/kubernetes/issues/5607.
223-
224-
##### a) Device injection using 3rd party device-plugin
225-
226-
227-
TO run PCM with as non privileged pod, we can third party devices plugins e.g.:
228-
229-
- https://github.com/smarter-project/smarter-device-manager
230-
- https://github.com/squat/generic-device-plugin
231-
- https://github.com/everpeace/k8s-host-device-plugin
232-
233-
**Warning** This plugins were NOT audited for security concerns, **use it at your own risk**.
234-
235-
Below is example how to pass /dev/cpu and /dev/mem using smarter-device-manager in kind based Kubernetes test cluster.
236-
237-
```
238-
# Label node to deploy device plugin on that node
239-
kubectl label node kind-control-plane smarter-device-manager=enabled
240-
241-
# Install "smarter-device-manager" device plugin with only /dev/cpu and /dev/mem devices enabled:
242-
git clone https://github.com/smarter-project/smarter-device-manager
243-
helm install smarter-device-plugin --create-namespace --namespace smarter-device-plugin smarter-device-manager/charts/smarter-device-manager --set 'config[0].devicematch=^cpu$' --set 'config[0].nummaxdevices=1' --set 'config[1].devicematch=^mem$' --set 'config[1].nummaxdevices=1'
244-
245-
# Check that cpu and mem devices are available - should return "1"
246-
kubectl get node kind-control-plane -o json | jq .status.capacity
223+
**Note** PCM requires read-write access to the /dev/cpu devices (MSR access), but vanilla Kubernetes currently cannot mount devices into unprivileged pods/containers. For more about this limitation, see https://github.com/kubernetes/kubernetes/issues/5607.
247224

248-
# Install pcm helm chart in unprivileged mode with extraResources for cpu and memory devices.
249-
helm install pcm . --set privileged=false -f values-direct.yaml -f values-smarter-devices-cpu-mem.yaml
250-
```
225+
To expose necessary devices to pcm-sensor-server, one can use:
251226

252-
##### b) Device injection using NRI plugin device-injection
227+
a) Kubernetes device plugin (using the Kubernetes [device plugin](https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/) interface),
228+
b) containerd plugin (using [NRI](https://github.com/containerd/nri/) interface),
253229

254-
**TODO**: **Warning** This is work in progress, because it is needed to manually specific all /dev/cpu/XX/msr devices, which is unpractical in production (TO BE MOVED TO EXTERNAL FILE).
255-
256-
```
257-
git clone https://github.com/containerd/nri/
258-
(cd nri/plugins/device-injector/ && go build )
259-
docker cp kind-control-plane:/etc/containerd/config.toml config.toml
260-
261-
cat >>config.toml <<EOF
262-
[plugins."io.containerd.nri.v1.nri"]
263-
# Disable NRI support in containerd.
264-
disable = false
265-
# Allow connections from externally launched NRI plugins.
266-
disable_connections = false
267-
# plugin_config_path is the directory to search for plugin-specific configuration.
268-
plugin_config_path = "/etc/nri/conf.d"
269-
# plugin_path is the directory to search for plugins to launch on startup.
270-
plugin_path = "/opt/nri/plugins"
271-
# plugin_registration_timeout is the timeout for a plugin to register after connection.
272-
plugin_registration_timeout = "5s"
273-
# plugin_requst_timeout is the timeout for a plugin to handle an event/request.
274-
plugin_request_timeout = "2s"
275-
# socket_path is the path of the NRI socket to create for plugins to connect to.
276-
socket_path = "/var/run/nri/nri.sock"
277-
EOF
278-
279-
docker cp config.toml kind-control-plane:/etc/containerd/config.toml
280-
docker exec kind-control-plane systemctl restart containerd
281-
docker exec kind-control-plane systemd-run -u device-injector /device-injector -idx 10 -verbose
282-
docker exec kind-control-plane systemctl status device-injector
283-
284-
helm install pcm-device-injector . --set privileged=false --set hostPort= --set debugSleep=true -f values-opcm-local-image.yaml -f values-device-injector.yaml
285-
```
230+
Examples can be found [here](docs/direct-unprivileged-deployment.md).
286231

287232
#### Development (with local images) and testing
288233

@@ -313,17 +258,17 @@ helm upgrade --install pcm . --set debugPcm=true
313258
helm upgrade --install pcm . --set debugSleep=true
314259
```
315260

316-
**TODO:** consiert debug options to be removed before release for security reasons
261+
**TODO:** consider debug options to be removed before release for security reasons
317262

318-
5) Check logs or intercat with container directly:
263+
5) Check logs or interact with container directly:
319264
```
320265
# exec into pcm container
321266
kubectl exec -ti ds/pcm -- bash
322267
# or check logs
323268
kubectl logs ds/pcm
324269
```
325270

326-
#### Metric collection methods (capabilites vs requirements)
271+
#### Metric collection methods (capabilities vs requirements)
327272

328273
| Method | Used interfaces | default | Notes |
329274
|---------------|------------------------------------------------------------| -------- | ------------------------------------------------------------------------------------- |
Lines changed: 67 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,67 @@
1+
--------------------------------------------------------------------------------
2+
Examples of deploying with direct MSR access as non-privileged container
3+
--------------------------------------------------------------------------------
4+
5+
#### Direct method as non-privileged container (not recommended)
6+
7+
##### a) Device injection using 3rd party device-plugin
8+
9+
To run PCM as a non-privileged pod, we can use third-party device plugins, e.g.:
10+
11+
- https://github.com/smarter-project/smarter-device-manager
12+
- https://github.com/squat/generic-device-plugin
13+
- https://github.com/everpeace/k8s-host-device-plugin
14+
15+
**Warning** These plugins were NOT audited for security concerns, **use them at your own risk**.
16+
17+
Below is an example of how to pass /dev/cpu and /dev/mem using smarter-device-manager in a kind-based Kubernetes test cluster.
18+
19+
```
20+
# Label node to deploy device plugin on that node
21+
kubectl label node kind-control-plane smarter-device-manager=enabled
22+
23+
# Install "smarter-device-manager" device plugin with only /dev/cpu and /dev/mem devices enabled:
24+
git clone https://github.com/smarter-project/smarter-device-manager
25+
helm install smarter-device-plugin --create-namespace --namespace smarter-device-plugin smarter-device-manager/charts/smarter-device-manager --set 'config[0].devicematch=^cpu$' --set 'config[0].nummaxdevices=1' --set 'config[1].devicematch=^mem$' --set 'config[1].nummaxdevices=1'
26+
27+
# Check that cpu and mem devices are available - should return "1"
28+
kubectl get node kind-control-plane -o json | jq .status.capacity
29+
30+
# Install pcm helm chart in unprivileged mode with extraResources for cpu and memory devices.
31+
helm install pcm . --set privileged=false -f values-direct.yaml -f values-smarter-devices-cpu-mem.yaml
32+
```
33+
34+
##### b) Device injection using NRI plugin device-injection
35+
36+
**TODO**: **Warning** This is work in progress, because all /dev/cpu/XX/msr devices currently have to be specified manually, which is impractical in production (TO BE MOVED TO EXTERNAL FILE).
37+
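To illustrate the manual step mentioned in the warning above, the list of MSR device paths can be generated instead of typed by hand. This is only a sketch under stated assumptions: `msr_device_paths` is a hypothetical helper, it assumes CPUs are numbered contiguously from 0, and it does not show how the paths are passed to the device-injector plugin.

```shell
# Hypothetical helper (not part of the PCM chart): print the MSR device path
# for each of the first N CPUs, assuming contiguous numbering starting at 0.
msr_device_paths() (
  count=$1
  cpu=0
  while [ "$cpu" -lt "$count" ]; do
    printf '/dev/cpu/%d/msr\n' "$cpu"
    cpu=$((cpu + 1))
  done
)

# Example: list the paths for all online CPUs of the current machine.
msr_device_paths "$(getconf _NPROCESSORS_ONLN)"
```

On nodes with offline or non-contiguous CPUs the real device list may differ, so the output should be compared with the actual contents of /dev/cpu.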
38+
```
39+
git clone https://github.com/containerd/nri/
40+
(cd nri/plugins/device-injector/ && go build )
41+
docker cp kind-control-plane:/etc/containerd/config.toml config.toml
42+
43+
cat >>config.toml <<EOF
44+
[plugins."io.containerd.nri.v1.nri"]
45+
# Set to true to disable NRI support in containerd (false keeps NRI enabled).
46+
disable = false
47+
# Allow connections from externally launched NRI plugins.
48+
disable_connections = false
49+
# plugin_config_path is the directory to search for plugin-specific configuration.
50+
plugin_config_path = "/etc/nri/conf.d"
51+
# plugin_path is the directory to search for plugins to launch on startup.
52+
plugin_path = "/opt/nri/plugins"
53+
# plugin_registration_timeout is the timeout for a plugin to register after connection.
54+
plugin_registration_timeout = "5s"
55+
# plugin_request_timeout is the timeout for a plugin to handle an event/request.
56+
plugin_request_timeout = "2s"
57+
# socket_path is the path of the NRI socket to create for plugins to connect to.
58+
socket_path = "/var/run/nri/nri.sock"
59+
EOF
60+
61+
docker cp config.toml kind-control-plane:/etc/containerd/config.toml
62+
docker exec kind-control-plane systemctl restart containerd
63+
docker exec kind-control-plane systemd-run -u device-injector /device-injector -idx 10 -verbose
64+
docker exec kind-control-plane systemctl status device-injector
65+
66+
helm install pcm-device-injector . --set privileged=false --set hostPort= --set debugSleep=true -f values-opcm-local-image.yaml -f values-device-injector.yaml
67+
```

deployment/pcm/templates/podmonitor.yaml

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,9 @@ metadata:
88
{{- include "pcm.labels" . | nindent 4 }}
99
app.kubernetes.io/component: metrics
1010
jobLabel: pcm
11+
{{- with .Values.podMonitorLabels }}
12+
{{- toYaml . | nindent 4 }}
13+
{{- end }}
1114
spec:
1215
attachMetadata:
1316
node: true
@@ -24,7 +27,7 @@ spec:
2427
honorTimestamps: true
2528
path: /metrics
2629
port: pcm-metrics
27-
interval: 1s
30+
interval: {{ .Values.podMonitorInterval | quote }}
2831
relabelings:
2932
- sourceLabels:
3033
- __meta_kubernetes_pod_node_name

deployment/pcm/values.yaml

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -84,6 +84,15 @@ extraResources: {}
8484
hostPort: 9738
8585
# Deploy Prometheus Operator PodMonitor (requires hostPort to be not empty)
8686
podMonitor: false
87+
# Extra PodMonitor labels that the Prometheus operator can use to select this PodMonitor
88+
# e.g. the default "kube-prometheus-stack" helm chart requires an additional release:"{name of chart release}" label on the PodMonitor for it to be considered
89+
# Example of how to check which extra labels need to be added to the PodMonitor:
90+
# 1) kubectl get prometheus -o jsonpath='{.items[].spec.podMonitorSelector.matchLabels}' # e.g. release: prometheus
91+
# 2) helm install pcm . --set podMonitor=true --set podMonitorLabels.release=prometheus
92+
podMonitorLabels: {}
93+
# Default scrape interval for the Prometheus PodMonitor configuration
94+
podMonitorInterval: 30s
95+
8796

8897
### -------------- NRI balloons policy plugin -------------
8998
# PCM deployment to be integrated with NRI balloons resource policy
