update README and values comments
ppalucki committed Apr 26, 2024
1 parent 50a195d commit 263390c
Showing 4 changed files with 92 additions and 93 deletions.
115 changes: 55 additions & 60 deletions deployment/pcm/README.md
Expand Up @@ -59,9 +59,8 @@ kubectl logs ds/pcm

### Requirements

- Full set of metrics requires bare-metal or .metal instance (uncore metrics, RDT, energy, UPI),
- Core metrics (instructions, cycles) are also available on VM instances,
- /sys/fs/resctrl has to be mounted on host OS,
- The full set of metrics (uncore/UPI, RDT, energy) requires a bare-metal or .metal cloud instance,
- /sys/fs/resctrl has to be mounted on the host OS (for the default indirect deployment method),
- The pod must be allowed to run with privileged capabilities (SYS_ADMIN, SYS_RAWIO) in the given namespace; in other words, Pod Security Standards must allow the privileged level:

```
Expand All @@ -77,74 +76,40 @@ More information here: https://kubernetes.io/docs/tutorials/security/ns-level-ps

### Defaults

- Use Linux abstraction to access event counters (Linux Perf, resctrl) and run container in un-privileged mode.
- hostPort 9738 is exposed on host, (TODO: security review)
- Prometheus podMonitor is disabled

#### Metric availability and requirements (devices/mounts/permissions)

| Method | Used interfaces | default | Notes |
|---------------|------------------------------------------------------------| -------- | ------------------------------------------------------------------------------------- |
| indirect | perf, resctrl | v | missing energy metrics, |
| direct | msr | | requires msr module and access to /dev/cpu (non trivial) or privileged access |


| Metrics | Available on Hardware | Available through interface | Available through method |
| --------------------- | ----------------------------- | ---------------------------- | ------------------------ |
| core | bare-metal, VM (any) | msr or perf | any |
| uncore (UPI) | bare-metal, VM (all sockets) | msr or perf | any |
| RDT (MBW,L3OCCUP) | bare-metal, VM (all sockets) | msr or resctrl | any |
| energy, temp | bare-metal (only) | msr | direct |
| perf-topdown | | perf only | indirect |


| Interface | Requirements | Controlled by (env/helm value) | default helm | Used by source code | Notes |
|---------------|------------------------------------------------------------|---------------------------------|-----------------------|----------------------------------------------------------|-----------------------------------------------------|
| perf | sys_perf_open() perf_paranoid<=0/privileged/CAP_ADMIN | PCM_NO_PERF | use perf | programPerfEvent(), PerfVirtualControlRegister() | |
| perf-uncore | sys_perf_open() perf_paranoid<=0/privileged/CAP_ADMIN | PCM_USE_UNCORE_PERF | use perf for uncore | programPerfEvent(), PerfVirtualControlRegister() | |
| perf-topdown | /sys/bus/event_source/devices/cpu/events | sysMount | yes | cpucounters.cpp:perfSupportsTopDown() | TODO: conflicts with sys/fs/resctrl |
| RDT | uses "msr" or "resctrl" interface | PCM_NO_RDT | yes | cpucounters.cpp:isRDTDisabled()/QOSMetricAvailable() | |
| resctrl | RW: /sys/fs/resctrl | PCM_USE_RESCTRL | yes | resctrl.cpp | resctrlHostMount |
| watchdog | RO/RW: /proc/sys/kernel/nmi_watchdog | PCM_KEEP_NMI_WATCHDOG | yes (tries to disable)| src/cpucounters.cpp:disableNMIWatchdog() | |
| msr | RW: /dev/cpu/X/msr + privileged or CAP_ADMIN/CAP_RAWIO | PCM_NO_MSR | msr is disabled | msr.cpp:MsrHandle() | privileged or some method to access /dev/cpu |
| | RW: /dev/mem | ? | msr is disabled | cpucounters.cpp:initUncoreObjects, pci.cpp:PCIHandleM() | privileged or some method to access /dev/cpu |
| | RO/RW: /sys/module/msr/parameters | PCM_NO_MSR | msr is disabled | msr.cpp:MsrHandle() | sysMount |
| | RW: /proc/bus/pci | PCM_USE_UNCORE_PERF | msr is disabled | pci.cpp:PCIHandle() | pciMount |
| | RO: /sys/firmware/acpi/tables/MCFG | PCM_USE_UNCORE_PERF | msr is disabled | pci.cpp:PciHandle::openMcfgTable() | mcfgMount |
| | energy | | | cpucounters.cpp initEnergyMonitoring() | |
- The indirect method uses Linux abstractions (Linux perf, resctrl) to access event counters and runs the container in non-privileged mode.
- hostPort 9738 is exposed on the host. (TODO: security review; consider TLS, together with Prometheus scraping.)
- Prometheus podMonitor is disabled (enable it with `--set podMonitor=true`).

### Validation on local kind cluster


#### Requirements

- kubectl/kind/helm/jq binaries available in PATH
- docker service up and running
- kubectl/kind/helm/jq binaries available in PATH,
- docker service up and running,
- the full set of metrics is available only on a bare-metal instance or a cloud .metal instance.

#### 1) Optionally mount resctrl filesystem
#### 1) (Optionally) mount resctrl filesystem (for RDT metrics)

```
mount -t resctrl resctrl /sys/fs/resctrl
```
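
Whether the mount succeeded can be verified before creating the cluster. The following is a minimal sketch, assuming a Linux host that exposes `/proc/mounts`; `resctrl_mounted` is a hypothetical helper name and is not part of PCM or this chart:

```shell
# Hypothetical helper: report whether a resctrl filesystem is mounted.
# Reads a mounts table (defaults to /proc/mounts); the optional argument
# exists only so the check can be exercised against a sample file.
resctrl_mounted() {
    mounts_file="${1:-/proc/mounts}"
    # Field 3 of each mounts entry is the filesystem type.
    awk '$3 == "resctrl" { found = 1 } END { exit !found }' "$mounts_file"
}

if resctrl_mounted; then
    echo "resctrl is mounted"
else
    echo "resctrl is not mounted; RDT metrics through resctrl will be unavailable"
fi
```

The check matches on the filesystem type rather than the mount path, so it also works if resctrl is mounted somewhere other than /sys/fs/resctrl.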

#### 2) Create kind based Kubernetes cluster


```
kind create cluster
```

**Note** To collect and test resctrl RDT metrics, the kind cluster has to be created with additional mounts:

**Note** To collect and test RDT metrics through the resctrl filesystem, the kind cluster has to be created with additional mounts:
```
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /sys/fs/resctrl
    containerPath: /sys/fs/resctrl
```
or (optionally), create kind cluster with local registry with [this script](https://kind.sigs.k8s.io/docs/user/local-registry/).
and apply the patch using sed:
For example, create a kind cluster with a local registry using [this script](https://kind.sigs.k8s.io/docs/user/local-registry/)
and apply the patch to enable resctrl in the following way:

```
wget https://kind.sigs.k8s.io/examples/kind-with-registry.sh
Expand All @@ -156,7 +121,10 @@ nodes:\
- hostPath: /sys/fs/resctrl\
containerPath: /sys/fs/resctrl\
' kind-with-registry.sh
```

Then create the cluster using the patched script above:
```
bash kind-with-registry.sh
```
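
When the local registry is not needed, the extra mount from the note above can instead be placed in a standalone kind config file. This is a minimal sketch, assuming kind's `v1alpha4` config API; the file name `kind-resctrl.yaml` is an arbitrary choice and not part of this chart:

```shell
# Write a complete kind cluster config that bind-mounts the host resctrl
# filesystem into the node container. The file name is an assumption.
config_file="kind-resctrl.yaml"
cat > "$config_file" <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /sys/fs/resctrl
    containerPath: /sys/fs/resctrl
EOF
echo "wrote $config_file"
```

The cluster can then be created with `kind create cluster --config kind-resctrl.yaml`.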

Expand All @@ -170,8 +138,7 @@ Export kind kubeconfig as default for further kubectl commands:
kind export kubeconfig
```


#### 3) (Optionally) Deploy Node feature discovery
#### 3) (Optionally) Deploy Node Feature Discovery (nfd)

```
# I.a. Using Kustomize:
Expand All @@ -196,27 +163,23 @@ kubectl get sts prometheus-prometheus-kube-prometheus-prometheus

#### 5) Deploy PCM helm chart

Deploy with defaults:
```
# Deploy to current namespace with defaults
# a) Deploy to current namespace with defaults
helm install pcm .
# Alternatively deploy with NFD and with Prometheus enabled
# b) Alternatively deploy with NFD and/or with Prometheus enabled
helm install pcm . --set podMonitor=true
kubectl get podmonitor pcm
helm install pcm . --set nfd=true
# Alternatively deploy with NFD and with Prometheus enabled into own "pcm" namespace
# c) Alternatively deploy with NFD and with Prometheus enabled into own "pcm" namespace
helm install pcm . --namespace pcm
```

#### 6) Check metrics
#### 6) Check metrics are exported

Run proxy in background:
```
kubectl proxy &
# for access from another host TODO to be remove (unsecure!!!)
kubectl proxy --address 0.0.0.0 &
```

Access PCM metrics directly:
Expand All @@ -232,7 +195,7 @@ curl -Ls http://127.0.0.1:8001/api/v1/namespaces/default/pods/$podname/proxy/met
curl -Ls http://127.0.0.1:8001/api/v1/namespaces/default/pods/$podname/proxy/metrics | grep DRAM_Joules_Consumed # source: energy
```
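
Checking several metrics at once is easier against a saved dump than with one curl per metric. The sketch below assumes the Prometheus text exposition format; `check_metrics` is a hypothetical helper, and the sample line is invented for illustration rather than taken from real PCM output:

```shell
# Hypothetical helper: check that each named metric appears in a
# Prometheus text-format dump read from a file.
check_metrics() {
    dump="$1"; shift
    missing=0
    for name in "$@"; do
        # Match the metric name at line start, followed by "{" or a space.
        if grep -Eq "^${name}(\{| )" "$dump"; then
            echo "ok: $name"
        else
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Demonstration on an invented sample; in practice save the output of
# the curl command above to a file and pass that file instead.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
DRAM_Writes{aggregate="system"} 1024
EOF
check_metrics "$sample" DRAM_Writes DRAM_Joules_Consumed || echo "some metrics are missing"
rm -f "$sample"
```

A missing `DRAM_Joules_Consumed` is expected with the default indirect method, which does not provide energy metrics.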

Alternatively, query the metrics through the Prometheus UI or promtool:
Alternatively, query the metrics through the Prometheus UI or promtool (requires the Prometheus operator to be deployed and the chart installed with `--set podMonitor=true`):
```
http://127.0.0.1:8001/api/v1/namespaces/default/services/prometheus-kube-prometheus-prometheus:http-web/proxy/graph
promtool query range --step 1m http://127.0.0.1:8001/api/v1/namespaces/default/services/prometheus-kube-prometheus-prometheus:http-web/proxy 'rate(DRAM_Writes{aggregate="system"}[5m])/1e9'
Expand Down Expand Up @@ -265,7 +228,7 @@ helm install pcm-vm . -f values-vm.yaml
helm install pcm-metal . -f values-metal.yaml
```

#### Direct as non-privileged container
#### Direct method as non-privileged container (not recommended)

**Note** PCM requires access to the /dev/cpu device in read-write mode (MSR access), but it is currently not possible to mount devices in pods/containers in vanilla Kubernetes. Please read this issue for more information: https://github.com/kubernetes/kubernetes/issues/5607.

Expand Down Expand Up @@ -350,7 +313,39 @@ docker push localhost:5001/pcm-local
helm install pcm . -f values-local-image.yaml
```

##### Troubleshooting
#### Troubleshooting

##### Metric availability and requirements (devices/mounts/permissions)

| Method | Used interfaces | default | Notes |
|---------------|------------------------------------------------------------| -------- | ------------------------------------------------------------------------------------- |
| indirect | perf, resctrl | v | missing energy metrics, |
| direct | msr | | requires msr module and access to /dev/cpu (non trivial) or privileged access |


| Metrics | Available on Hardware | Available through interface | Available through method |
| --------------------- | ----------------------------- | ---------------------------- | ------------------------ |
| core | bare-metal, VM (any) | msr or perf | any |
| uncore (UPI) | bare-metal, VM (all sockets) | msr or perf | any |
| RDT (MBW,L3OCCUP) | bare-metal, VM (all sockets) | msr or resctrl | any |
| energy, temp | bare-metal (only) | msr | direct |
| perf-topdown | | perf only | indirect |


| Interface | Requirements | Controlled by (env/helm value) | default helm | Used by source code | Notes |
|---------------|------------------------------------------------------------|---------------------------------|-----------------------|----------------------------------------------------------|-----------------------------------------------------|
| perf | sys_perf_open() perf_paranoid<=0/privileged/CAP_ADMIN | PCM_NO_PERF | use perf | programPerfEvent(), PerfVirtualControlRegister() | |
| perf-uncore | sys_perf_open() perf_paranoid<=0/privileged/CAP_ADMIN | PCM_USE_UNCORE_PERF | use perf for uncore | programPerfEvent(), PerfVirtualControlRegister() | |
| perf-topdown | /sys/bus/event_source/devices/cpu/events | sysMount | yes | cpucounters.cpp:perfSupportsTopDown() | TODO: conflicts with sys/fs/resctrl |
| RDT | uses "msr" or "resctrl" interface | PCM_NO_RDT | yes | cpucounters.cpp:isRDTDisabled()/QOSMetricAvailable() | |
| resctrl | RW: /sys/fs/resctrl | PCM_USE_RESCTRL | yes | resctrl.cpp | resctrlHostMount |
| watchdog | RO/RW: /proc/sys/kernel/nmi_watchdog | PCM_KEEP_NMI_WATCHDOG | yes (tries to disable)| src/cpucounters.cpp:disableNMIWatchdog() | |
| msr | RW: /dev/cpu/X/msr + privileged or CAP_ADMIN/CAP_RAWIO | PCM_NO_MSR | msr is disabled | msr.cpp:MsrHandle() | privileged or some method to access /dev/cpu |
| | RW: /dev/mem | ? | msr is disabled | cpucounters.cpp:initUncoreObjects, pci.cpp:PCIHandleM() | privileged or some method to access /dev/cpu |
| | RO/RW: /sys/module/msr/parameters | PCM_NO_MSR | msr is disabled | msr.cpp:MsrHandle() | sysMount |
| | RW: /proc/bus/pci | PCM_USE_UNCORE_PERF | msr is disabled | pci.cpp:PCIHandle() | pciMount |
| | RO: /sys/firmware/acpi/tables/MCFG | PCM_USE_UNCORE_PERF | msr is disabled | pci.cpp:PciHandle::openMcfgTable() | mcfgMount |
| | energy | | | cpucounters.cpp initEnergyMonitoring() | |

To investigate an issue, replace the pcm-sensor-server command with pcm or sleep by adding the following arguments when installing the helm chart:
```
Expand Down
4 changes: 3 additions & 1 deletion deployment/pcm/values-metal.yaml
@@ -1,6 +1,8 @@
#### ================ Tuning for VM ================
#### ================ Tuning for bare-metal instances ================
# with node-feature-discovery node affinity for non hypervisor and RDT
nmiWatchdogMount: false
PCM_NO_AWS_WORKAROUND: 1
PCM_KEEP_NMI_WATCHDOG: 0
nfd: true
nfdBaremetalAffinity: true
nfdRDTAffinity: true
5 changes: 3 additions & 2 deletions deployment/pcm/values-vm.yaml
@@ -1,4 +1,5 @@
#### ================ Tuning for VM ================
nmiWatchdogMount: true
mcfgMount: false
PCM_NO_RDT: 1 # 0 - try to collect RDT data, enables local/remote memory bandwidth + llc occupancy
# Disable RDT because it is not available on VM instances
PCM_NO_RDT: 1
resctrlHostMount: false