Monitoring OpenShift with Prometheus, Node-Exporter, Kube-State-Metrics / visualization by Grafana
This project was build upon the following components:
- OpenShift 3.7.1
- Prometheus 2.1.0
- Grafana 5.0.0-beta4
- Kube-state-metrics 1.1.0
- Node-Exporter 0.15.2
minishift start --memory 8GB --openshift-version v3.7.1
oc adm policy add-cluster-role-to-user cluster-admin developer
see https://github.com/wkulhanek/openshift-prometheus/tree/master/node-exporter
Because we set some cluster-reader role-bindings, we need to login as a cluster-admin:
oc login -u system:admin
In the next step we instantiate the project, in this case "monitoring".
oc new-project monitoring
Next we import readymade Grafana-dashboards into a ConfigMap:
oc create configmap dashboards-grafana --from-file=dashboards/
In the next step we create the monitoring components based on a template:
oc new-app -p NAMESPACE=monitoring -f monitoring-template.yaml
Alternative command:
oc process -p NAMESPACE=monitoring -f monitoring-template.yaml | oc apply -f -
After this step, you'll have Prometheus, Grafana and kube-state-metrics running.
The optional node-exporter component may be installed as a daemon set to gather host level metrics. It requires additional privileges to view the host and should only be run in administrator controlled namespaces.
Without deleteing or changing the limits, sometimes not all nodes can be scheduled.
# oc export limits default-limits -o yaml > default_limits.yaml
oc delete limitrange default-limits
# oc export quota default-quota -o yaml > deault_quota.yaml
oc delete quota default-quota
oc annotate ns monitoring openshift.io/node-selector= --overwrite
oc create -f node-exporter.yaml
oc adm policy add-scc-to-user -z prometheus-node-exporter -n monitoring hostaccess
Get the routes to open Grafana and Prometheus in the browser:
oc get route
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
grafana grafana-monitoring.192.168.64.5.nip.io grafana 3000 None
prometheus prometheus-monitoring.192.168.64.5.nip.io prometheus 9090 None
Grafana default login credentials:
- root
- secret
# show events sorted
oc get events --sort-by='.lastTimestamp'
# show cluster wide objects
oc get ds --all-namespaces
# wide output
oc get pods -o wide
# poor man's oc dashboard
watch -n 1 "echo '###pods';oc get pods; echo '###dc'; oc get dc; echo '###ds'; oc get ds; echo '###svc'; oc get svc; echo '###configmap'; oc get configmap; echo '###routes'; oc get routes"
# access minishift pv's
minishift ssh
sudo su -
cd /var/lib/minishift/openshift.local.pv
- k-s-m does not seem to export deployment metrics on openshift
- k-s-m does not provide kube_resourcequota used in dashboard "Cluster overview"
- check rule evaluation and its graph on the prometheus dashboard
- use StatefulSet ?
- livenessProbe and/or readinessProbe
- Add rules from https://github.com/coreos/prometheus-operator/tree/master/contrib/kube-prometheus/assets/prometheus/rules
- document template variables in readme
- add remote_write for prometheus?
- can storage limits/requests be calculated?
- add alertmanager or add parameter to specify an alertmanager url?
- parameters for grafana credentials?
- redeploy on configmap change (maybe https://github.com/aabed/kubernetes-configmap-rollouts) or https://github.com/coreos/prometheus-operator/tree/master/contrib/prometheus-config-reloader or https://github.com/jimmidyson/configmap-reload
- easier configmap changes
- oauth proxies for grafana/prometheus