topolvm operator is failed to push monitoring metrics into prometheus #66

Open
GowthamShanmugam opened this issue Nov 15, 2021 · 17 comments · Fixed by #82

Comments

@GowthamShanmugam
Contributor

The topolvm operator is not exporting any monitoring metrics from the operator and node pods, so there is no way to create alerts or alerting rules for topolvm on Kubernetes / OpenShift.

@GowthamShanmugam
Contributor Author

What is missing:

  • Services for the ServiceMonitor to fetch the metrics from
  • Roles and RoleBindings

@little-guy-lxr
Collaborator

@GowthamShanmugam The service has been added. See #63.

@GowthamShanmugam
Contributor Author

Ack, I will test with the latest master again.

@leelavg
Collaborator

leelavg commented Nov 22, 2021

@little-guy-lxr Can you please add commit #63 to the origin-topolvm branch, or can I raise the cherry-pick PR?

@little-guy-lxr
Collaborator

OK, I will cherry-pick the commit to origin-topolvm.

@little-guy-lxr little-guy-lxr linked a pull request Nov 23, 2021 that will close this issue
@GowthamShanmugam
Contributor Author

GowthamShanmugam commented Nov 23, 2021

With the latest branch, monitoring is not working: the ServiceMonitor is not created. Do I need to create it manually? Everything works fine when I use alaudapublic/topolvm-operator:2.2.0, but a custom build from the main branch does not work.

@GowthamShanmugam
Contributor Author

GowthamShanmugam commented Nov 23, 2021

Is there any reason we stopped calling EnableServiceMonitor function and CreateOrUpdatePrometheusRule?

1893632#diff-9a6acdebbd30f8b93285ecd76b832d3e4cd34cb58f06a4dbe292f1e849a3f332L263

@GowthamShanmugam
Contributor Author

This PR fixes ServiceMonitor and alerting rule creation, but metrics are still not getting populated: #87

Metrics appear only if I create a namespace-level Role and RoleBinding.

@little-guy-lxr
Collaborator

@GowthamShanmugam How did you deploy the topolvm operator? Did you use the YAML in https://github.com/alauda/topolvm-operator/tree/main/deploy/example ?

@GowthamShanmugam
Contributor Author

Yes, I used the YAML files.

@little-guy-lxr
Collaborator

@GowthamShanmugam Please paste the log of the topolvm operator. Is your platform Kubernetes or OpenShift?

@GowthamShanmugam
Contributor Author

GowthamShanmugam commented Nov 24, 2021

OpenShift. Metrics were populated when I used alaudapublic/topolvm-operator:2.2.0 on OpenShift, but they are not with the latest main branch. I will add logs.

@GowthamShanmugam
Contributor Author

I checked with the latest master and the issue is still there. I don't find any logs related to metrics:

2021-11-29 21:21:47.947364 D | status: node ip-10-0-142-55.ec2.internal, phase: Ready
2021-11-29 21:21:47.947390 D | status: no need to update cluster status
2021-11-29 21:21:47.947399 D | op-k8sutil: creating servicemonitor topolvm-service-monitor
W1129 21:21:47.947413       1 client_config.go:615] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2021-11-29 21:21:47.960589 D | op-k8sutil: creating prometheusRule topolvm-alert
W1129 21:21:47.960618       1 client_config.go:615] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.

@GowthamShanmugam
Contributor Author

Prometheus log:

ts=2021-11-29T21:09:55.282Z caller=level.go:63 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:447: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"topolvm-system\""

ts=2021-11-29T21:09:55.283Z caller=level.go:63 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:449: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"topolvm-system\""

ts=2021-11-29T21:09:55.283Z caller=level.go:63 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:448: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"topolvm-system\""

@little-guy-lxr
Collaborator

@GowthamShanmugam Please check whether the ServiceMonitor was created.

@little-guy-lxr
Collaborator

The topolvm operator creates the ServiceMonitor in its own namespace (in this case topolvm-system), but your Prometheus may not have permission to access this namespace. The OCP platform may also require the ServiceMonitor to be created in a namespace that Prometheus owns. Please check.

@GowthamShanmugam
Contributor Author

You are right: OpenShift Prometheus needs permission to access the topolvm-system namespace. Once I created a Role and RoleBinding with all the required permissions, it started working fine.
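
For readers hitting the same "forbidden" errors, here is a minimal sketch of what such a Role and RoleBinding might look like. The exact manifests used in this thread were not posted, so this is an assumption reconstructed from the Prometheus log: the service account (prometheus-k8s in openshift-monitoring), the namespace (topolvm-system) and the resources (services, endpoints, pods) come from the error messages, while the object name topolvm-metrics is hypothetical and the "get" verb is added only because it is commonly granted alongside "list" and "watch".

```yaml
# Sketch only: grants the OpenShift Prometheus service account read access
# to the resources named in the "forbidden" errors. The name "topolvm-metrics"
# is a placeholder, not something defined by topolvm-operator.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: topolvm-metrics
  namespace: topolvm-system
rules:
  - apiGroups: [""]            # core API group, as reported in the log
    resources: ["services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: topolvm-metrics
  namespace: topolvm-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: topolvm-metrics
subjects:
  - kind: ServiceAccount       # the user from the Prometheus error messages
    name: prometheus-k8s
    namespace: openshift-monitoring
```

Both objects live in topolvm-system because a Role only grants access within its own namespace; the subject points back at the service account in openshift-monitoring. If your Prometheus runs under a different service account or namespace, adjust the subject accordingly.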
