Health check failings in /metrics categorized per subnet #1574
Closed
erwindassen started this conversation in Ideas
Replies: 2 comments 6 replies
-
Hmmm, interesting. Currently the health checks allow a list of tags (not just the subnetID). Would there be an easy way to map that into the Prometheus metrics format?
3 replies
-
How does #1579 look?
3 replies
-
I wanted to open an issue with this request, but since this is my first time, I thought it would be more courteous to start a discussion first. I'll write here following the issue template to make it easier to move there after the discussion.
Context and scope
In v1.10.0, avalanchego added support for passing subnetIDs as an argument to the /health endpoint, which was a very welcome addition. Unfortunately, the /metrics endpoint still only exposes the total number of failing health checks via the avalanche_health_checks_failing metric (although there are health checks specific to the P, X and C chains). I would like to propose adding metrics for health checks per subnet, for example by tags.
This would allow infra to build custom alerts from Prometheus metrics that identify issues with particular subnets. In general, it is a good pattern to build alerts from Prometheus, which is solely responsible for scraping endpoints. In contrast, having to add custom logic that calls JSON RPC methods to build alerts is generally an anti-pattern.
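To make the proposal concrete, here is a minimal Python sketch of how failing checks could be counted per tag and rendered in the Prometheus text exposition format. It is only an illustration, not avalanchego code: the CheckResult type, the tag label name, and the tag values (subnetA, subnetB, application) are assumptions made up for this example. Only the metric name avalanche_health_checks_failing comes from the existing /metrics output.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Hypothetical shape of one health check result (not avalanchego's type)."""
    name: str
    failing: bool
    tags: tuple = ()


def failing_by_tag(results):
    """Count failing checks per tag; a check with several tags counts once per tag."""
    counts = Counter()
    for result in results:
        for tag in set(result.tags):
            # Adding 0 still creates the key, so healthy tags are exported as 0.
            counts[tag] += 1 if result.failing else 0
    return dict(counts)


def escape_label(value):
    """Escape backslash, double quote and newline as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render(results, metric="avalanche_health_checks_failing"):
    """Render one gauge series per tag; the 'tag' label name is an assumption."""
    lines = [f"# TYPE {metric} gauge"]
    for tag, count in sorted(failing_by_tag(results).items()):
        lines.append(f'{metric}{{tag="{escape_label(tag)}"}} {count}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder tags; real subnet IDs would be used in practice.
    sample = [
        CheckResult("network", False, ("application",)),
        CheckResult("bootstrapped", True, ("subnetA",)),
        CheckResult("chain-1", True, ("subnetA", "subnetB")),
        CheckResult("chain-2", False, ("subnetB",)),
    ]
    print(render(sample), end="")
```

With series shaped like this, an alert for a single subnet reduces to a label match on the tag value, with no extra calls to the /health endpoint.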
Discussion and alternatives
/health endpoint.
Open questions
None at the moment.