feat: add prometheus alerts in support bundle #94

Merged
merged 2 commits into from
Jul 15, 2024

Conversation

Yu-Jack
Collaborator

@Yu-Jack Yu-Jack commented Jan 25, 2024

Related Issue

harvester/harvester#4993

Solution

The first version of this feature focuses on fetching only the current alerts.

Fetching too many alerts would make debugging harder, because the sheer volume makes the data difficult to query. So this change fetches only the current alerts and formats them.
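For context, the Prometheus HTTP API exposes the currently active alerts at `GET /api/v1/alerts`. The following is a minimal sketch of fetching and decoding that response with only the standard library. It is not the code from this PR: the names `apiAlert`, `alertsResponse`, `decodeAlerts`, and `fetchAlerts` are hypothetical, and the `http://localhost:9090` address is an assumption for illustration.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// apiAlert follows the alert object returned by GET /api/v1/alerts.
// (Hypothetical type name; not taken from the PR.)
type apiAlert struct {
	Labels      map[string]string `json:"labels"`
	Annotations map[string]string `json:"annotations"`
	State       string            `json:"state"`
	ActiveAt    time.Time         `json:"activeAt"`
	Value       string            `json:"value"`
}

// alertsResponse is the standard Prometheus API envelope around the alert list.
type alertsResponse struct {
	Status string `json:"status"`
	Data   struct {
		Alerts []apiAlert `json:"alerts"`
	} `json:"data"`
	Error string `json:"error"`
}

// decodeAlerts parses a raw /api/v1/alerts response body.
func decodeAlerts(body []byte) ([]apiAlert, error) {
	var resp alertsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode alerts response: %w", err)
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("prometheus returned status %q: %s", resp.Status, resp.Error)
	}
	return resp.Data.Alerts, nil
}

// fetchAlerts requests the current alerts from the Prometheus at baseURL.
func fetchAlerts(ctx context.Context, baseURL string) ([]apiAlert, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/api/v1/alerts", nil)
	if err != nil {
		return nil, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return decodeAlerts(body)
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	// Assumed address; a real deployment would reach the Prometheus pod differently.
	alerts, err := fetchAlerts(ctx, "http://localhost:9090")
	if err != nil {
		fmt.Println("skipping alerts:", err)
		return
	}
	for _, a := range alerts {
		fmt.Printf("%s\t%s\n", a.State, a.Labels["alertname"])
	}
}
```

Keeping the decoding separate from the HTTP call makes the parsing step testable without a running Prometheus.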

Test Plan

Case 1. Generate a support bundle without rancher-monitoring enabled. Generation should still succeed, and the bundle should not contain prometheus-alerts.json.
Case 2. Generate a support bundle with rancher-monitoring enabled. The bundle should contain a file named prometheus-alerts.json at the top level of the directory.

Result

Sample output. Only alerts in the pending or firing state are included:
[
	{
		"activeAt": "2024-01-23T06:59:00Z",
		"Annotations": {
			"description": "100% of the rancher/rancher targets in cattle-system namespace are down.",
			"runbook_url": "https://runbooks.prometheus-operator.dev/runbooks/general/targetdown",
			"summary": "One or more targets are unreachable."
		},
		"Labels": {
			"alertname": "TargetDown",
			"job": "rancher",
			"namespace": "cattle-system",
			"service": "rancher",
			"severity": "warning"
		},
		"State": "firing",
		"Value": "1e+02"
	},
	{
		"activeAt": "2024-01-24T07:37:00.510363907Z",
		"Annotations": {
			"description": "This is an alert meant to ensure that the entire alerting pipeline is functional.\nThis alert is always firing, therefore it should always be firing in Alertmanager\nand always fire against a receiver. There are integrations with various notification\nmechanisms that send a notification when this alert is not firing. For example the\n\"DeadMansSnitch\" integration in PagerDuty.\n",
			"runbook_url": "https://runbooks.prometheus-operator.dev/runbooks/general/watchdog",
			"summary": "An alert that should always be firing to certify that Alertmanager is working properly."
		},
		"Labels": {
			"alertname": "Watchdog",
			"severity": "none"
		},
		"State": "firing",
		"Value": "1e+00"
	},
	// remaining alerts omitted for readability...
]
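Someone reading the bundle could load prometheus-alerts.json roughly as follows. This is a minimal sketch, assuming the file keeps the key casing shown in the sample above. The `Alert` type and the helpers `parseAlerts` and `namesByState` are hypothetical names, not part of this PR, and the embedded sample trims each alert's annotations down to its summary.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Alert mirrors the keys in the sample prometheus-alerts.json.
// (Hypothetical type for illustration.)
type Alert struct {
	ActiveAt    time.Time         `json:"activeAt"`
	Annotations map[string]string `json:"Annotations"`
	Labels      map[string]string `json:"Labels"`
	State       string            `json:"State"`
	Value       string            `json:"Value"`
}

// parseAlerts decodes the JSON array stored in the bundle file.
func parseAlerts(data []byte) ([]Alert, error) {
	var alerts []Alert
	if err := json.Unmarshal(data, &alerts); err != nil {
		return nil, fmt.Errorf("parse alerts: %w", err)
	}
	return alerts, nil
}

// namesByState returns the alertname label of every alert in the given state.
func namesByState(alerts []Alert, state string) []string {
	var names []string
	for _, a := range alerts {
		if a.State == state {
			names = append(names, a.Labels["alertname"])
		}
	}
	return names
}

// sample is a trimmed copy of the two alerts shown in the PR description.
const sample = `[
	{
		"activeAt": "2024-01-23T06:59:00Z",
		"Annotations": {"summary": "One or more targets are unreachable."},
		"Labels": {"alertname": "TargetDown", "job": "rancher", "namespace": "cattle-system", "service": "rancher", "severity": "warning"},
		"State": "firing",
		"Value": "1e+02"
	},
	{
		"activeAt": "2024-01-24T07:37:00.510363907Z",
		"Annotations": {"summary": "An alert that should always be firing to certify that Alertmanager is working properly."},
		"Labels": {"alertname": "Watchdog", "severity": "none"},
		"State": "firing",
		"Value": "1e+00"
	}
]`

func main() {
	alerts, err := parseAlerts([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("firing:", namesByState(alerts, "firing"))
	fmt.Println("pending:", namesByState(alerts, "pending"))
}
```

Note that `Value` is a string in scientific notation, so it needs `strconv.ParseFloat` before any numeric comparison.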

@Yu-Jack Yu-Jack force-pushed the feat-4993 branch 2 times, most recently from 5380bd3 to 924ca40 Compare January 25, 2024 07:31
@Yu-Jack Yu-Jack marked this pull request as ready for review January 25, 2024 07:56
@Yu-Jack Yu-Jack force-pushed the feat-4993 branch 2 times, most recently from ad5ffec to d245802 Compare January 25, 2024 08:16
@Yu-Jack Yu-Jack self-assigned this Jan 25, 2024
@Yu-Jack Yu-Jack requested a review from bk201 January 25, 2024 08:36
@Yu-Jack Yu-Jack closed this May 8, 2024
@Yu-Jack Yu-Jack reopened this May 8, 2024
@@ -209,6 +213,50 @@ func (m *SupportBundleManager) phaseCollectClusterBundle() error {
return nil
}

func (m *SupportBundleManager) phaseCollectPrometheusBundle() error {
pods, err := m.k8s.GetPodsListByLabels("cattle-monitoring-system", "app.kubernetes.io/name=prometheus")
Member

@c3y1huang Any objection to this feature? It adds a phase that checks whether the cluster has a Prometheus pod (specifically one running in the cattle-monitoring-system namespace). If so, it tries to extract the current alerts.

Collaborator

No objections since it's non-blocking (optionalPhase). Additionally, Longhorn could potentially benefit from this.

pkg/manager/manager.go Outdated Show resolved Hide resolved
@bk201 bk201 requested a review from c3y1huang July 12, 2024 09:18
pkg/manager/manager.go Show resolved Hide resolved
vendor/modules.txt Outdated Show resolved Hide resolved
@Yu-Jack Yu-Jack force-pushed the feat-4993 branch 2 times, most recently from d108fcc to dbe6231 Compare July 15, 2024 05:48
Signed-off-by: Jack Yu <jack.yu@suse.com>
@c3y1huang c3y1huang merged commit d3b4eee into rancher:master Jul 15, 2024
1 check passed
@Yu-Jack Yu-Jack deleted the feat-4993 branch July 15, 2024 06:48
3 participants