Release the new Mimir Alertmanager #3752

Closed
13 of 14 tasks
Tracked by #3743
Rotfuks opened this issue Oct 29, 2024 · 7 comments

@Rotfuks
Contributor

Rotfuks commented Oct 29, 2024

Motivation

We know how to roll out the Mimir Alertmanager - now we need to announce and do it.

Todo

  • Migration
    • Deploy Mimir Alertmanager on giantswarm installations
    • Create custom object storage on customer installations where needed
    • Deploy Mimir Alertmanager on customer installations
    • Switch alerting to Mimir Alertmanager on giantswarm installations
    • Switch alerting to Mimir Alertmanager on customer installations
    • Ensure everything runs smoothly
      • Alerts are received on OpsGenie and Slack
      • Grafana Alerting shows rules, notification policies, and active notifications
    • Disable old Alertmanager
  • Add automation in mc-bootstrap
  • Document the change in the architecture in the intranet
  • Announce the change to the Mimir Alertmanager
    • Check if customers have alerts in place (Dominik can help :))

Outcome

We use the Mimir Alertmanager everywhere.

@github-project-automation github-project-automation bot moved this to Inbox 📥 in Roadmap Oct 29, 2024
@Rotfuks Rotfuks added the team/atlas Team Atlas label Oct 29, 2024
@Rotfuks Rotfuks changed the title from "Release the new Mimir Manager" to "Release the new Mimir Alertmanager" Oct 31, 2024
@TheoBrigitte TheoBrigitte self-assigned this Jan 16, 2025
@TheoBrigitte
Member

TheoBrigitte commented Jan 20, 2025

Giantswarm installations deployment 🟠

Asked Panamax to create object storage on tamarin and leopard: https://gigantic.slack.com/archives/CE92C4BST/p1737368033954959

@TheoBrigitte
Member

TheoBrigitte commented Jan 20, 2025

Object storage configuration for panamax: https://github.com/giantswarm/panamax-configs/pull/56

Customer installations deployment 🟢

  • alba
  • alligator
  • armadillo
  • avocet
  • cedar
  • enigma
  • leopard
  • sardine
  • tamarin
  • violet
  • wallaby
  • whale

@TheoBrigitte
Member

  • Mimir Alertmanager is now enabled on all giantswarm installations
  • Usage doc is in the intranet
  • Announcement to giantswarm folks was done in #news-product

Parking this issue for a week before releasing this to customer installations.

@TheoBrigitte
Member

I manually checked OpsGenie and Slack: alerts are being received from giantswarm installations.
Releasing Mimir Alertmanager to all installations.

@TheoBrigitte
Member

TheoBrigitte commented Jan 27, 2025

Deployment is OK for customer installations.

Checklist:

  • config loaded correctly (contains team_atlas_slack)
  • alerts count is greater than 0
  • silences count is greater than 0
  • observability-operator (olly-op) configured with the mimir-alertmanager URL
  • mimir ruler configured with the mimir-alertmanager URL
  • silence-operator configured with the mimir-alertmanager URL

Image

the script
#!/bin/bash

source ~/projects/bash-magic/colors.sh

check_alertmanager() {
  kubectl -n=mimir port-forward po/mimir-alertmanager-0 8080 1>/dev/null &
  port_forward_pid=$!
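  # Kill the port-forward if the script exits early; ignore the error when it is already gone.
  trap 'kill "$port_forward_pid" 2>/dev/null' EXIT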
  sleep 2

  alertmanager_config_ok=false
  if mimirtool alertmanager get --id anonymous --address http://localhost:8080|grep -q team_atlas_slack; then
    alertmanager_config_ok=true
  fi

  alerts_present=false
  if [ "$(curl -SsH'X-Scope-OrgID: anonymous' localhost:8080/alertmanager/api/v2/alerts|jq -r '.|length')" -gt 0 ]; then
    alerts_present=true
  fi

  silences_present=false
  if [ "$(curl -SsH'X-Scope-OrgID: anonymous' localhost:8080/alertmanager/api/v2/silences|jq -r '.|length')" -gt 0 ]; then
    silences_present=true
  fi

  olly_ok=false
  if kubectl -n=monitoring get deploy observability-operator -oyaml|grep -qE -- '- --alertmanager-enabled=true|- --alertmanager-secret-name=observability-operator-alertmanager|- --alertmanager-url=http://mimir-alertmanager-headless.mimir.svc:8080'; then
    olly_ok=true
  fi

  mimir_ok=false
  if helm -n=mimir get manifest mimir|grep -q 'alertmanager_url: dnssrvnoa+http://_http-metrics._tcp.mimir-alertmanager-headless.mimir.svc.cluster.local./alertmanager'; then
    mimir_ok=true
  fi

  silence_operator_ok=false
  if kubectl -n=monitoring get cm silence-operator -oyaml|grep -qE -- 'address: http://mimir-alertmanager-headless.mimir.svc:8080/alertmanager|tenantId: anonymous'; then
    silence_operator_ok=true
  fi

  if $alertmanager_config_ok && $olly_ok && $mimir_ok && $silence_operator_ok && $alerts_present && $silences_present; then
    echo -e "${GREEN}OK$NC"
  else
    echo -e "${RED}FAIL$NC"
  fi

  kill "$port_forward_pid"
}

list_capi_installation() {
  opsctl list installations|grep -v 'giantswarm'|grep -E 'capa|capz|cloud-director|vsphere' |awk '{print $1}'
}

for i in $(list_capi_installation); do
  echo "$i "
  gx "$i" &>/dev/null
  echo -n "  verify: "
  check_alertmanager
done
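
One caveat about the operator checks in the script: `grep -qE 'a|b|c'` succeeds as soon as any one alternative matches, so a deployment carrying only one of the three expected flags would still report OK. A minimal sketch of a stricter check follows. It assumes a hypothetical helper named `contains_all` (not part of the original script) that reads the manifest once from stdin and requires every fixed string to be present:

```shell
#!/bin/bash

# contains_all: hypothetical helper, not part of the original script.
# Reads text from stdin and succeeds only if every argument occurs in it
# as a fixed string (grep -F), so dots in URLs are matched literally.
contains_all() {
  local input pattern
  input=$(cat)
  for pattern in "$@"; do
    # -e keeps patterns that start with a dash from being parsed as options.
    if ! printf '%s\n' "$input" | grep -qF -e "$pattern"; then
      return 1
    fi
  done
  return 0
}

# Possible use inside check_alertmanager (same strings as the original check):
# kubectl -n=monitoring get deploy observability-operator -oyaml | contains_all \
#   '--alertmanager-enabled=true' \
#   '--alertmanager-secret-name=observability-operator-alertmanager' \
#   '--alertmanager-url=http://mimir-alertmanager-headless.mimir.svc:8080'
```

The same helper would fit the silence-operator check, where both the address and the tenantId are expected.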

@TheoBrigitte
Member

TheoBrigitte commented Jan 28, 2025

Alerts are received on Slack (#alert) and OpsGenie

Image
https://giantswarm.app.opsgenie.com/reports/main
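
For a delivery check that does not wait for a real alert to fire, a synthetic alert could be posted to the Alertmanager v2 API through the same port-forward and tenant header the verification script uses. This is only a sketch: `build_test_alert` is a hypothetical helper, the label values are placeholders, and whether the alert reaches Slack or OpsGenie depends on the routing tree of the loaded config, so labels should be chosen to avoid paging anyone.

```shell
#!/bin/bash

# build_test_alert: hypothetical helper, not part of the original script.
# Prints a one-element JSON array in the shape accepted by
# POST /api/v2/alerts. Arguments: alert name, severity label value.
build_test_alert() {
  local name="$1" severity="$2"
  printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"description":"synthetic alert for delivery check"}}]\n' \
    "$name" "$severity"
}

# Possible use (assumes the port-forward to mimir-alertmanager-0 is running):
# build_test_alert DeliveryCheck notify |
#   curl -SsH 'X-Scope-OrgID: anonymous' -H 'Content-Type: application/json' \
#     -d @- localhost:8080/alertmanager/api/v2/alerts
```

Without an `endsAt` field the alert resolves on its own after the configured resolve timeout.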

@TheoBrigitte
Member

Alerts are coming in to OpsGenie as usual

Image

Alerts are coming in on Slack

Grafana correctly shows rules, notification policies, and active notifications

@github-project-automation github-project-automation bot moved this from Inbox 📥 to Validation ☑️ in Roadmap Feb 3, 2025