Added Mimir / Loki Rules Sync Support #568

Open: wants to merge 8 commits into main

bentonam
Collaborator

Resolves #564

@bentonam bentonam added the enhancement New feature or request label Jun 10, 2024
@petewall
Collaborator

I might resurrect this, but I think I might combine alloy-events and alloy-rules into a new alloy-singleton for this purpose.

Signed-off-by: Pete Wall <pete.wall@grafana.com>
@luong-komorebi

I am curious about the progress of this PR and whether it will be supported in the near future. Right now there is almost no way to use the PrometheusRule CRD for alerting with a k8s-monitoring and Mimir setup. It is also hard to manage Mimir rules inside a Helm chart, which makes this PR important.

@stefanandres
Contributor

stefanandres commented Sep 2, 2024

@luong-komorebi
As a workaround, we've deployed a separate, dedicated Alloy Helm release outside of the k8s-monitoring chart for this:
(Argo CD ApplicationSet)

        - repoURL: https://grafana.github.io/helm-charts
          chart: alloy
          targetRevision: 0.6.1
          helm:
            releaseName: alloy-prometheusrules
            ignoreMissingValueFiles: true
            valueFiles:
              - $values/argocd-apps/clusters/monitoring/k8s-monitoring/values/prometheusrules.yaml
              - $values/argocd-apps/clusters/monitoring/k8s-monitoring/values/prometheusrules-{{ metadata.labels.environment }}.yaml
            valuesObject:
              agent:
                extraEnv:
                  - name: CLUSTER
                    value: "{{ metadata.labels.environment }}"

prometheusrules.yaml

❯ cat prometheusrules.yaml
# Needed to change the selectorLabels
nameOverride: alloy-prometheusrules

crds:
  create: false

alloy:
  enableReporting: false

  resources:
    requests:
      cpu: 10m
      memory: 50Mi
    limits:
      memory: 100Mi

  configMap:
    content: |
        remote.kubernetes.secret "basic_auth_prometheus" {
          name = "basic-auth-prometheus"
          namespace = "monitoring"
        }

        mimir.rules.kubernetes "mimir_ruler" {
          address = nonsensitive(remote.kubernetes.secret.basic_auth_prometheus.data["host"])

          basic_auth {
            username = nonsensitive(remote.kubernetes.secret.basic_auth_prometheus.data["user"])
            password = remote.kubernetes.secret.basic_auth_prometheus.data["password"]
          }

          mimir_namespace_prefix = env("CLUSTER")
        }

controller:
  type: deployment
  updateStrategy:
    type: Recreate
  extraAnnotations:
    reloader.stakater.com/auto: "true"
  tolerations:
  - key: "spot-arm64"
    value: "true"
    operator: "Equal"
    effect: "NoSchedule"

serviceMonitor:
  enabled: true

That's working perfectly with Mimir and the PrometheusRules from each cluster, but I would appreciate having this integrated into k8s-monitoring directly 🤩
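
For context, here is a minimal sketch of the kind of PrometheusRule object such a deployment would pick up, assuming the Prometheus Operator CRDs are already installed in the cluster (the values above set crds.create to false). The object name, group name, alert, and expression are hypothetical placeholders and do not come from this setup.

```yaml
# Hypothetical example: every name and the alert itself are placeholders.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-alerts
  namespace: monitoring
spec:
  groups:
    - name: example.rules
      rules:
        # Fires when any scrape target has been unreachable for 5 minutes.
        - alert: ExampleTargetDown
          expr: up == 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "A scrape target has been down for 5 minutes"
```

As far as I understand the component, the synced rule groups land in Mimir under a namespace that starts with mimir_namespace_prefix, which is why the CLUSTER value is passed in per environment.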

@luong-komorebi

@stefanandres thanks for sharing your workaround, that's great to hear.

While I would love to discuss other ways around this blocker, that would take this PR off topic.
As a last message here, I will share one of my own workarounds as well, to show the interest in this work.
I have resorted to mimirtool with Terraform, using https://github.com/ovh/terraform-provider-mimirtool and providing the YAML files inside Terraform.
Since I already use Terraform to manage the Helm release of Mimir, that is one of the shortest paths I can think of, besides running mimirtool in a local-exec.
However, this solution is not ideal because of the extra component it requires. I also manage multiple clusters and would love to keep Mimir / Loki rules as close to where they are needed as possible. Once merged, this work would offer a more intuitive way to provision Mimir and Loki rules.

@petewall petewall self-requested a review as a code owner October 4, 2024 18:13
@xinnjie

xinnjie commented Nov 17, 2024

@stefanandres
Hello! Your idea is great.

I am following your steps, deploying an Alloy instance with a configMap that looks like this:

          configMap:
            content: |
              remote.kubernetes.secret "basic_auth_prometheus" {
                name = "prometheus-k8s-monitoring"
                namespace = "monitoring"
              }
              
              mimir.rules.kubernetes "mimir_ruler" {
                address = nonsensitive(remote.kubernetes.secret.basic_auth_prometheus.data["host"])
                basic_auth {
                  username = nonsensitive(remote.kubernetes.secret.basic_auth_prometheus.data["username"])
                  password = remote.kubernetes.secret.basic_auth_prometheus.data["password"]
                }
              }

but I get this error:

ts=2024-11-17T17:46:34.367839938Z level=error msg="failed to list rules from mimir" component_path=/ component_id=mimir.rules.kubernetes.mimir_ruler err="error GET /prometheus/config/v1/rules: unrecoverable error response: server returned HTTP status 401 Unauthorized: {\"status\":\"error\",\"error\":\"authentication error: invalid scope requested\"}"

The secret monitoring/prometheus-k8s-monitoring is deployed by the k8s-monitoring chart to connect to the Grafana Cloud Mimir instance.

Is there any access control I should set on Grafana Cloud, or is my configMap wrong?

Thanks in advance.

The secret:

> k get secret -n monitoring prometheus-k8s-monitoring -o yaml
apiVersion: v1
data:
  host: aHR0cHM6Ly9wcm9tZXRoZXVzLXByb2QtMzctcHJvZC1hcC1zb3V0aGVhc3QtMS5ncmFmYW5hLm5ldA==
  password: XXXXX
  username: XXXXX
kind: Secret
metadata:
  labels:
    app.kubernetes.io/instance: grafana-cloud
  name: prometheus-k8s-monitoring
  namespace: monitoring
type: Opaque

@xinnjie

xinnjie commented Nov 20, 2024

Figured out my problem.

It turned out to be a permission problem, as the error message authentication error: invalid scope requested said.

Grafana Cloud users need to go to https://grafana.com/orgs/$YOUR_PROJECT/access-policies and add a token with the rules edit and read permissions.

The Alloy config should look like this:

          configMap:
            content: |
              remote.kubernetes.secret "rule_service" {
                name = "monitoring-secrets"
                namespace = "monitoring"
              }
              
              mimir.rules.kubernetes "mimir_ruler" {
                address = nonsensitive(remote.kubernetes.secret.rule_service.data["host"])
                prometheus_http_prefix = "/api/prom"
                basic_auth {
                  username = nonsensitive(remote.kubernetes.secret.rule_service.data["username"])
                  password = remote.kubernetes.secret.rule_service.data["rule_sync_token"]
                }
              }

The secret monitoring/monitoring-secrets should have a field rule_sync_token containing the token generated with the rules edit and read permissions. You can change the names, of course. The mimir.rules.kubernetes.mimir_ruler.address is the Grafana Cloud Mimir instance address, like https://prometheus-prod-37-prod-ap-southeast-1.grafana.net.
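
A minimal sketch of what such a secret could look like, assuming it is created by hand rather than by the k8s-monitoring chart. Only the secret name, namespace, and key names come from the config above; the username and token values are placeholders to replace with your own.

```yaml
# Sketch only: replace the placeholder values with your own credentials.
apiVersion: v1
kind: Secret
metadata:
  name: monitoring-secrets
  namespace: monitoring
type: Opaque
# stringData accepts plain text; Kubernetes stores it base64-encoded under data.
stringData:
  host: "https://prometheus-prod-37-prod-ap-southeast-1.grafana.net"
  username: "<your Grafana Cloud Prometheus username>"
  rule_sync_token: "<token with rules edit and read permissions>"
```

Using a separate key for the rule sync token keeps it apart from the password that the k8s-monitoring chart already uses for writing metrics.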

Hope this helps others who run into the same problem.

@petewall
Collaborator

Perhaps a doc about "using Grafana Cloud" and the various features and the access policy tokens would be useful...

@bentonam
Collaborator Author

I'll work on getting this updated to work w/ the v2 structure and alloy-singleton

Development

Successfully merging this pull request may close these issues.

Add Support for Syncing Rule Objects