Feature/ccv2 common branch (#353)
* set filterExternalLabels to false

* comment remote read temporarily

* enable remote read

* # filterExternalLabels: false
comment out

* remove filter external label

* Changes for ms

* gitlab seed token

* Refresh tempaltes

* Change in gitlabci

* eks netbird integration

* managed_external_name_map

* Enable Ceph debugging pod deployment

* added cloudwatch exporter initial config

* added dashboards for ec2 and rds

* template data

* make cloudwatch exporter config using yaml file

* adding netbird inputs for eks

* add exporter

update labels

remove tmp

update paths

added resources for exporter

remove subpath

* template path

* tempalate

* add credentials

* add port in container definition

* use named port in targetPort

* add servce monitor

* use 60sec interval for scrapping

* add scrap_job label

* change indentation for labels

* remove cloud watch exporter

* Eks oidc

* fetch cloud watch credentails from vault

* use vault secret store

* fix secret store ref

* fix the secret

* fix relabel config

* remove external secret

* Revert "remove external secret"

This reverts commit ba0ed04.

* add aws ec2 dashboard

* adding changes

* adding the missing change

* Removing unwanted file

* oidc config key

* add average NetworkIn metric

* formatting

* add sum to NetworkIn

* add NetworkOut

* commenting out

* add disk i/o ops

* added status check metrics

* fix aws dashboards foldername

* fix grafana dashboard name

* netbird installation changes

* wireguard healthcheck port

* renamed ec2 and rds dashboards

* added ebs dashboard

* added rds and ebs dashboards

* add exporterd tag on metrics

* fix cloud watch exporter config

* monitor another VM

* remove exported tags

* only pull InstanceId metric

* add EBS metrics

* include eks also for ms access

* add nftables flag

* add cloudwatch billing dashboard

* added cloudwatch requests per min dashboards

* added cloudwatch billing dashboard in git

* revert nft env change

* comment out duplicated dashboard

* add back aws-cloudwatch-billing

* use dynamic tag for aws dashboards

* use dynamic tag for all mimir dashboards

* update dashboard

* try adding pre bootstrap with yum install

* try configuring ipvs mode for kube-proxy add one

* added more charts

* revert kube-proxy ipvs change

* change poll to 60m

* added cloudwatch integration documentation

* increasing the objects limit

* fix

* change in refresh templates

* add cc cidr block for snat env var on eks vpc cni

* missed group var

* add cidr block var

* add service account for cloudwatch exporter

* add namespace to service account

* rename cloudwatch exporter SA

* add role annotation

* remove hardcoded role

* setting AWS_VPC_K8S_CNI_EXTERNALSNAT to true

* backtunnel access

* cni config

* eks k8s version

* netbird route

* Add data source for cloud-init and bastion instances

* add bastion launch template and asg resource

* Output required bastion information in base-k8s module

* define vars for bastion asg

* Output required bastion information in base-k8s module

* Output required bastion information in eks module

* Output required bastion information in managed svs module

* change in route

* chnage in backtunnel route

* remove route53 for bastion

* netbird provider

* netbird disable rotation

* remove commented code not required

* ami

* change config for new dev tf version with rotation

* add example policy for development

* Revert "add example policy for development"

This reverts commit 57b356a.

* change in msgw route

* correction

* update cloudwatch architecture doc

* added RDS configs

* update region

* removing poll 60m

* update cloud-watch-integration-arch

* use max for read and write latency

* comment out EC2 metrics available via node-exporter

* comment out cloudwatch exporter

* comment out dashboards

* pr corrections

---------

Co-authored-by: muzammil360 <muzammil360@gmail.com>
Co-authored-by: Josphat Mutai <josphatkmutai@gmail.com>
Co-authored-by: David Fry <david.fry@modusbox.com>
4 people authored Sep 24, 2024
1 parent 400517f commit d59f9dc
Showing 55 changed files with 3,407 additions and 95 deletions.
496 changes: 496 additions & 0 deletions assets/grafana-dashboards/aws-cloudwatch/cloudwatch-billing.json

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions docs/monitoring/cloudwatch-integraton-architecture.svg
30 changes: 30 additions & 0 deletions docs/monitoring/cloudwatch-integraton.md
@@ -0,0 +1,30 @@
# Problem
Some components of the Mojaloop software may operate as AWS managed services, which report their metrics to AWS CloudWatch. The operations team needs these performance and health metrics to be available in a centralized Grafana monitoring dashboard.

# Solution
An exporter retrieves the performance and health metrics of AWS managed services from CloudWatch and exposes them to Prometheus. Grafana dashboards then query these metrics from Prometheus.

The diagram below depicts the architecture of the proposed system.

![diagram](./cloudwatch-integraton-architecture.svg)
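
As a rough sketch of the Prometheus side of this flow, a `ServiceMonitor` could tell Prometheus to scrape the exporter. The resource name, label selector, and port name below are assumptions made for illustration and are not taken from this repository; the 60-second interval mirrors the scrape interval mentioned in the commit history.

```yaml
# Hypothetical ServiceMonitor for the CloudWatch exporter (names are placeholders)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cloudwatch-exporter
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: cloudwatch-exporter  # assumed label on the exporter Service
  endpoints:
    - port: http        # assumed named port on the Service
      path: /metrics
      interval: 60s     # scrape once per minute
```

CloudWatch metrics are typically published at one-minute or five-minute granularity, so scraping more often than once per minute is unlikely to add information.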

# Implementation details

## Exporter options
Two options are available.
1. [CloudWatch Exporter](https://github.com/prometheus/cloudwatch_exporter/)
2. [YACE (Yet Another CloudWatch Exporter)](https://github.com/nerdswords/yet-another-cloudwatch-exporter)

We chose the second option, YACE, because it includes a mixin with prebuilt dashboards for various services like EC2, EBS, S3, and RDS, which reduces the setup effort.

## Authentication
The exporter needs to authenticate with the AWS CloudWatch API. YACE uses the AWS SDK for Go, which lets us authenticate via [AWS's default credential chain](https://aws.github.io/aws-sdk-go-v2/docs/configuring-sdk/#specifying-credentials). Two options are relevant:


1. Expose credentials as environment variables
2. Associate an AWS IAM role (and its policy) with the exporter pod

Option 1 uses long-lived static credentials, while Option 2 enables short-lived, more secure authentication tokens. Currently, we are using Option 1 to accelerate development.
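
For reference, Option 2 on EKS is usually done through IAM Roles for Service Accounts: the exporter pod runs under a service account annotated with the role to assume. The sketch below is illustrative only; the service account name, namespace, and role ARN are placeholders, and it assumes the role trusts the cluster's OIDC provider and grants read access to CloudWatch metrics and resource tags.

```yaml
# Hypothetical service account for the exporter (all names are placeholders)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloudwatch-exporter
  namespace: monitoring
  annotations:
    # Placeholder ARN; replace with a role that trusts the cluster's OIDC provider
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/cloudwatch-exporter-read
```

With this in place the SDK's default credential chain should pick up the short-lived web identity token, so no static keys need to be stored in the cluster.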

## Target Discovery
YACE can discover and filter resource targets based on tags. To maintain consistency, we should use the `monitoring_enabled:true` tag on all resources that need to be monitored.
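
A minimal sketch of a YACE discovery job that selects targets by this tag is shown below. It follows the structure of the exporter config in this commit; the region and the single metric are example values rather than recommended settings.

```yaml
apiVersion: v1alpha1
discovery:
  jobs:
    - type: AWS/RDS
      regions: [eu-west-1]          # example region
      searchTags:
        - key: monitoring_enabled
          value: "true"             # quoted so YAML keeps it a string
      period: 300
      length: 300
      metrics:
        - name: CPUUtilization
          statistics: [Maximum]
```

Selecting on one shared tag keeps the exporter config independent of resource names, so newly tagged resources are picked up without a config change.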
@@ -54,6 +54,12 @@ spec:
value: "${ARGOCD_ENV_k8s_admin_rbac_group}"
- key: "k8s_user_rbac_group"
value: "${ARGOCD_ENV_k8s_user_rbac_group}"
- key: managed_services_env_cidr
value: "${ARGOCD_ENV_managed_services_env_cidr}"
- key: managed_svc_enabled
value: "${ARGOCD_ENV_managed_svc_enabled}"
- key: "k8s_cluster_type"
value: "${ARGOCD_ENV_k8s_cluster_type}"

# All Terraform outputs are written to the connection secret.
providerConfigRef:
@@ -140,6 +146,10 @@ spec:
source = "hashicorp/kubernetes"
version = "2.31.0"
}
time = {
source = "hashicorp/time"
version = "0.12.1"
}
}
}
@@ -169,3 +179,5 @@ spec:
token = var.token
base_url = "https://${ARGOCD_ENV_gitlab_fqdn}"
}
provider "time" {}
@@ -35,7 +35,9 @@ spec:
- key: argocd_namespace
value: ${ARGOCD_ENV_argocd_namespace}
- key: kubernetes_oidc_groups_claim
value: ${ARGOCD_ENV_kubernetes_oidc_groups_claim}
value: ${ARGOCD_ENV_kubernetes_oidc_groups_claim}
- key: cc_cidr_block
value: ${ARGOCD_ENV_cc_cidr_block}
# All Terraform outputs are written to the connection secret.
providerConfigRef:
name: env-config
@@ -0,0 +1,65 @@
# apiVersion: grafana.integreatly.org/v1beta1
# kind: GrafanaFolder
# metadata:
# name: aws-managed-services
# spec:
# instanceSelector:
# matchLabels:
# dashboards: "grafana"
# ---
# apiVersion: grafana.integreatly.org/v1beta1
# kind: GrafanaDashboard
# metadata:
# name: aws-ec2
# spec:
# folder: aws-managed-services
# datasources:
# - inputName: "DS_PROMETHEUS"
# datasourceName: "${ARGOCD_ENV_dashboard_datasource_name}"
# instanceSelector:
# matchLabels:
# dashboards: "grafana"
# url: https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/aws-ec2.json
# ---
# apiVersion: grafana.integreatly.org/v1beta1
# kind: GrafanaDashboard
# metadata:
# name: aws-rds
# spec:
# folder: aws-managed-services
# datasources:
# - inputName: "DS_PROMETHEUS"
# datasourceName: "${ARGOCD_ENV_dashboard_datasource_name}"
# instanceSelector:
# matchLabels:
# dashboards: "grafana"
# url: https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/aws-rds.json
# ---
# apiVersion: grafana.integreatly.org/v1beta1
# kind: GrafanaDashboard
# metadata:
# name: aws-ebs
# spec:
# folder: aws-managed-services
# datasources:
# - inputName: "DS_PROMETHEUS"
# datasourceName: "${ARGOCD_ENV_dashboard_datasource_name}"
# instanceSelector:
# matchLabels:
# dashboards: "grafana"
# url: https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/aws-ebs.json
# ---
# apiVersion: grafana.integreatly.org/v1beta1
# kind: GrafanaDashboard
# metadata:
# name: aws-cloudwatch-billing
# spec:
# folder: aws-managed-services
# datasources:
# - inputName: "DS_PROMETHEUS"
# datasourceName: "${ARGOCD_ENV_dashboard_datasource_name}"
# instanceSelector:
# matchLabels:
# dashboards: "grafana"
# url: https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/assets/grafana-dashboards/aws-cloudwatch/cloudwatch-billing.json
# ---
@@ -17,7 +17,7 @@ spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/50d8293f5d18022cb3756420195a21e0adcbfc34/monitoring-mixin/build/mimir-overview.json"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/mimir-overview.json"
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
@@ -28,7 +28,7 @@ spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/50d8293f5d18022cb3756420195a21e0adcbfc34/monitoring-mixin/build/mimir-overview-networking.json"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/mimir-overview-networking.json"
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
@@ -39,7 +39,7 @@ spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/50d8293f5d18022cb3756420195a21e0adcbfc34/monitoring-mixin/build/mimir-overview-resources.json"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/mimir-overview-resources.json"
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
@@ -50,7 +50,7 @@ spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/50d8293f5d18022cb3756420195a21e0adcbfc34/monitoring-mixin/build/mimir-queries.json"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/mimir-queries.json"
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
@@ -61,7 +61,7 @@ spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/50d8293f5d18022cb3756420195a21e0adcbfc34/monitoring-mixin/build/mimir-reads.json"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/mimir-reads.json"
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
@@ -72,7 +72,7 @@ spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/50d8293f5d18022cb3756420195a21e0adcbfc34/monitoring-mixin/build/mimir-reads-networking.json"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/mimir-reads-networking.json"
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
@@ -83,7 +83,7 @@ spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/50d8293f5d18022cb3756420195a21e0adcbfc34/monitoring-mixin/build/mimir-reads-resources.json"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/mimir-reads-resources.json"
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
@@ -94,7 +94,7 @@ spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/50d8293f5d18022cb3756420195a21e0adcbfc34/monitoring-mixin/build/mimir-writes.json"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/mimir-writes.json"
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
@@ -105,7 +105,7 @@ spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/50d8293f5d18022cb3756420195a21e0adcbfc34/monitoring-mixin/build/mimir-writes-networking.json"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/mimir-writes-networking.json"
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
@@ -116,7 +116,7 @@ spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/50d8293f5d18022cb3756420195a21e0adcbfc34/monitoring-mixin/build/mimir-writes-resources.json"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/mimir-writes-resources.json"
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
@@ -127,7 +127,7 @@ spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/50d8293f5d18022cb3756420195a21e0adcbfc34/monitoring-mixin/build/mimir-compactor.json"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/mimir-compactor.json"
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
@@ -138,7 +138,7 @@ spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/50d8293f5d18022cb3756420195a21e0adcbfc34/monitoring-mixin/build/mimir-object-store.json"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/mimir-object-store.json"
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
@@ -149,7 +149,7 @@ spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/50d8293f5d18022cb3756420195a21e0adcbfc34/monitoring-mixin/build/mimir-rollout-progress.json"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/mimir-rollout-progress.json"
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
@@ -160,7 +160,7 @@ spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/50d8293f5d18022cb3756420195a21e0adcbfc34/monitoring-mixin/build/mimir-scaling.json"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/mimir-scaling.json"
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
@@ -171,7 +171,7 @@ spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/50d8293f5d18022cb3756420195a21e0adcbfc34/monitoring-mixin/build/mimir-slow-queries.json"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/mimir-slow-queries.json"
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
@@ -182,5 +182,5 @@ spec:
instanceSelector:
matchLabels:
dashboards: "grafana"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/50d8293f5d18022cb3756420195a21e0adcbfc34/monitoring-mixin/build/mimir-tenants.json"
url: "https://raw.githubusercontent.com/mojaloop/iac-modules/${ARGOCD_ENV_monitoring_application_gitrepo_tag}/monitoring-mixin/build/mimir-tenants.json"
---
@@ -5,6 +5,7 @@ resources:
- virtual-service.yaml
- vault-secrets.yaml
- grafana-oidc-xplane-terraform.yaml
# - dashboards-aws-managed-svs.yaml
- dashboards-default.yaml
- dashboards-k8s.yaml
- dashboards-kafka.yaml
101 changes: 101 additions & 0 deletions gitops/applications/base/monitoring/cloudwatch-exporter-config.yaml
@@ -0,0 +1,101 @@
# apiVersion: v1alpha1
# sts-region: us-west-2 # TODO: do not hardcode. Understand what it is
# discovery:
# jobs:
# - type: AWS/EC2
# regions: [us-west-2] # TODO: do not hardcode, understand what it is
# includeContextOnInfoMetrics: true
# searchTags:
# - key: Name
# value: "Forem to Orbit bridge"
# dimensionNameRequirements:
# - InstanceId
# period: 300
# length: 300
# metrics:
# - name: CPUUtilization
# statistics: [Maximum]
# # - name: NetworkIn
# # statistics: [Average, Sum]
# # - name: NetworkOut
# # statistics: [Average, Sum]
# # - name: NetworkPacketsIn
# # statistics: [Sum]
# # - name: NetworkPacketsOut
# # statistics: [Sum]
# # - name: DiskReadBytes
# # statistics: [Sum]
# # - name: DiskWriteBytes
# # statistics: [Sum]
# # - name: DiskReadOps
# # statistics: [Sum]
# # - name: DiskWriteOps
# # statistics: [Sum]
# - name: StatusCheckFailed
# statistics: [Sum]
# - name: StatusCheckFailed_Instance
# statistics: [Sum]
# - name: StatusCheckFailed_System
# statistics: [Sum]
# - type: AWS/EBS
# regions: [us-west-2] # TODO: do not hardcode, understand what it is
# includeContextOnInfoMetrics: true
# searchTags: # update the search tag later
# - key: Name
# value: forem-community.mojaloop.io
# dimensionNameRequirements:
# - VolumeId
# period: 300
# length: 300
# metrics:
# - name: VolumeReadBytes
# statistics: [Sum]
# - name: VolumeWriteBytes
# statistics: [Sum]
# - name: VolumeReadOps
# statistics: [Average]
# - name: VolumeWriteOps
# statistics: [Average]
# - name: VolumeIdleTime
# statistics: [Average]
# - name: VolumeTotalReadTime
# statistics: [Average]
# - name: VolumeTotalWriteTime
# statistics: [Average]
# - name: VolumeQueueLength
# statistics: [Average]
# - name: BurstBalance
# statistics: [Average]
# - type: AWS/RDS
# regions: [eu-west-1] # TODO: do not hardcode, understand what it is
# includeContextOnInfoMetrics: true
# searchTags: # update the search tag later
# - key: mojaloop/owner
# value: Samuel-Kummary # TODO: update target tags
# dimensionNameRequirements:
# - DBInstanceIdentifier
# period: 300
# length: 300
# metrics:
# - name: CPUUtilization
# statistics: [Maximum]
# - name: CPUUtilization
# statistics: [Maximum]
# - name: DatabaseConnections
# statistics: [Sum]
# - name: FreeStorageSpace
# statistics: [Average]
# - name: FreeableMemory
# statistics: [Average]
# - name: ReadThroughput
# statistics: [Average]
# - name: WriteThroughput
# statistics: [Average]
# - name: ReadIOPS
# statistics: [Average]
# - name: WriteIOPS
# statistics: [Average]
# - name: ReadLatency
# statistics: [Maximum]
# - name: WriteLatency
# statistics: [Maximum]