Add unacked message alert section (#104)
Signed-off-by: Nghia Tran <tcnghia@gmail.com>
tcnghia authored Feb 1, 2024
1 parent 5d86775 commit c1c46e2
Showing 8 changed files with 99 additions and 19 deletions.
2 changes: 1 addition & 1 deletion modules/cloudevent-recorder/README.md
@@ -94,7 +94,7 @@ No requirements.
| <a name="input_provisioner"></a> [provisioner](#input\_provisioner) | The identity as which this module will be applied (so it may be granted permission to 'act as' the DTS service account). This should be in the form expected by an IAM subject (e.g. user:sally@example.com) | `string` | n/a | yes |
| <a name="input_regions"></a> [regions](#input\_regions) | A map from region names to a network and subnetwork. A recorder service and cloud storage bucket (into which the service writes events) will be created in each region. | <pre>map(object({<br> network = string<br> subnet = string<br> }))</pre> | n/a | yes |
| <a name="input_retention-period"></a> [retention-period](#input\_retention-period) | The number of days to retain data in BigQuery. | `number` | n/a | yes |
-| <a name="input_types"></a> [types](#input\_types) | A map from cloudevent types to the BigQuery schema associated with them. | `map(string)` | n/a | yes |
+| <a name="input_types"></a> [types](#input\_types) | A map from cloudevent types to the BigQuery schema associated with them, as well as an alert threshold and a list of notification channels | <pre>map(object({<br> schema = string<br> alert_threshold = optional(number, 50000)<br> notification_channels = optional(list(string), [])<br> }))</pre> | n/a | yes |

## Outputs

6 changes: 5 additions & 1 deletion modules/cloudevent-recorder/recorder.tf
@@ -95,6 +95,10 @@ module "recorder-dashboard" {
labels = { for type, schema in var.types : replace(type, ".", "_") => "" }

triggers = {
-for type, schema in var.types : "type: ${type}" => "${var.name}-${random_id.trigger-suffix[type].hex}"
+for type, schema in var.types : "type: ${type}" => {
+subscription_prefix = "${var.name}-${random_id.trigger-suffix[type].hex}"
+alert_threshold = schema.alert_threshold
+notification_channels = schema.notification_channels
+}
}
}
9 changes: 7 additions & 2 deletions modules/cloudevent-recorder/variables.tf
@@ -40,6 +40,11 @@ variable "broker" {
}

variable "types" {
-type = map(string)
-description = "A map from cloudevent types to the BigQuery schema associated with them."
+description = "A map from cloudevent types to the BigQuery schema associated with them, as well as an alert threshold and a list of notification channels"
+
+type = map(object({
+schema = string
+alert_threshold = optional(number, 50000)
+notification_channels = optional(list(string), [])
+}))
}
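
For callers, the type change above means each entry in `types` is now an object rather than a bare schema string, so existing calls that pass a string need to wrap it as `{ schema = ... }`. A minimal sketch of an updated call follows. The module path, event type names, schema file paths, and notification channel are all hypothetical and do not appear in this commit, and the module's other required inputs are left out.

```hcl
# Hypothetical notification channel, shown only so the example has
# something to reference. It is not part of this commit.
resource "google_monitoring_notification_channel" "oncall" {
  display_name = "On-call email"
  type         = "email"
  labels = {
    email_address = "oncall@example.com"
  }
}

module "recorder" {
  source = "./modules/cloudevent-recorder" # assumed path

  # Other required inputs (provisioner, regions, retention-period, ...)
  # are omitted from this sketch.

  types = {
    # Before this change the value was the schema string itself.
    "com.example.order.created" = {
      schema                = file("${path.module}/schemas/order-created.json")
      alert_threshold       = 1000
      notification_channels = [google_monitoring_notification_channel.oncall.name]
    }

    # Omitted attributes fall back to the declared defaults:
    # alert_threshold = 50000, notification_channels = [].
    "com.example.order.cancelled" = {
      schema = file("${path.module}/schemas/order-cancelled.json")
    }
  }
}
```

Note that `optional(...)` with a default value requires Terraform 1.3 or later.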
2 changes: 1 addition & 1 deletion modules/dashboard/cloudevent-receiver/README.md
@@ -102,7 +102,7 @@ No requirements.
|------|-------------|------|---------|:--------:|
| <a name="input_labels"></a> [labels](#input\_labels) | Additional labels to apply to the dashboard. | `map` | `{}` | no |
| <a name="input_service_name"></a> [service\_name](#input\_service\_name) | Name of the service(s) to monitor | `string` | n/a | yes |
-| <a name="input_triggers"></a> [triggers](#input\_triggers) | A mapping from a descriptive name to a subscription name prefix. | `map(string)` | n/a | yes |
+| <a name="input_triggers"></a> [triggers](#input\_triggers) | A mapping from a descriptive name to a subscription name prefix, an alert threshold, and list of notification channels. | <pre>map(object({<br> subscription_prefix = string<br> alert_threshold = optional(number, 50000)<br> notification_channels = optional(list(string), [])<br> }))</pre> | n/a | yes |

## Outputs

3 changes: 2 additions & 1 deletion modules/dashboard/cloudevent-receiver/dashboard.tf
@@ -4,7 +4,8 @@ module "subscription" {
source = "../sections/subscription"
title = "Events ${each.key}"

-subscription_prefix = each.value
+alert_threshold = each.value.alert_threshold
+notification_channels = each.value.notification_channels
}

module "logs" {
8 changes: 6 additions & 2 deletions modules/dashboard/cloudevent-receiver/variables.tf
@@ -9,6 +9,10 @@ variable "labels" {
}

variable "triggers" {
-description = "A mapping from a descriptive name to a subscription name prefix."
-type = map(string)
+description = "A mapping from a descriptive name to a subscription name prefix, an alert threshold, and list of notification channels."
+type = map(object({
+subscription_prefix = string
+alert_threshold = optional(number, 50000)
+notification_channels = optional(list(string), [])
+}))
}
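
As an illustration of the new `triggers` shape, here is a minimal sketch of a direct call to the dashboard module. The module path, service name, and subscription prefixes are hypothetical. The map keys follow the `"type: ..."` convention that `recorder.tf` uses above.

```hcl
module "receiver-dashboard" {
  source       = "./modules/dashboard/cloudevent-receiver" # assumed path
  service_name = "example-receiver"                         # hypothetical

  triggers = {
    # Overrides the default threshold; no channels are notified
    # because notification_channels defaults to [].
    "type: com.example.order.created" = {
      subscription_prefix = "example-receiver-1a2b"
      alert_threshold     = 1000
    }

    # Uses both defaults: alert_threshold = 50000 and
    # notification_channels = [].
    "type: com.example.order.cancelled" = {
      subscription_prefix = "example-receiver-3c4d"
    }
  }
}
```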
11 changes: 9 additions & 2 deletions modules/dashboard/sections/subscription/README.md
@@ -5,7 +5,9 @@ No requirements.

## Providers

-No providers.
+| Name | Version |
+|------|---------|
+| <a name="provider_google"></a> [google](#provider\_google) | n/a |

## Modules

@@ -15,17 +17,22 @@ No providers.
| <a name="module_oldest-unacked"></a> [oldest-unacked](#module\_oldest-unacked) | ../../widgets/xy | n/a |
| <a name="module_push-latency"></a> [push-latency](#module\_push-latency) | ../../widgets/latency | n/a |
| <a name="module_received-events"></a> [received-events](#module\_received-events) | ../../widgets/xy | n/a |
+| <a name="module_unacked-messages-alert"></a> [unacked-messages-alert](#module\_unacked-messages-alert) | ../../widgets/alert | n/a |
| <a name="module_width"></a> [width](#module\_width) | ../width | n/a |

## Resources

-No resources.
+| Name | Type |
+|------|------|
+| [google_monitoring_alert_policy.pubsub_unacked_messages](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy) | resource |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
+| <a name="input_alert_threshold"></a> [alert\_threshold](#input\_alert\_threshold) | n/a | `number` | `50000` | no |
| <a name="input_collapsed"></a> [collapsed](#input\_collapsed) | n/a | `bool` | `false` | no |
+| <a name="input_notification_channels"></a> [notification\_channels](#input\_notification\_channels) | n/a | `list(string)` | `[]` | no |
| <a name="input_subscription_prefix"></a> [subscription\_prefix](#input\_subscription\_prefix) | n/a | `string` | n/a | yes |
| <a name="input_title"></a> [title](#input\_title) | n/a | `string` | n/a | yes |

77 changes: 68 additions & 9 deletions modules/dashboard/sections/subscription/main.tf
@@ -1,9 +1,59 @@
variable "title" { type = string }
variable "subscription_prefix" { type = string }
variable "collapsed" { default = false }
+variable "notification_channels" {
+type = list(string)
+default = []
+}
+variable "alert_threshold" {
+type = number
+default = 50000
+}


module "width" { source = "../width" }

+resource "google_monitoring_alert_policy" "pubsub_unacked_messages" {
+// Close after 7 days
+alert_strategy {
+auto_close = "604800s"
+}
+
+combiner = "OR"
+
+conditions {
+condition_threshold {
+aggregations {
+alignment_period = "300s"
+per_series_aligner = "ALIGN_MEAN"
+}
+
+comparison = "COMPARISON_GT"
+duration = "0s"
+filter = "resource.type = \"pubsub_subscription\" AND metric.type = \"pubsub.googleapis.com/subscription/num_unacked_messages_by_region\" AND metadata.system_labels.name = monitoring.regex.full_match(\"${var.subscription_prefix}-.*\")"
+
+trigger {
+count = "1"
+}
+
+threshold_value = var.alert_threshold
+}
+
+display_name = "${var.title}: Unacked messages above ${var.alert_threshold}"
+}
+display_name = "${var.title}: Unacked messages above ${var.alert_threshold}"
+
+enabled = "true"
+
+notification_channels = var.notification_channels
+}
+
+module "unacked-messages-alert" {
+source = "../../widgets/alert"
+title = google_monitoring_alert_policy.pubsub_unacked_messages.display_name
+alert_name = google_monitoring_alert_policy.pubsub_unacked_messages.name
+}
+
module "received-events" {
source = "../../widgets/xy"
title = "Events Pushed"
@@ -52,27 +102,36 @@ locals {
// N columns, unit width each ([0, unit, 2 * unit, ...])
col = range(0, local.columns * local.unit, local.unit)

-tiles = [{
-yPos = 0,
-xPos = local.col[0],
-height = local.unit,
-width = local.unit,
-widget = module.received-events.widget,
-},
+tiles = [
+{
+yPos = 0,
+xPos = local.col[0],
+height = local.unit,
+width = local.width,
+widget = module.unacked-messages-alert.widget,
+},
+{
+yPos = local.unit,
+xPos = local.col[0],
+height = local.unit,
+width = local.unit,
+widget = module.received-events.widget,
+},
{
yPos = local.unit,
xPos = local.col[1],
height = local.unit,
width = local.unit,
widget = module.push-latency.widget,
},
{
-yPos = 0,
+yPos = local.unit,
xPos = local.col[2],
height = local.unit,
width = local.unit,
widget = module.oldest-unacked.widget,
-}]
+}
+]
}

module "collapsible" {