Terraform module that configures the recommended Amazon Elasticache Memcached Alarms using CloudWatch and sends alerts to an SNS topic.
Note: This can ALSO be used for Redis, and can be used per-node on Redis. See example below.
This module requires > v0.12
Terraform
area | metric | op | threshold | rationale |
---|---|---|---|---|
CPU | CPUUtilization | > |
90 % | This metric can be as high as 90%. If you exceed this threshold, scale your cache cluster up horizontally or vertically. |
Memory | SwapUsage | > |
50 MB | If this ever uses swap, it means you need to scale up vertically or adjust the ConnectionOverhead parameter value. |
Memory | Evictions | > |
10 | Evictions should generally never happen, or happen rarely. You may need to adjust this alarm for your usage pattern. |
Memory | FreeableMemory | < |
200 MB | If we have low memory available, it means we need to scale up vertically usually. |
Usage | CurrConnections | > |
anomaly | This detects odd connection count patterns (anomaly detection). |
For more information please see recommended Amazon Elasticache Memcached Alarms.
# Simple usage example
module "elasticache_alarms" {
source = "github.com/DevOps-Nirvana/terraform-aws-elasticache-memcached-alarms?ref=main"
# Our cache cluster name (todo: manage in TF instead of manual)
cache_cluster_id = "TestCluster"
# A list of actions to take when alarms are triggered
sns_topic_alarm_arns = ["arn:aws:sns:us-east-1:123123123123:sns-to-slack"]
# A list of actions to take when alarms are cleared
sns_topic_ok_arns = ["arn:aws:sns:us-east-1:123123123123:sns-to-slack"]
# Set our standard tags
tags = {
Cluster = "TestCluster"
}
}
# Redis HA usage example (alarms per-node)
module "elasticache_alarms" {
source = "github.com/DevOps-Nirvana/terraform-aws-elasticache-memcached-alarms?ref=main"
# Our cache cluster name (todo: manage in TF instead of manual)
cache_cluster_id = "TestCluster"
# To do node-based alarms instead of grouped alarms you MUST specify the following three items...
count = 4 # This is how many nodes total (this count of 4 means 1 master and 3 extra nodes)
# This makes the alarm dimension specific on the individual node
dimensions = { CacheNodeId = format("TestCluster-%04s", count.index) }
# This makes the alarms all have a different name based on the node name
suffix = "-${count.index}"
# A list of actions to take when alarms are triggered
sns_topic_alarm_arns = ["arn:aws:sns:us-east-1:123123123123:sns-to-slack"]
# A list of actions to take when alarms are cleared
sns_topic_ok_arns = ["arn:aws:sns:us-east-1:123123123123:sns-to-slack"]
# Set our standard tags
tags = {
Cluster = "TestCluster"
}
}
You can also customize various parts of this module, all possible options are listed here, and specified below.
module "elasticache_alarms" {
source = "github.com/DevOps-Nirvana/terraform-aws-elasticache-memcached-alarms?ref=main"
# Add a prefix to all alarms
prefix = "myprefix-"
# Our cache cluster name (todo: manage in TF instead of manual)
cache_cluster_id = "TestCluster"
# We want to customize the CPU alarm threshold
cpu_percent_threshold = 50
# We want to customize the SWAP alarm threshold (in bytes)
swap_threshold = 256 * 1024 * 1024 # 256MB
# We want to customize the current connection anomaly detection
monitor_connection_anomalies = true
anomaly_period = 300
anomaly_evaluation_periods = 6
anomaly_band_width = 4
# (disabled by default) if we want to enable an alarm on max connections
monitor_connection_maximum = 50
# A list of actions to take when alarms are triggered
sns_topic_alarm_arns = ["arn:aws:sns:us-east-1:123123123123:sns-to-slack"]
# A list of actions to take when alarms are cleared
sns_topic_ok_arns = ["arn:aws:sns:us-east-1:123123123123:sns-to-slack"]
# Set our standard tags
tags = {
Cluster = "TestCluster"
}
}
Name | Description | Type | Default | Required |
---|---|---|---|---|
cache_cluster_id |
The Elasticache Cluster ID you want to monitor. | string | - | yes |
prefix |
A prefix added to all alarm names | string | "" | no |
suffix |
A suffix added to all alarm names, use this for Redis alarms per-node | string | "" | no |
sns_topic_alarm_arns |
An list of ARNs to trigger on alarm | list | [] | no (but recommended) |
sns_topic_ok_arns |
An list of ARNs to trigger on ok (alarm finished) | list | [] | no |
tags |
An map of the typical tags to set on every alarm | map | {} | no |
dimensions |
A way to add extra dimensions to the alarms (eg: for Redis single-node alarms) | map | {} | no |
cpu_percent_threshold |
The high-percent threshold at which we alarm on CPU usage | number | 90 |
no |
swap_threshold |
The high-bytes threshold at which we alarm on swap usage (default 50MB) | number | 52428800 |
no |
evictions_threshold |
The high-usage threshold at which we alarm on evictions | number | 0 |
no |
freeable_memory_minimum |
The low-bytes threshold at which we alarm on free memory (default 200MB) | number | 209715200 |
no |
freeable_memory_minimum |
The low-bytes threshold at which we alarm on free memory (default 200MB) | number | 209715200 |
no |
monitor_connection_anomalies |
A flag to enable or disable monitoring connection count anomalies | bool | true |
no |
anomaly_period |
The number of seconds that make each evaluation period for anomaly detection | number | 600 |
no |
anomaly_evaluation_periods |
The amount of periods over which to use when triggering alarms | number | 3 |
no |
anomaly_band_width |
The width of the anomaly band, default 2. Higher numbers means less sensitive | number | 2 |
no |
monitor_connection_maximum |
If you wish to alarm on maximum connections then set this to > 0 | number | 0 |
no |
None
Please give it a ★ GitHub or share it with others.
File a GitHub issue for problems or feature requests.
Using MIT License