Standardized best-practice recommended alarms for AWS Elasticache Memcached

DevOps-Nirvana/terraform-aws-elasticache-memcached-alarms

Terraform AWS Elasticache Memcached Alerts

Terraform module that configures the recommended Amazon Elasticache Memcached Alarms using CloudWatch and sends alerts to an SNS topic.

Note: This can ALSO be used for Redis, and can be used per-node on Redis. See example below.

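This module requires Terraform v0.12 or later. The per-node Redis example below sets count on a module block, which needs Terraform v0.13 or later.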

Metrics and Alarms

| Area | Metric | Operator | Threshold | Rationale |
|------|--------|----------|-----------|-----------|
| CPU | CPUUtilization | > | 90 % | This metric can be as high as 90%. If you exceed this threshold, scale your cache cluster up horizontally or vertically. |
| Memory | SwapUsage | > | 50 MB | If the cluster ever uses swap, you need to scale up vertically or adjust the ConnectionOverhead parameter value. |
| Memory | Evictions | > | 10 | Evictions should never happen, or happen only rarely. You may need to adjust this alarm for your usage pattern. |
| Memory | FreeableMemory | < | 200 MB | Low available memory usually means you need to scale up vertically. |
| Usage | CurrConnections | > | anomaly | Detects unusual connection count patterns (anomaly detection). |

For more information please see recommended Amazon Elasticache Memcached Alarms.
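To make the table concrete, here is a minimal sketch of what the CPU row could look like as a stand-alone CloudWatch alarm. It is not taken from this module's source: the resource name, alarm name, period, and evaluation_periods are assumptions chosen for illustration, and the SNS ARN is the placeholder used in the examples below.

```hcl
# Illustrative only: a hand-written alarm equivalent to the CPU row above.
# The module's real resources may be named and tuned differently.
resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "TestCluster-cpu-high" # hypothetical name
  alarm_description   = "ElastiCache CPU utilization above 90%"
  namespace           = "AWS/ElastiCache"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  comparison_operator = "GreaterThanThreshold"
  threshold           = 90  # percent, from the table
  period              = 300 # seconds (assumed)
  evaluation_periods  = 1   # assumed

  # Scope the alarm to one cache cluster
  dimensions = {
    CacheClusterId = "TestCluster"
  }

  alarm_actions = ["arn:aws:sns:us-east-1:123123123123:sns-to-slack"]
  ok_actions    = ["arn:aws:sns:us-east-1:123123123123:sns-to-slack"]
}
```

The other threshold rows follow the same shape with a different metric_name, comparison_operator, and threshold.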

Examples

# Simple usage example
module "elasticache_alarms" {
  source  = "github.com/DevOps-Nirvana/terraform-aws-elasticache-memcached-alarms?ref=main"
  
  # Our cache cluster name (todo: manage in TF instead of manual)
  cache_cluster_id = "TestCluster"
  
  # A list of actions to take when alarms are triggered
  sns_topic_alarm_arns = ["arn:aws:sns:us-east-1:123123123123:sns-to-slack"]
  # A list of actions to take when alarms are cleared
  sns_topic_ok_arns = ["arn:aws:sns:us-east-1:123123123123:sns-to-slack"]
  
  # Set our standard tags
  tags = {
    Cluster = "TestCluster"
  }
}

# Redis HA usage example (alarms per-node)
module "elasticache_alarms" {
  source  = "github.com/DevOps-Nirvana/terraform-aws-elasticache-memcached-alarms?ref=main"
  
  # Our cache cluster name (todo: manage in TF instead of manual)
  cache_cluster_id = "TestCluster"
  
  # To do node-based alarms instead of grouped alarms you MUST specify the following three items...
  count = 4  # Total number of nodes (a count of 4 means 1 master and 3 extra nodes)
  # This makes the alarm dimension specific on the individual node
  dimensions = { CacheNodeId = format("TestCluster-%04s", count.index) }
  # This makes the alarms all have a different name based on the node name
  suffix = "-${count.index}"

  # A list of actions to take when alarms are triggered
  sns_topic_alarm_arns = ["arn:aws:sns:us-east-1:123123123123:sns-to-slack"]
  # A list of actions to take when alarms are cleared
  sns_topic_ok_arns = ["arn:aws:sns:us-east-1:123123123123:sns-to-slack"]
  
  # Set our standard tags
  tags = {
    Cluster = "TestCluster"
  }
}
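Both examples above hard-code the cluster name and the SNS topic ARN, and their comments flag that as a todo. As a rough sketch of managing those in Terraform instead, the module inputs can reference other resources. The resource names, topic name, node type, and node count here are assumptions for illustration only.

```hcl
# Hypothetical SNS topic that receives alarm and OK notifications
resource "aws_sns_topic" "cache_alarms" {
  name = "cache-alarms"
}

# Hypothetical Memcached cluster managed in the same configuration
resource "aws_elasticache_cluster" "cache" {
  cluster_id      = "testcluster"
  engine          = "memcached"
  node_type       = "cache.t3.micro"
  num_cache_nodes = 2
  port            = 11211
}

module "elasticache_alarms" {
  source = "github.com/DevOps-Nirvana/terraform-aws-elasticache-memcached-alarms?ref=main"

  # Reference the managed resources instead of literal strings
  cache_cluster_id     = aws_elasticache_cluster.cache.cluster_id
  sns_topic_alarm_arns = [aws_sns_topic.cache_alarms.arn]
  sns_topic_ok_arns    = [aws_sns_topic.cache_alarms.arn]

  tags = {
    Cluster = aws_elasticache_cluster.cache.cluster_id
  }
}
```

Referencing the resources also gives Terraform the dependency order, so the alarms are created after the cluster and topic exist.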

You can also customize various parts of this module. The example below overrides several options, and the Inputs section describes every available option.

module "elasticache_alarms" {
  source = "github.com/DevOps-Nirvana/terraform-aws-elasticache-memcached-alarms?ref=main"
  
  # Add a prefix to all alarms
  prefix = "myprefix-"
  
  # Our cache cluster name (todo: manage in TF instead of manual)
  cache_cluster_id = "TestCluster"
  
  # We want to customize the CPU alarm threshold
  cpu_percent_threshold = 50
  # We want to customize the SWAP alarm threshold (in bytes)
  swap_threshold = 256 * 1024 * 1024  # 256MB
  # We want to customize the current connection anomaly detection
  monitor_connection_anomalies = true
  anomaly_period = 300
  anomaly_evaluation_periods = 6
  anomaly_band_width = 4
  # (disabled by default) if we want to enable an alarm on max connections
  monitor_connection_maximum = 50
  
  # A list of actions to take when alarms are triggered
  sns_topic_alarm_arns = ["arn:aws:sns:us-east-1:123123123123:sns-to-slack"]
  # A list of actions to take when alarms are cleared
  sns_topic_ok_arns = ["arn:aws:sns:us-east-1:123123123123:sns-to-slack"]
  
  # Set our standard tags
  tags = {
    Cluster = "TestCluster"
  }
}
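The byte-based thresholds are easier to read when written as a multiple of a named constant. The following sketch assumes a local value called mib (a name introduced here, not part of the module) and uses the swap_threshold and freeable_memory_minimum inputs from the Inputs section; the specific sizes are arbitrary.

```hcl
locals {
  # Bytes per mebibyte; "mib" is a hypothetical helper name
  mib = 1024 * 1024
}

module "elasticache_alarms" {
  source = "github.com/DevOps-Nirvana/terraform-aws-elasticache-memcached-alarms?ref=main"

  cache_cluster_id = "TestCluster"

  # Alarm when swap usage exceeds 128 MiB (134217728 bytes)
  swap_threshold = 128 * local.mib
  # Alarm when freeable memory drops below 512 MiB (536870912 bytes)
  freeable_memory_minimum = 512 * local.mib
}
```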

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| cache_cluster_id | The Elasticache Cluster ID you want to monitor. | string | - | yes |
| prefix | A prefix added to all alarm names | string | "" | no |
| suffix | A suffix added to all alarm names; use this for Redis alarms per-node | string | "" | no |
| sns_topic_alarm_arns | A list of ARNs to trigger on alarm | list | [] | no (but recommended) |
| sns_topic_ok_arns | A list of ARNs to trigger on ok (alarm finished) | list | [] | no |
| tags | A map of the typical tags to set on every alarm | map | {} | no |
| dimensions | A way to add extra dimensions to the alarms (e.g. for Redis single-node alarms) | map | {} | no |
| cpu_percent_threshold | The high-percent threshold at which we alarm on CPU usage | number | 90 | no |
| swap_threshold | The high-bytes threshold at which we alarm on swap usage (default 50MB) | number | 52428800 | no |
| evictions_threshold | The high-usage threshold at which we alarm on evictions | number | 0 | no |
| freeable_memory_minimum | The low-bytes threshold at which we alarm on free memory (default 200MB) | number | 209715200 | no |
| monitor_connection_anomalies | A flag to enable or disable monitoring connection count anomalies | bool | true | no |
| anomaly_period | The number of seconds in each evaluation period for anomaly detection | number | 600 | no |
| anomaly_evaluation_periods | The number of periods to evaluate before triggering alarms | number | 3 | no |
| anomaly_band_width | The width of the anomaly band. Higher numbers mean less sensitive | number | 2 | no |
| monitor_connection_maximum | To alarm on maximum connections, set this to a value > 0 | number | 0 | no |
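For readers unfamiliar with CloudWatch anomaly detection, the sketch below shows how the three anomaly inputs could map onto a hand-written alarm. It follows the general pattern of the AWS provider's aws_cloudwatch_metric_alarm resource and is not copied from this module; the resource name, alarm name, query ids, and the Average statistic are assumptions.

```hcl
# Illustrative only: an anomaly-band alarm on CurrConnections.
resource "aws_cloudwatch_metric_alarm" "connection_anomaly" {
  alarm_name          = "TestCluster-connection-anomaly" # hypothetical name
  comparison_operator = "GreaterThanUpperThreshold"
  evaluation_periods  = 3    # corresponds to anomaly_evaluation_periods
  threshold_metric_id = "e1" # compare against the band, not a fixed number

  # The expected band; the second argument corresponds to anomaly_band_width
  metric_query {
    id          = "e1"
    expression  = "ANOMALY_DETECTION_BAND(m1, 2)"
    label       = "CurrConnections (expected)"
    return_data = true
  }

  # The raw metric the band is computed from
  metric_query {
    id          = "m1"
    return_data = true

    metric {
      namespace   = "AWS/ElastiCache"
      metric_name = "CurrConnections"
      period      = 600 # seconds, corresponds to anomaly_period
      stat        = "Average"

      dimensions = {
        CacheClusterId = "TestCluster"
      }
    }
  }
}
```

A wider band (a larger second argument) tolerates bigger deviations before the alarm fires, which is why higher anomaly_band_width values are less sensitive.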

Outputs

None

Share the Love

Please give it a ★ on GitHub or share it with others.

Help

File a GitHub issue for problems or feature requests.

License

Using MIT License
