Skip to content

devopscaptain/aws-monitoring-solution

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

1 Commit
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ—οΈ CloudFormation Deployment Guide - EC2 Monitoring System

πŸ“‹ Overview

This guide covers the CloudFormation infrastructure deployment for the automated EC2 monitoring system with enhanced notifications.

πŸš€ Quick Deployment

One-Command Deployment

./deploy.sh

Manual Deployment

# 1. Create deployment package
cd lambda_functions
zip -r ../lambda-deployment.zip . -x "*.pyc" "__pycache__/*"
cd ..

# 2. Deploy CloudFormation stack
aws cloudformation create-stack \
  --stack-name ec2-monitoring-infrastructure \
  --template-body file://cloudformation/monitoring-infrastructure.yaml \
  --parameters ParameterKey=NotificationEmail,ParameterValue=your-email@example.com \
  --capabilities CAPABILITY_NAMED_IAM \
  --region us-east-1

# 3. Wait for completion
aws cloudformation wait stack-create-complete \
  --stack-name ec2-monitoring-infrastructure \
  --region us-east-1

# 4. Update Lambda code
aws lambda update-function-code \
  --function-name ec2-monitoring-handler \
  --zip-file fileb://lambda-deployment.zip \
  --region us-east-1

πŸ“Š Monitoring Specifications

🚨 Alarm Thresholds

Metric Threshold Period Evaluations Actions
CPU Utilization 80% 5 min 2 consecutive SNS Alert
Network In/Out 1 GB 5 min 2 consecutive SNS Alert
Status Check 1 failure 5 min 2 consecutive SNS Alert
CPU Credits (t2/t3/t4g) 10 credits 5 min 2 consecutive SNS Alert

πŸ“ˆ Auto-Created Alarms

  • Standard Instances: 4 alarms (CPU, Network In/Out, Status)
  • Burstable Instances: 5 alarms (+ CPU Credits)
  • Naming: EC2-{instance-id}-{metric-name}
  • Actions: All alarms send SNS notifications

🎯 Dashboard Features

  • Auto-Updates: New instances add widgets automatically
  • Real-Time: 1-hour time range for fast loading
  • Organized Layout: Grouped by instance type
  • Direct Links: Included in email notifications

πŸ—οΈ CloudFormation Components

πŸ“§ SNS Topic

MonitoringNotificationTopic:
  Type: AWS::SNS::Topic
  Properties:
    TopicName: !Sub 'ec2-monitoring-notifications-${AWS::StackName}'
    DisplayName: EC2 Monitoring Notifications

Purpose: Sends enhanced email notifications with emojis and CloudWatch URLs

πŸ“¬ SNS Email Subscription

MonitoringEmailSubscription:
  Type: AWS::SNS::Subscription
  Properties:
    Protocol: email
    TopicArn: !Ref MonitoringNotificationTopic
    Endpoint: !Ref NotificationEmail

Purpose: Configures email delivery for notifications

πŸ” IAM Execution Role

LambdaExecutionRole:
  Type: AWS::IAM::Role
  Properties:
    RoleName: !Sub '${LambdaFunctionName}-execution-role'
    AssumeRolePolicyDocument: # Lambda service trust policy
    Policies:
      - PolicyName: MonitoringPermissions
        PolicyDocument: # EC2, CloudWatch, SNS permissions

Permissions:

  • ec2:DescribeInstances - Get instance details for notifications
  • cloudwatch:*Alarm* - Create/delete/manage alarms
  • cloudwatch:*Dashboard* - Update dashboard widgets
  • sns:Publish - Send notifications
  • logs:* - CloudWatch Logs access

⚑ Lambda Function

MonitoringLambdaFunction:
  Type: AWS::Lambda::Function
  Properties:
    FunctionName: !Ref LambdaFunctionName
    Runtime: python3.9
    Handler: index.lambda_handler
    Timeout: 300
    MemorySize: 128
    Environment:
      Variables:
        SNS_TOPIC_ARN: !Ref MonitoringNotificationTopic
        DASHBOARD_NAME: !Sub 'EC2-Monitoring-Dashboard-${AWS::StackName}'

Features:

  • Enhanced notifications with emojis and EC2 details
  • CloudWatch dashboard URLs in notifications
  • Automatic alarm creation and cleanup
  • Instance-specific monitoring based on type

🎯 EventBridge Rule

EC2StateChangeRule:
  Type: AWS::Events::Rule
  Properties:
    Name: !Sub 'ec2-state-change-monitoring-${AWS::StackName}'
    EventPattern:
      source: ["aws.ec2"]
      detail-type: ["EC2 Instance State-change Notification"]
      detail:
        state: ["running", "terminated"]
    Targets:
      - Arn: !GetAtt MonitoringLambdaFunction.Arn
        Id: MonitoringTarget

Triggers: EC2 instance launch and termination events

πŸ”‘ Lambda Permission

LambdaInvokePermission:
  Type: AWS::Lambda::Permission
  Properties:
    FunctionName: !Ref MonitoringLambdaFunction
    Action: lambda:InvokeFunction
    Principal: events.amazonaws.com
    SourceArn: !GetAtt EC2StateChangeRule.Arn

Purpose: Allows EventBridge to invoke the Lambda function

πŸ“Š Monitoring Details & Thresholds

🚨 Alarm Configurations

All EC2 Instances

Metric Threshold Period Evaluation Periods Comparison Actions
CPU Utilization 80% 5 minutes 2 consecutive GreaterThanThreshold SNS Alert
Network In 1 GB 5 minutes 2 consecutive GreaterThanThreshold SNS Alert
Network Out 1 GB 5 minutes 2 consecutive GreaterThanThreshold SNS Alert
Status Check Failed 1 failure 5 minutes 2 consecutive GreaterThanOrEqualToThreshold SNS Alert

Burstable Instances (t2, t3, t4g)

Metric Threshold Period Evaluation Periods Comparison Actions
CPU Credit Balance 10 credits 5 minutes 2 consecutive LessThanThreshold SNS Alert

πŸ“ˆ Monitoring Capabilities

Automatic Alarm Creation

# CPU Utilization Alarm
{
    'AlarmName': 'EC2-{instance_id}-HighCPUUtilization',
    'MetricName': 'CPUUtilization',
    'Namespace': 'AWS/EC2',
    'Statistic': 'Average',
    'Period': 300,
    'EvaluationPeriods': 2,
    'Threshold': 80.0,
    'ComparisonOperator': 'GreaterThanThreshold',
    'AlarmActions': [SNS_TOPIC_ARN]
}

# Network Traffic Alarms
{
    'AlarmName': 'EC2-{instance_id}-NetworkIn',
    'MetricName': 'NetworkIn',
    'Threshold': 1000000000.0,  # 1GB in bytes
    'Unit': 'Bytes'
}

# Status Check Alarm
{
    'AlarmName': 'EC2-{instance_id}-StatusCheckFailed',
    'MetricName': 'StatusCheckFailed_Instance',
    'Statistic': 'Maximum',
    'Threshold': 1.0,
    'ComparisonOperator': 'GreaterThanOrEqualToThreshold'
}

Instance-Type Specific Monitoring

  • Standard Instances (m5, c5, r5, etc.): 4 base alarms
  • Burstable Instances (t2, t3, t4g): 5 alarms (includes CPU credits)
  • Memory-Optimized: Additional memory monitoring (if CloudWatch agent installed)
  • Storage-Optimized: Additional EBS volume monitoring

πŸ“Š Dashboard Widgets

Auto-Created Widgets

{
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "metrics": [
          ["AWS/EC2", "CPUUtilization", "InstanceId", "{instance_id}"]
        ],
        "period": 300,
        "stat": "Average",
        "region": "us-east-1",
        "title": "CPU Utilization - {instance_name}"
      }
    },
    {
      "type": "metric",
      "properties": {
        "metrics": [
          ["AWS/EC2", "NetworkIn", "InstanceId", "{instance_id}"],
          [".", "NetworkOut", ".", "."]
        ],
        "period": 300,
        "stat": "Sum",
        "title": "Network Traffic - {instance_name}"
      }
    }
  ]
}

🎯 Notification Details

Enhanced Email Notifications

πŸš€ EC2 INSTANCE STARTUP DETECTED πŸŽ‰

πŸ“‹ INSTANCE DETAILS:
🏷️  Instance Name: MyWebServer
πŸ†”  Instance ID: i-1234567890abcdef0
πŸ’»  Instance Type: t3.micro
🌍  Availability Zone: us-east-1a
πŸ”’  Private IP: 10.0.1.100
🌍  Public IP: 54.123.45.67

🚨 ALARMS CREATED (5 total):
   🚨 1. EC2-i-1234567890abcdef0-HighCPUUtilization
   πŸ“Š 2. EC2-i-1234567890abcdef0-NetworkIn
   πŸ“Š 3. EC2-i-1234567890abcdef0-NetworkOut
   ❀️ 4. EC2-i-1234567890abcdef0-StatusCheckFailed
   ⚑ 5. EC2-i-1234567890abcdef0-LowCPUCreditBalance

πŸ”— QUICK ACCESS LINKS:
πŸ“Š  Dashboard: https://console.aws.amazon.com/cloudwatch/...
🚨  Alarms: https://console.aws.amazon.com/cloudwatch/...

πŸ“Š Stack Outputs

Outputs:
  LambdaFunctionArn:
    Description: ARN of the Lambda function
    Value: !GetAtt MonitoringLambdaFunction.Arn
    Export:
      Name: !Sub '${AWS::StackName}-LambdaArn'

  SNSTopicArn:
    Description: ARN of the SNS topic for notifications
    Value: !Ref MonitoringNotificationTopic
    Export:
      Name: !Sub '${AWS::StackName}-SNSTopicArn'

  EventRuleArn:
    Description: ARN of the EventBridge rule
    Value: !GetAtt EC2StateChangeRule.Arn
    Export:
      Name: !Sub '${AWS::StackName}-EventRuleArn'

  DashboardURL:
    Description: URL to the CloudWatch dashboard
    Value: !Sub 'https://${AWS::Region}.console.aws.amazon.com/cloudwatch/home?region=${AWS::Region}#dashboards:name=EC2-Monitoring-Dashboard'

βš™οΈ Parameters

Required Parameters

Parameters:
  NotificationEmail:
    Type: String
    Description: Email address for monitoring notifications
    # No default - must be provided

  LambdaFunctionName:
    Type: String
    Description: Name for the Lambda function
    Default: ec2-monitoring-handler

Parameter Usage

# Deploy with custom parameters
aws cloudformation create-stack \
  --stack-name my-monitoring-stack \
  --template-body file://cloudformation/monitoring-infrastructure.yaml \
  --parameters \
    ParameterKey=NotificationEmail,ParameterValue=admin@company.com \
    ParameterKey=LambdaFunctionName,ParameterValue=custom-monitoring-handler \
  --capabilities CAPABILITY_NAMED_IAM

πŸ” Deployment Verification

Check Stack Status

aws cloudformation describe-stacks \
  --stack-name ec2-monitoring-infrastructure \
  --query 'Stacks[0].StackStatus'

Verify Resources

# Lambda function
aws lambda get-function \
  --function-name ec2-monitoring-handler \
  --query 'Configuration.State'

# EventBridge rule
aws events describe-rule \
  --name ec2-state-change-monitoring-ec2-monitoring-infrastructure \
  --query 'State'

# SNS topic
aws sns list-topics \
  --query 'Topics[?contains(TopicArn, `ec2-monitoring-notifications`)]'

Test Deployment

# Create test instance
aws ec2 run-instances \
  --image-id ami-0c02fb55956c7d316 \
  --count 1 \
  --instance-type t3.micro \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=Test-Monitoring}]'

# Check created alarms (wait 30 seconds)
aws cloudwatch describe-alarms \
  --alarm-name-prefix "EC2-" \
  --query 'MetricAlarms[].AlarmName'

πŸ› οΈ Troubleshooting

Common Issues

Stack Creation Failed

# Check stack events
aws cloudformation describe-stack-events \
  --stack-name ec2-monitoring-infrastructure \
  --query 'StackEvents[?ResourceStatus==`CREATE_FAILED`]'

Lambda Function Not Working

# Check Lambda logs
aws logs describe-log-streams \
  --log-group-name "/aws/lambda/ec2-monitoring-handler" \
  --order-by LastEventTime --descending

# Get recent log events
aws logs get-log-events \
  --log-group-name "/aws/lambda/ec2-monitoring-handler" \
  --log-stream-name "LATEST_STREAM_NAME"

Email Notifications Not Received

# Check SNS subscription status
aws sns list-subscriptions-by-topic \
  --topic-arn "SNS_TOPIC_ARN"

# Confirm subscription in email
# Check spam/junk folder

Debug Commands

# Test Lambda function directly
aws lambda invoke \
  --function-name ec2-monitoring-handler \
  --payload '{"detail":{"instance-id":"i-test","state":"running"}}' \
  response.json

# Check EventBridge rule targets
aws events list-targets-by-rule \
  --rule ec2-state-change-monitoring-ec2-monitoring-infrastructure

🧹 Stack Cleanup

Delete Stack

# Delete CloudFormation stack
aws cloudformation delete-stack \
  --stack-name ec2-monitoring-infrastructure

# Wait for deletion
aws cloudformation wait stack-delete-complete \
  --stack-name ec2-monitoring-infrastructure

Manual Cleanup (if needed)

# Delete remaining alarms
aws cloudwatch delete-alarms \
  --alarm-names $(aws cloudwatch describe-alarms \
    --query 'MetricAlarms[?starts_with(AlarmName, `EC2-`)].AlarmName' \
    --output text)

# Delete dashboard
aws cloudwatch delete-dashboards \
  --dashboard-names "EC2-Monitoring-Dashboard-ec2-monitoring-infrastructure"

πŸ“Š Resource Costs

Monthly Cost Estimate

  • Lambda Function: ~$0.20 per 1M executions
  • CloudWatch Alarms: $0.10 per alarm per month
  • SNS Notifications: $0.50 per 1M notifications
  • EventBridge Rules: $1.00 per 1M events
  • CloudWatch Dashboard: $3.00 per dashboard per month

Typical Usage (10 instances)

  • Alarms: 50 alarms Γ— $0.10 = $5.00/month
  • Dashboard: 1 dashboard = $3.00/month
  • Lambda + SNS + EventBridge: ~$1.00/month
  • Total: ~$9.00/month

πŸ”’ Security Considerations

IAM Best Practices

  • Least privilege permissions
  • Resource-specific ARNs where possible
  • No wildcard permissions for sensitive actions

Data Protection

  • SNS messages contain instance metadata only
  • No sensitive data in CloudWatch logs
  • Encrypted CloudWatch Logs (optional)

πŸ“ˆ Scaling Considerations

Multi-Region Deployment

# Deploy to multiple regions
for region in us-east-1 us-west-2 eu-west-1; do
  aws cloudformation create-stack \
    --stack-name ec2-monitoring-infrastructure \
    --template-body file://cloudformation/monitoring-infrastructure.yaml \
    --parameters ParameterKey=NotificationEmail,ParameterValue=admin@company.com \
    --capabilities CAPABILITY_NAMED_IAM \
    --region $region
done

Large-Scale Deployments

  • Consider Lambda concurrency limits
  • Monitor CloudWatch API throttling
  • Use multiple SNS topics for different teams

🎯 Quick Reference

Deploy: ./deploy.sh
Test: Launch any EC2 instance
Monitor: Check CloudWatch dashboard
Cleanup: aws cloudformation delete-stack --stack-name ec2-monitoring-infrastructure

Stack Status: aws cloudformation describe-stacks --stack-name ec2-monitoring-infrastructure --query 'Stacks[0].StackStatus'

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published