Update alarms runbooks #3672

Merged (3 commits) on Jan 19, 2024
@@ -1,12 +1,12 @@
# Run Book: API Production Average Response Time above threshold

```{admonition} Metadata
Status: **Unstable**

Maintainer: @krysaldb
Status: **Stable**

Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Production+Average+Response+Time+above+threshold?>
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Production+Average+Response+Time+above+threshold)
- [ECS-Production-Dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/ECS-Production-Dashboard)
- [Production Database + Elasticsearch dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/Service-Overview)
```
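
The "Alarm details" links above appear to follow a single pattern: the alarm name, with spaces encoded as `+`, appended to the `#alarmsV2:alarm/` fragment of the CloudWatch console URL. A minimal sketch in Python of deriving such a link from an alarm name, assuming that pattern holds; the function name `alarm_details_url` is hypothetical and not part of this repository:

```python
from urllib.parse import quote_plus


def alarm_details_url(alarm_name: str, region: str = "us-east-1") -> str:
    """Build a CloudWatch console link for an alarm (hypothetical helper).

    Assumes the URL pattern seen in the runbook links: spaces in the alarm
    name become '+', and no trailing '?' or '>' follows the name.
    """
    base = f"https://{region}.console.aws.amazon.com/cloudwatch/home"
    return f"{base}?region={region}#alarmsV2:alarm/{quote_plus(alarm_name)}"


if __name__ == "__main__":
    print(alarm_details_url("API Production Average Response Time above threshold"))
```

Generating the link from the alarm name, rather than pasting it by hand, may help avoid stray trailing characters such as `?` or `>`.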

## Severity Guide
@@ -1,12 +1,12 @@
# Run Book: API Production Average Response Time anomaly

```{admonition} Metadata
Status: **Unstable**

Maintainer: @krysaldb
Status: **Stable**

Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Production+Average+Response+Time+anomalously+high?>
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Production+Average+Response+Time+anomalously+high)
- [ECS-Production-Dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/ECS-Production-Dashboard)
- [Production Database + Elasticsearch dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/Service-Overview)
```

## Severity Guide
@@ -1,12 +1,12 @@
# Run Book: API Production P99 Response Time above threshold

```{admonition} Metadata
Status: **Unstable**

Maintainer: @krysaldb
Status: **Stable**

Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Production+P99+Response+Time+above+threshold?>
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Production+P99+Response+Time+above+threshold)
- [ECS-Production-Dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/ECS-Production-Dashboard)
- [Production Database + Elasticsearch dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/Service-Overview)
```

## Severity Guide
@@ -1,12 +1,12 @@
# Run Book: API Production P99 Response Time anomaly

```{admonition} Metadata
Status: **Unstable**

Maintainer: @krysaldb
Status: **Stable**

Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Production+P99+Response+Time+anomalously+high?>
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Production+P99+Response+Time+anomalously+high)
- [ECS-Production-Dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/ECS-Production-Dashboard)
- [Production Database + Elasticsearch dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/Service-Overview)
```

## Severity Guide
@@ -1,12 +1,12 @@
# Run Book: API Production Request Count anomalously high

```{admonition} Metadata
Status: **Unstable**

Maintainer: @krysaldb
Status: **Stable**

Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Production+Request+Count+anomalously+high?>
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/API+Production+Request+Count+anomalously+high)
- [ECS-Production-Dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/ECS-Production-Dashboard)
- [Production Database + Elasticsearch dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/Service-Overview)
```

## Severity Guide
@@ -1,12 +1,11 @@
# Run Book: Nuxt Production Average Response Time above threshold

```{admonition} Metadata
Status: **Unstable**

Maintainer: @obulat
Status: **Stable**

Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+Average+Response+Time+above+threshold?>
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+Average+Response+Time+above+threshold)
- [ECS-Production-Dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/ECS-Production-Dashboard)
```

## Severity Guide
@@ -1,12 +1,11 @@
# Run Book: Nuxt Production Average Response Time anomalously high

```{admonition} Metadata
Status: **Unstable**

Maintainer: @obulat
Status: **Stable**

Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+Average+Response+Time+anomalously+high?>
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+Average+Response+Time+anomalously+high)
- [ECS-Production-Dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/ECS-Production-Dashboard)
```

## Severity Guide
@@ -1,12 +1,11 @@
# Run Book: Nuxt 2XX responses count under threshold

```{admonition} Metadata
Status: **Unstable**

Maintainer: @dhruvkb
Status: **Stable**

Alarm link:
- [production-nuxt](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+HTTP+2XX+responses+count+under+threshold)
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+HTTP+2XX+responses+count+under+threshold)
- [ECS-Production-Dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/ECS-Production-Dashboard)
```

## Severity guide
@@ -1,12 +1,11 @@
# Run Book: Nuxt 5XX responses count above threshold

```{admonition} Metadata
Status: **Unstable**

Maintainer: @dhruvkb
Status: **Stable**

Alarm link:
- [production-nuxt](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+HTTP+5XX+responses+count+over+threshold)
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+HTTP+5XX+responses+count+over+threshold)
- [ECS-Production-Dashboard](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards/dashboard/ECS-Production-Dashboard)
```

## Severity guide
@@ -25,9 +24,7 @@ errors (this can be checked by observing paths in the Cloudflare logs).
- If the API requests are returning 5XX responses, the severity is high. Further
investigation into the API side is warranted to determine the cause for the
5XX responses. Also refer to the
[API 5XX runbook](/meta/monitoring/runbooks/index.md).

<!-- TODO: Update link to /meta/monitoring/runbooks/api_5xx_above_threshold.md -->
[API 5XX runbook](/meta/monitoring/runbooks/api_http_5xx_above_threshold.md).

## Historical false positives

@@ -3,10 +3,8 @@
```{admonition} Metadata
Status: **Disabled** until Nuxt request logging is added.

Maintainer: @obulat

Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+P99+Response+Time+above+threshold?>
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+P99+Response+Time+above+threshold)
```

## Severity Guide
@@ -3,10 +3,8 @@
```{admonition} Metadata
Status: **Disabled** until Nuxt request logging is added.

Maintainer: @obulat

Alarm link:
- <https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+P99+Response+Time+anomalously+high>
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+P99+Response+Time+anomalously+high)
```

## Severity Guide
@@ -1,12 +1,10 @@
# Run Book: Nuxt Request Count anomalously high

```{admonition} Metadata
Status: **Unstable**

Maintainer: @dhruvkb
Status: **Stable**

Alarm link:
- [production-nuxt](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+Request+Count+anomalously+high)
- [Alarm details](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+Request+Count+anomalously+high)
```

## Severity guide
@@ -1,7 +1,7 @@
# Run Book: Unhealthy hosts for ECS service

```{admonition} Metadata
Status: **Unstable**
Status: **Stable**


Alarm links: