diff --git a/documentation/meta/monitoring/runbooks/api_request_count_above_threshold.md b/documentation/meta/monitoring/runbooks/api_request_count_anomaly.md
similarity index 70%
rename from documentation/meta/monitoring/runbooks/api_request_count_above_threshold.md
rename to documentation/meta/monitoring/runbooks/api_request_count_anomaly.md
index 332c7afafb6..4a2106103e6 100644
--- a/documentation/meta/monitoring/runbooks/api_request_count_above_threshold.md
+++ b/documentation/meta/monitoring/runbooks/api_request_count_anomaly.md
@@ -1,4 +1,4 @@
-# Run Book: API Production Request Count above threshold
+# Run Book: API Production Request Count anomalously high
 
 ```{admonition} Metadata
 Status: **Unstable**
@@ -6,7 +6,7 @@ Status: **Unstable**
 Maintainer: @krysaldb
 
 Alarm link:
--
+-
 ```
 
 ## Severity Guide
@@ -19,10 +19,10 @@ future resource scaling depending on the kind of traffic.
 If the services are strained then the severity is critical, search for the root
 cause to prevent more serious outages. If there are no recent obvious
 integrations (like the Gutenberg plugin) then follow the run book to [identify
-traffic anomalies in Cloudflare][runbook_traffic], to determine whether the
-recent traffic is organic or if it comes from a botnet. Find the origin of
-requests and evaluate whether it needs to be blocked or if Openverse services
-need to adapt to the new demand.
+traffic anomalies][runbook_traffic], to determine whether the recent traffic is
+organic or if it comes from a botnet. Find the origin of requests and evaluate
+whether it needs to be blocked or if Openverse services need to adapt to the new
+demand.
 
 [runbook_traffic]:
   https://docs.openverse.org/meta/monitoring/traffic/runbooks/identifying-and-blocking-traffic-anomalies.html
diff --git a/documentation/meta/monitoring/runbooks/api_thumbnails_avg_response_time_above_threshold.md b/documentation/meta/monitoring/runbooks/api_thumbnails_avg_response_time_above_threshold.md
index eada9a9bf13..4cd49639792 100644
--- a/documentation/meta/monitoring/runbooks/api_thumbnails_avg_response_time_above_threshold.md
+++ b/documentation/meta/monitoring/runbooks/api_thumbnails_avg_response_time_above_threshold.md
@@ -2,7 +2,9 @@
 
 ```{admonition} Metadata
 Status: **Unstable**
+
 Maintainer: @stacimc
+
 Alarm link:
 -
 ```
diff --git a/documentation/meta/monitoring/runbooks/api_thumbnails_avg_response_time_anomaly.md b/documentation/meta/monitoring/runbooks/api_thumbnails_avg_response_time_anomaly.md
index 4ab28f149a0..8e6602b0b80 100644
--- a/documentation/meta/monitoring/runbooks/api_thumbnails_avg_response_time_anomaly.md
+++ b/documentation/meta/monitoring/runbooks/api_thumbnails_avg_response_time_anomaly.md
@@ -2,7 +2,9 @@
 
 ```{admonition} Metadata
 Status: **Unstable**
+
 Maintainer: @stacimc
+
 Alarm link:
 -
 ```
diff --git a/documentation/meta/monitoring/runbooks/api_thumbnails_p99_response_time_above_threshold.md b/documentation/meta/monitoring/runbooks/api_thumbnails_p99_response_time_above_threshold.md
index 31a7eab1acf..0cf613d562a 100644
--- a/documentation/meta/monitoring/runbooks/api_thumbnails_p99_response_time_above_threshold.md
+++ b/documentation/meta/monitoring/runbooks/api_thumbnails_p99_response_time_above_threshold.md
@@ -2,7 +2,9 @@
 
 ```{admonition} Metadata
 Status: **Unstable**
+
 Maintainer: @stacimc
+
 Alarm link:
 -
 ```
diff --git a/documentation/meta/monitoring/runbooks/api_thumbnails_p99_response_time_anomaly.md b/documentation/meta/monitoring/runbooks/api_thumbnails_p99_response_time_anomaly.md
index 65ca50bdc4e..440af3e7a52 100644
--- a/documentation/meta/monitoring/runbooks/api_thumbnails_p99_response_time_anomaly.md
+++ b/documentation/meta/monitoring/runbooks/api_thumbnails_p99_response_time_anomaly.md
@@ -2,7 +2,9 @@
 
 ```{admonition} Metadata
 Status: **Unstable**
+
 Maintainer: @stacimc
+
 Alarm link:
 -
 ```
diff --git a/documentation/meta/monitoring/runbooks/api_thumbnails_request_count_anomaly.md b/documentation/meta/monitoring/runbooks/api_thumbnails_request_count_anomaly.md
index a80595f87c6..255c43db232 100644
--- a/documentation/meta/monitoring/runbooks/api_thumbnails_request_count_anomaly.md
+++ b/documentation/meta/monitoring/runbooks/api_thumbnails_request_count_anomaly.md
@@ -2,7 +2,9 @@
 
 ```{admonition} Metadata
 Status: **Unstable**
+
 Maintainer: @krysaldb
+
 Alarm link:
 -
 ```
diff --git a/documentation/meta/monitoring/runbooks/index.md b/documentation/meta/monitoring/runbooks/index.md
index 2145a9dd494..13b661174a3 100644
--- a/documentation/meta/monitoring/runbooks/index.md
+++ b/documentation/meta/monitoring/runbooks/index.md
@@ -12,13 +12,13 @@ that can be a good resource when writing a new one.
 ```{toctree}
 :titlesonly:
 
-api_request_count_above_threshold
 api_http_2xx_under_threshold
 api_http_5xx_above_threshold
 api_avg_response_time_above_threshold
 api_avg_response_time_anomaly
 api_p99_response_time_above_threshold
 api_p99_response_time_anomaly
+api_request_count_anomaly
 api_thumbnails_http_2xx_under_threshold
 api_thumbnails_http_5xx_above_threshold
 api_thumbnails_request_count_anomaly
@@ -26,10 +26,12 @@ api_thumbnails_avg_response_time_above_threshold
 api_thumbnails_avg_response_time_anomaly
 api_thumbnails_p99_response_time_above_threshold
 api_thumbnails_p99_response_time_anomaly
-nuxt_request_count
-nuxt_2xx_under_threshold
-nuxt_5xx_above_threshold
+nuxt_http_2xx_under_threshold
+nuxt_http_5xx_above_threshold
 nuxt_avg_response_time_above_threshold
+nuxt_avg_response_time_anomaly
 nuxt_p99_response_time_above_threshold
+nuxt_p99_response_time_anomaly
+nuxt_request_count_anomaly
 unhealthy_ecs_hosts
 ```
diff --git a/documentation/meta/monitoring/runbooks/nuxt_avg_response_time_above_threshold.md b/documentation/meta/monitoring/runbooks/nuxt_avg_response_time_above_threshold.md
index c3874021ef0..538abb6872a 100644
--- a/documentation/meta/monitoring/runbooks/nuxt_avg_response_time_above_threshold.md
+++ b/documentation/meta/monitoring/runbooks/nuxt_avg_response_time_above_threshold.md
@@ -2,7 +2,9 @@
 
 ```{admonition} Metadata
 Status: **Unstable**
+
 Maintainer: @obulat
+
 Alarm link:
 -
 ```
diff --git a/documentation/meta/monitoring/runbooks/nuxt_avg_response_time_anomaly.md b/documentation/meta/monitoring/runbooks/nuxt_avg_response_time_anomaly.md
new file mode 100644
index 00000000000..338a51f5577
--- /dev/null
+++ b/documentation/meta/monitoring/runbooks/nuxt_avg_response_time_anomaly.md
@@ -0,0 +1,31 @@
+# Run Book: Nuxt Production Average Response Time anomalously high
+
+```{admonition} Metadata
+Status: **Unstable**
+
+Maintainer: @obulat
+
+Alarm link:
+-
+```
+
+## Severity Guide
+
+Confirm that there is not a total outage of the service. If not, the severity is
+likely low. Check for the request count and general network activity. If
+abnormally high, refer to the [traffic analysis run book][traffic_runbook] to
+identify and block any malicious traffic. If not, then check for a recent
+deployment that may have introduced a problem, and [rollback][rollback_docs] to
+the previous version if necessary.
+
+[traffic_runbook]:
+  /meta/monitoring/traffic/runbooks/identifying-and-blocking-traffic-anomalies.md
+[rollback_docs]: /general/deployment.md#rollbacks
+
+## Historical false positives
+
+Nothing registered to date.
+
+## Related incident reports
+
+Nothing registered to date.
diff --git a/documentation/meta/monitoring/runbooks/nuxt_2xx_under_threshold.md b/documentation/meta/monitoring/runbooks/nuxt_http_2xx_under_threshold.md
similarity index 95%
rename from documentation/meta/monitoring/runbooks/nuxt_2xx_under_threshold.md
rename to documentation/meta/monitoring/runbooks/nuxt_http_2xx_under_threshold.md
index 1f7d1d97ea3..6fbb61e49e6 100644
--- a/documentation/meta/monitoring/runbooks/nuxt_2xx_under_threshold.md
+++ b/documentation/meta/monitoring/runbooks/nuxt_http_2xx_under_threshold.md
@@ -1,4 +1,4 @@
-# Run Book: Nuxt 2XX request count under threshold
+# Run Book: Nuxt 2XX responses count under threshold
 
 ```{admonition} Metadata
 Status: **Unstable**
diff --git a/documentation/meta/monitoring/runbooks/nuxt_5xx_above_threshold.md b/documentation/meta/monitoring/runbooks/nuxt_http_5xx_above_threshold.md
similarity index 96%
rename from documentation/meta/monitoring/runbooks/nuxt_5xx_above_threshold.md
rename to documentation/meta/monitoring/runbooks/nuxt_http_5xx_above_threshold.md
index 7d3705ac96f..542862d1e9f 100644
--- a/documentation/meta/monitoring/runbooks/nuxt_5xx_above_threshold.md
+++ b/documentation/meta/monitoring/runbooks/nuxt_http_5xx_above_threshold.md
@@ -1,4 +1,4 @@
-# Run Book: Nuxt 5XX request count above threshold
+# Run Book: Nuxt 5XX responses count above threshold
 
 ```{admonition} Metadata
 Status: **Unstable**
diff --git a/documentation/meta/monitoring/runbooks/nuxt_p99_response_time_above_threshold.md b/documentation/meta/monitoring/runbooks/nuxt_p99_response_time_above_threshold.md
index dea1ea3d4ab..f9a0bf511d0 100644
--- a/documentation/meta/monitoring/runbooks/nuxt_p99_response_time_above_threshold.md
+++ b/documentation/meta/monitoring/runbooks/nuxt_p99_response_time_above_threshold.md
@@ -1,8 +1,10 @@
-# Run Book: Nuxt Production Average Response Time above threshold
+# Run Book: Nuxt Production P99 Response Time above threshold
 
 ```{admonition} Metadata
-Status: **Unstable**
+Status: **Disabled** until Nuxt request logging is added.
+
 Maintainer: @obulat
+
 Alarm link:
 -
 ```
diff --git a/documentation/meta/monitoring/runbooks/nuxt_p99_response_time_anomaly.md b/documentation/meta/monitoring/runbooks/nuxt_p99_response_time_anomaly.md
new file mode 100644
index 00000000000..9c76aa430cb
--- /dev/null
+++ b/documentation/meta/monitoring/runbooks/nuxt_p99_response_time_anomaly.md
@@ -0,0 +1,31 @@
+# Run Book: Nuxt Production P99 Response Time anomalously high
+
+```{admonition} Metadata
+Status: **Disabled** until Nuxt request logging is added.
+
+Maintainer: @obulat
+
+Alarm link:
+-
+```
+
+## Severity Guide
+
+Confirm that there is not a total outage of the service. If not, the severity is
+likely low. Check for the request count and general network activity. If
+abnormally high, refer to the [traffic analysis run book][traffic_runbook] to
+identify and block any malicious traffic. If not, then check for a recent
+deployment that may have introduced a problem, and [rollback][rollback_docs] to
+the previous version if necessary.
+
+[traffic_runbook]:
+  /meta/monitoring/traffic/runbooks/identifying-and-blocking-traffic-anomalies.md
+[rollback_docs]: /general/deployment.md#rollbacks
+
+## Historical false positives
+
+Nothing registered to date.
+
+## Related incident reports
+
+Nothing registered to date.
diff --git a/documentation/meta/monitoring/runbooks/nuxt_request_count.md b/documentation/meta/monitoring/runbooks/nuxt_request_count_anomaly.md
similarity index 89%
rename from documentation/meta/monitoring/runbooks/nuxt_request_count.md
rename to documentation/meta/monitoring/runbooks/nuxt_request_count_anomaly.md
index 0573055fa12..359f429dcb5 100644
--- a/documentation/meta/monitoring/runbooks/nuxt_request_count.md
+++ b/documentation/meta/monitoring/runbooks/nuxt_request_count_anomaly.md
@@ -1,4 +1,4 @@
-# Run Book: Nuxt request count above threshold
+# Run Book: Nuxt Request Count anomalously high
 
 ```{admonition} Metadata
 Status: **Unstable**
@@ -6,7 +6,7 @@ Status: **Unstable**
 Maintainer: @dhruvkb
 
 Alarm link:
-- [production-nuxt](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+request+count+above+threshold)
+- [production-nuxt](https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#alarmsV2:alarm/Nuxt+Production+Request+Count+anomalously+high)
 ```
 
 ## Severity guide