[opensearch] Disk usage alarm can trigger for managed OpenSearch domain due to missing data points #602

condrayr · 2024-11-19T17:11:38Z

Version

8.3.3

Steps and/or minimal code example to reproduce

Have a managed OpenSearch domain that sporadically fails to report the FreeStorageSpace metric. According to AWS Support, this can occur because of an update to the internal agent responsible for publishing the domain's CloudWatch metrics.
Notice that the alarm triggers on the missing data point even though the domain's actual disk usage is below the threshold. This happens because the metric math used for this alarm, 100 * (used/(used+free)), becomes 100 * (used/(used+0)) = 100 * (used/used) = 100 when the free-space data point is missing.
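To make the arithmetic concrete, here is a minimal Python sketch of the expression quoted above. It is not the library's code: the function name `disk_usage_percent` and the sample values are hypothetical, and treating a missing free-space data point as 0 is an assumption taken from the behavior described in this issue.

```python
from typing import Optional


def disk_usage_percent(used: float, free: Optional[float]) -> float:
    """Evaluate 100 * (used / (used + free)).

    Assumption: a missing free-space data point (None) is treated as 0,
    which is the behavior this issue describes for the alarm's metric math.
    """
    free_value = 0.0 if free is None else free
    return 100.0 * used / (used + free_value)


# Hypothetical sample values, in arbitrary but consistent units.
normal = disk_usage_percent(used=40.0, free=60.0)   # both metrics reported
missing = disk_usage_percent(used=40.0, free=None)  # free space not reported

print(f"both reported: {normal}")
print(f"free missing:  {missing}")
```

With both metrics reported the result reflects the real usage, while a single missing free-space data point pushes the result to 100 regardless of how much space is actually used.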

Expected behavior

The alarm should not trigger, as the domain is not actually exceeding the disk usage threshold.

Actual behavior

The alarm triggers.

Other details

According to AWS Support, the FreeStorageSpace metric can sporadically miss data points because of an update to the internal agent responsible for publishing the domain's CloudWatch metrics. Their response:

"I examined the domain and could observe that the missing data point was because of an update to the Internal agent responsible for publishing the CloudWatch metrics for the domain and because of which the FreeStorageMetrics was not reported for 18:51, 18:52 UTC. Further, nothing can be done from your end to avoid this missing data point, however in case in future if you observe any scenario where the metric miss the data point for a extended duration, please do let us know and we can look into this further."

As a workaround, the number of data points to alarm on can be increased: in the two occurrences we've seen, only one or two data points were missing at a time.
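The effect of this workaround can be sketched with a simplified model of an "M out of N" alarm, where the alarm fires only if at least M of the last N data points breach the threshold. This is an illustration only: the function `alarm_fires`, the 80% threshold, the strict greater-than comparison, and the sample series are all assumptions, not the library's or CloudWatch's actual evaluation logic.

```python
from typing import Sequence


def alarm_fires(
    values: Sequence[float],
    threshold: float,
    datapoints_to_alarm: int,
    evaluation_periods: int,
) -> bool:
    """Simplified M-out-of-N check over the most recent data points.

    Fires when at least `datapoints_to_alarm` of the last
    `evaluation_periods` values are strictly above `threshold`.
    """
    window = values[-evaluation_periods:]
    breaching = sum(1 for value in window if value > threshold)
    return breaching >= datapoints_to_alarm


# Hypothetical disk usage series (percent): two consecutive spikes to 100
# caused by missing free-space data points, surrounded by real usage of 40.
usage = [40.0, 40.0, 100.0, 100.0, 40.0]

# Alarming on a single breaching data point fires on the spurious spikes.
sensitive = alarm_fires(usage, threshold=80.0, datapoints_to_alarm=1, evaluation_periods=5)

# Requiring three breaching data points tolerates one or two missing points.
tolerant = alarm_fires(usage, threshold=80.0, datapoints_to_alarm=3, evaluation_periods=5)

print(f"1 of 5: {sensitive}")
print(f"3 of 5: {tolerant}")
```

The trade-off is that a higher data-points-to-alarm value also delays the alarm when disk usage genuinely exceeds the threshold.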

It may also be worth considering adding alarms for continued missing metrics.

condrayr added the bug label on Nov 19, 2024

echeung-amzn changed the title from "Disk usage alarm can trigger for managed OpenSearch domain due to missing data points" to "[opensearch] Disk usage alarm can trigger for managed OpenSearch domain due to missing data points" on Dec 6, 2024
