Problem
Alertmanager has a single configuration file that contains all receivers and the routing tree. There is currently no safe way for multiple clients to read, modify and write the configuration file concurrently. This poses a problem when:
Disparate teams want to manage a subset of the configuration file.
User interfaces (such as Grafana) want to edit a single receiver or route.
It makes sense that we would want to support optimistic concurrency on Alertmanager configurations out of the box (i.e. without requiring an intermediary synchronizing configurations).
Proposal
One option is to support the standard HTTP ETag/If-Match mechanism:
GET /alertmanager/api/v1/alerts: Will return an ETag with each response.
POST /alertmanager/api/v1/alerts: Will optionally accept an If-Match header.
The client would:
GET the configuration
Modify the configuration
POST the configuration, with an If-Match header
If Alertmanager returns 412, GET again and retry the update
Arguably, the Alertmanager configuration write API could have been a PUT, but that discussion is orthogonal and I don’t see any need to go into it in depth now.
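The client steps above can be sketched as a small retry loop. This is only an illustration: `FakeAlertmanager` is a hypothetical in-memory stand-in for the two endpoints, the ETag is assumed to be a SHA-256 hash of the configuration body, and the 201 success status is the fake's own choice, not something this proposal specifies.

```python
import hashlib


class FakeAlertmanager:
    """Hypothetical in-memory stand-in for the GET/POST config endpoints."""

    def __init__(self, config: str):
        self._config = config

    @staticmethod
    def _etag(config: str) -> str:
        # Assumption: the ETag is a hash of the configuration body.
        return hashlib.sha256(config.encode()).hexdigest()

    def get(self):
        """Mimics GET: returns (status, etag, body)."""
        return 200, self._etag(self._config), self._config

    def post(self, body: str, if_match=None) -> int:
        """Mimics POST: returns 412 when If-Match is given but stale."""
        if if_match is not None and if_match != self._etag(self._config):
            return 412
        self._config = body
        return 201


def update_config(server, modify, max_attempts=5):
    """GET, modify, POST with If-Match; on 412, GET again and retry."""
    for _ in range(max_attempts):
        _, etag, current = server.get()
        status = server.post(modify(current), if_match=etag)
        if status != 412:
            return status
    raise RuntimeError("gave up after repeated 412 responses")
```

Because `modify` is re-applied to the freshly fetched configuration on every attempt, a change made by another client between the GET and the POST is preserved rather than overwritten.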
Implementation
This is trivial to implement for GCS and Azure Storage, because they both support If-Match for PUT requests. (It would also be straightforward to find a solution for the Filesystem backend.) However, S3 does not support If-Match for writes, so we’ll have to perform the check ourselves.
When writing a configuration with an If-Match header provided, we will need to read the current configuration, check that it has the expected content, and write the new content. There is a race condition here, so the checking and uploading have to be done under a lock. To do this without introducing external dependencies, we can implement a rudimentary lock on object storage using If-None-Match: *, which is now supported by S3 in addition to the other providers.
Upload a lock object using If-None-Match: *. If the upload fails:
Check the lock object’s timestamp; if it is over some age threshold, delete it *
Retry with some back-off
Read the current configuration
If it does not match the hash passed to If-Match, return 412
Upload the new configuration
Delete the lock object
* This mechanism is needed to detect stale locks, if an Alertmanager crashes between uploading and deleting the lock object.
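The locking procedure above could look roughly like the following. Everything here is a sketch under assumptions: `FakeBucket` is a hypothetical in-memory object store whose `if_none_match` flag models the “do not overwrite” signal, the object names `config.yaml` and `config.lock` are made up, and the hash passed to If-Match is assumed to be SHA-256 of the stored configuration.

```python
import hashlib
import time

CONFIG, LOCK = "config.yaml", "config.lock"  # hypothetical object names


class PreconditionFailed(Exception):
    """Raised when a 'do not overwrite' upload finds an existing object."""


class FakeBucket:
    """Hypothetical in-memory object store; models only what the sketch needs."""

    def __init__(self):
        self.objects = {}  # name -> (data, upload time)

    def upload(self, name, data, if_none_match=False, now=None):
        if if_none_match and name in self.objects:
            raise PreconditionFailed(name)
        self.objects[name] = (data, time.time() if now is None else now)

    def get(self, name):
        return self.objects[name][0]

    def mtime(self, name):
        entry = self.objects.get(name)
        return None if entry is None else entry[1]

    def delete(self, name):
        self.objects.pop(name, None)


def config_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def conditional_write(bucket, new_config, if_match,
                      stale_after=30.0, attempts=5, backoff=0.01):
    """Return 201 on success, or 412 if the stored config does not match if_match."""
    for attempt in range(attempts):
        try:
            bucket.upload(LOCK, b"", if_none_match=True)  # acquire the lock
            break
        except PreconditionFailed:
            # Someone else holds the lock; remove it if it looks stale.
            locked_at = bucket.mtime(LOCK)
            if locked_at is not None and time.time() - locked_at > stale_after:
                bucket.delete(LOCK)
            time.sleep(backoff * 2 ** attempt)  # retry with back-off
    else:
        raise TimeoutError("could not acquire the configuration lock")
    try:
        if config_hash(bucket.get(CONFIG)) != if_match:
            return 412
        bucket.upload(CONFIG, new_config)
        return 201
    finally:
        bucket.delete(LOCK)  # always release the lock
```

The `finally` clause releases the lock on both the 412 and the success path; the age threshold only matters when a process dies before reaching it.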
This implementation can be achieved with minimal changes to the object storage code: we only need a mechanism to signal “do not overwrite” when calling Upload on the bucket client. The performance and other overheads of this solution are not a concern, because configurations are uploaded infrequently (at worst every few minutes while being actively iterated on; after that a configuration might be unchanged for days, weeks or longer).
Iterative Improvements
Use ETag values from object storage instead of computing our own hash (saves downloading the existing configuration in full)
Use If-Match on object storage providers if available (hopefully S3 supports it soon).