Ruler: add support for caching rule group contents #9386
56quarters added a commit that referenced this issue on Sep 23, 2024
Invalidate content, existence, and attributes cached for a particular object when the object is modified or deleted. Part of #9386
56quarters added a commit that referenced this issue on Sep 26, 2024
Instead of using two different `RuleStore` implementations within the Ruler, use a single caching implementation and selectively disable caching when required. This change removes the "direct" `RuleStore` implementation from the Ruler's gRPC and HTTP API layers. Instead, the caching implementation is used for all calls. In cases where returning stale cached results would not be acceptable, caching is disabled _just_ for that call. This allows rule group contents to be safely cached, because they will be correctly invalidated when deleted or modified. Part of #9386
56quarters added a commit that referenced this issue on Sep 30, 2024
Instead of using two different `RuleStore` implementations within the Ruler, use a single caching implementation and selectively disable caching when required. This change removes the "direct" `RuleStore` implementation from the Ruler's gRPC and HTTP API layers. Instead, the caching implementation is used for all calls. In cases where returning stale cached results would not be acceptable, caching is disabled _just_ for that call. This allows rule group contents to be safely cached, because they will be correctly invalidated when deleted or modified. Part of #9386 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters added a commit to grafana/dskit that referenced this issue on Oct 3, 2024
This change adds a synchronous version of `.Set` to Memcached and Redis clients as well as the various `Cache` wrapper implementations. This allows callers to set a key and be sure it exists in the cache. This change also adds an `.Add` method which conditionally adds an item to the cache only if it does not already exist. This change is a prerequisite for grafana/mimir#9386 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters added a commit to grafana/dskit that referenced this issue on Oct 7, 2024
* cache: Add `.Set` and `.Add` methods to cache clients
  This change adds a synchronous version of `.Set` to Memcached and Redis clients as well as the various `Cache` wrapper implementations. This allows callers to set a key and be sure it exists in the cache. This change also adds an `.Add` method which conditionally adds an item to the cache only if it does not already exist. This change is a prerequisite for grafana/mimir#9386
  Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Changelog
  Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Code review fixes
  Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Add test for .Add() method semantics for LRU cache
  Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Assert on cache contents for LRU `.Add()` test
  Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
---------
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters added a commit to grafana/dskit that referenced this issue on Oct 8, 2024
Add an `.Advance()` method to MockCache and InstrumentedMockCache to allow the time considered "now" to be moved without needing to actually sleep. This is useful for testing when items are set with a TTL and you would like for them to actually expire as they would in a real cache. Part of grafana/mimir#9386 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters added a commit that referenced this issue on Oct 9, 2024
This change makes use of `add` operations for special "lock" cache entries to ensure that when items in object storage are mutated, stale results are not immediately stored to cache again. It does this by `set`ing "lock" cache entries with a short TTL when an item in object storage is about to be mutated. This prevents reads of the item from caching the results afterwards. After the item is mutated in object storage, its cache entries (excluding the lock entries) are deleted. After the lock entries expire, reads of the item are allowed to store results in the cache again. Part of #9386
56quarters added a commit that referenced this issue on Oct 9, 2024
This change makes use of `add` operations for special "lock" cache entries to ensure that when items in object storage are mutated, stale results are not immediately stored to cache again. It does this by `set`ing "lock" cache entries with a short TTL when an item in object storage is about to be mutated. This prevents reads of the item from caching the results afterwards. After the item is mutated in object storage, its cache entries (excluding the lock entries) are deleted. After the lock entries expire, reads of the item are allowed to store results in the cache again. Part of #9386 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters added a commit that referenced this issue on Oct 11, 2024
* Add more robust cache invalidation to CachingBucket
  This change makes use of `add` operations for special "lock" cache entries to ensure that when items in object storage are mutated, stale results are not immediately stored to cache again. It does this by `set`ing "lock" cache entries with a short TTL when an item in object storage is about to be mutated. This prevents reads of the item from caching the results afterwards. After the item is mutated in object storage, its cache entries (excluding the lock entries) are deleted. After the lock entries expire, reads of the item are allowed to store results in the cache again. Part of #9386
  Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Code review feedback
  Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
---------
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters added a commit that referenced this issue on Oct 11, 2024
Add a new experimental flag to enable caching of rule group contents using the rule store cache. Rule groups are cached using the same TTL as rule group listings: one evaluation interval. Part of #9386 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters added a commit that referenced this issue on Oct 15, 2024
Add a new experimental flag to enable caching of rule group contents using the rule store cache. Rule groups are cached using the same TTL as rule group listings. Part of #9386 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters added a commit that referenced this issue on Oct 17, 2024
Exclude alerts from firing about cache "add" operations failing since this is expected during normal operation. Related #9386 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters added a commit that referenced this issue on Oct 17, 2024
* mixin: Exclude cache "add" operations from alerting
  Exclude alerts from firing about cache "add" operations failing since this is expected during normal operation. Related #9386
  Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
* Build helm tests
  Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
---------
Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com>
56quarters added a commit that referenced this issue on Nov 20, 2024
When caching rule groups, the base64 encoded name of the rule group is used as part of the key for caching its contents. For long rule group names, this can exceed the max key length of Memcached. To solve this we use the sha256 of the object name in the cache key instead of the object name itself. Related #9386
56quarters added a commit that referenced this issue on Nov 25, 2024
When caching rule groups, the base64 encoded name of the rule group is used as part of the key for caching its contents. For long rule group names, this can exceed the max key length of Memcached. To solve this we use a cryptographic hash of the object name when the hashed version of the name is shorter than the full name. This is the same approach taken for postings in the store-gateway `indexcache`. This has the added benefit of not invalidating most existing cache entries when rolling out this change. With the hash function picked, key generation is between 5 and 10 times slower than _not_ hashing the key but still dramatically faster than a network operation. Related #9386
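The "hash only when it is shorter" rule can be sketched in a few lines. This is an assumption-laden example: the commit message does not name the hash function that was finally picked, so SHA-256 with unpadded URL-safe base64 is used here purely for illustration, and `cacheKeyName` is a hypothetical helper.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
)

// cacheKeyName returns the object name to embed in a cache key. Short
// names are kept as-is, so their existing cache entries stay valid.
// Long names are replaced by a fixed-length digest so the key stays
// within the cache's key length limit (250 bytes for Memcached).
//
// SHA-256 is an assumption made for this sketch.
func cacheKeyName(name string) string {
	sum := sha256.Sum256([]byte(name))
	// 32 digest bytes encode to 43 characters without padding.
	hashed := base64.RawURLEncoding.EncodeToString(sum[:])
	if len(hashed) < len(name) {
		return hashed
	}
	return name
}
```

With this encoding, any name of 43 characters or fewer passes through unchanged, and anything longer collapses to a 43-character digest.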
Is your feature request related to a problem? Please describe.
Currently, the contents of rule groups for tenants are not cached. This means an API call to return all configuration for rule groups must perform a list call for object storage (this is sometimes cached) and then N additional object storage requests. For tenants with a large number of rule groups, this can result in a huge number of calls.
Most rule group configurations don't change very often, which makes them a good candidate for caching. However, when they do change, the change needs to be reflected immediately in API results. So if we want to add caching for rule groups, we also need a way to invalidate their cached contents.
Describe the solution you'd like
We should:
- […] `CachingBucket` to invalidate cached object storage contents on mutations. (PR)
- […] `Get` operations (PR)
- […] `Get` operations to something longer than a single evaluation interval
Describe alternatives you've considered
Doing nothing is simpler and doesn't require opening up the can of worms that is cache invalidation.