Monitoring cache performance using Sentry #68265
-
Oh this is really good @bcoe. Congratulations on joining Sentry, by the way. I recently started adopting caching intentionally in my apps, and I've really been thinking, "Ain't this a black box in terms of monitoring?" Like, how do I know if there was a cache hit or a cache miss, or if something is awry with my entire cache setup with Sails Stash? So cache monitoring from Sentry will definitely be amazing for me to look into "What's up with my cache".

My cache uses Redis as the cache backend in the Sails framework. Currently in Hagfish, the only application-level cache I have is on the count of sent invoices, which I cache for a week. I also plan on using Sentry for Sailscasts, which caches the courses on the courses page for a month.

I like the cache monitoring mockups, and having the granularity of insights into cache issues be on transactions is fine as well. Again, I currently use Sails Stash, which provides a cache abstraction that currently supports Redis by using the sails-redis Waterline adapter, and I'll be keen to see how this will work. Great job with this one @bcoe
-
Hey Ben! Super excited to see that y'all are thinking about this. I spend as much time as I can working with Sentry to improve our monitoring, mostly with an eye toward performance. Below is what I often think about re: caching that I think Sentry might be able to help with.

Current state

We already have Sentry deployed in both our Vue.js FE and Node.js BE. We use the vue-router + express integrations to track page loads and API calls as transactions. In many of our APIs, we use a Redis cache both to reduce redundant compute and to speed up certain DB queries that might otherwise be heavier than desired.

Our Redis layer is custom: we can wrap a given async function with a caching utility that works essentially like lodash memoize. Every call to the function first attempts to read a value from cache; otherwise we "read through" by calling the underlying function (which likely makes a DB call and/or does some heavy compute on some data) and write the value to cache for next time.

I've written a Sentry wrapper around our underlying Redis utils that tracks calls to Redis as spans within the API call transactions. This is helpful for debugging individual transactions, but is borderline useless for trying to understand the impact of caching across large numbers of transactions. Any given API call likely makes an average of 1.5–2 cache reads for us. For example, we might have endpoint

Questions I'm interested in

For the most part, I think there are two ways to think about cache health + performance: either from the perspective of a logical cache (e.g. accesses to + health of the auth data cache across all APIs we have) or from the perspective of a given transaction (e.g. the

Here are the types of things I'm interested in, in roughly decreasing order of importance:

[1] [Transaction Perspective] What is the difference in total endpoint performance of
[2] [Transaction Perspective] I'm also very interested in the impact of caches on the perf of higher-order transactions. e.g. Page load X makes calls to a dozen endpoints, many of which may rely on cache reads. How often does Page X actually have a happy-path load where most/all of the APIs it calls hit warm caches? Ideally, aggregate cache hit/miss information would be visible up the full chain of transactions, not just on the immediate parent of the cache read. [3] [Cache Perspective] I am also interested in the health of a cache from its own perspective: Did our auth data cache get slower across the board recently? Did the cost of a
[4] [Cache Perspective] What's the average value size (in bytes) of values read from/written to X cache? How does the performance of reads + writes scale with different value sizes?

Other thoughts
I think the mocks you've posted so far are generally thinking about the right kinds of metrics re: caches, but they appear to be very heavily focused on the cache perspective rather than the "usage of cache X within transaction Y" perspective. While that is itself quite useful, I'd say 90–95% of the time I'm thinking about caches from the perspective of their impact on a given transaction (or chain of transactions, like a full page load), not in isolation.

In general, this is a very greenfield wishlist post: I'm hoping to cover the shape of my overall cache thinking so you can pick and choose what you think is most important, or where Sentry is best positioned to help. I'm more than happy to have any follow-up chats that you might want here. Like I said, I'm very interested in something like this having first-class support in Sentry.
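For context, the read-through wrapper I described above looks roughly like this. This is a minimal sketch, not our actual code: the names (`cacheThrough`, `spans`) are made up, a `Map` stands in for Redis, and a plain array of span-like records stands in for the Sentry span wrapper.

```javascript
// Sketch of a lodash-memoize-style read-through cache wrapper.
// Hypothetical names throughout; a Map stands in for Redis, and
// the spans array stands in for Sentry span instrumentation.
const store = new Map();
const spans = [];

function cacheThrough(keyPrefix, fn, ttlMs = 60_000) {
  return async (...args) => {
    const cacheKey = `${keyPrefix}:${JSON.stringify(args)}`;
    const entry = store.get(cacheKey);
    if (entry && entry.expiresAt > Date.now()) {
      // Cache hit: record a span-like event and return the cached value.
      spans.push({ op: 'cache.get', key: cacheKey, hit: true });
      return entry.value;
    }
    // Cache miss: "read through" to the underlying function
    // (the DB call / heavy compute), then write back for next time.
    spans.push({ op: 'cache.get', key: cacheKey, hit: false });
    const value = await fn(...args);
    store.set(cacheKey, { value, expiresAt: Date.now() + ttlMs });
    return value;
  };
}
```

Wrapping a heavy async function is then just `const cachedLookup = cacheThrough('user', lookupUser)`; the first call records a miss and populates the store, subsequent calls within the TTL record hits.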
-
Hi! I’m Ben, I recently joined Sentry as a Product Manager working on Performance Monitoring. 👋
I’d like to get in the habit of sharing features we’re exploring early, so that our most engaged users (✨ You! ✨) can help shape the design.
With this goal in mind, I’m seeking feedback on Cache performance monitoring…
Cache performance monitoring
Cache monitoring will be similar to query monitoring except, instead of queries, it provides insights into your application's cache behaviour.
Here are some cache performance questions we hope to help developers answer:
We landed on these use cases initially because they came to mind as real-world application performance regressions that Sentry can help identify and fix.
Request for feedback
Some questions to help kick off this conversation:
Mockups
Cache overview page
This page serves as a starting point for digging into specific cache performance issues.
Perhaps you’ve noticed that requests hitting an endpoint configured to use Django’s cache framework are occasionally slow. Starting on the Cache Overview page, you can check whether the endpoint in question has a higher-than-expected Miss % (across all cache reads within the transaction). From there, you can click the transaction itself for details about its corresponding spans.
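To make the Miss % rollup concrete, here is an illustrative aggregation over span-like records. The record shape (`{ transaction, hit }`) and the function name are assumptions for this sketch, not Sentry's actual span schema.

```javascript
// Illustrative per-endpoint cache Miss % rollup, computed from
// span-like records. The { transaction, hit } shape is an assumption
// for this sketch, not Sentry's actual span schema.
function missRateByTransaction(cacheSpans) {
  const stats = new Map();
  for (const { transaction, hit } of cacheSpans) {
    const s = stats.get(transaction) ?? { reads: 0, misses: 0 };
    s.reads += 1;
    if (!hit) s.misses += 1;
    stats.set(transaction, s);
  }
  // Report Miss % per transaction, rounded to whole percentage points.
  const result = {};
  for (const [name, s] of stats) {
    result[name] = Math.round((s.misses / s.reads) * 100);
  }
  return result;
}
```

An endpoint with three hits and one miss across its cache reads would report a 25% Miss %, which is the kind of per-transaction number the overview page surfaces.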
Transaction overlay
The Transaction overlay allows you to dig into cache performance issues tied to a specific transaction:
Looking forward to people’s feedback in this discussion.
Alternatively, if you’d rather reach out by email, you can find it here
— @bcoe