Downsampling of metric data after a certain period #1834
-
Hello everyone :) I've been using Grafana Mimir as a Prometheus data backend for a few days now and I'm very pleased with its performance. Now I have a question about data downsampling, because I didn't find anything about it in the documentation. As far as I understand, Mimir will store all data for an unlimited time as long as I don't configure a retention period. Is it possible to configure it so that after some time (for example, a week) the raw data is aggregated into (for example) one-hour data points, each represented by a min, max and average value, with the corresponding raw data deleted? It would be interesting to know if something like this is possible, because I don't need the raw data of every metric forever, but some trend metrics would be nice. For reports like "how many HTTP requests hit the load balancer two years ago" I don't need every 10s data point 😄. Thanks in advance for your help :) Kind regards,
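(Editor's note: a partial workaround exists today with plain recording rules. Below is a minimal sketch, assuming a hypothetical metric named `my_metric`, of what the hour-level min/max/avg series asked about here could look like. Recording rules only add new series; they cannot delete the raw data.)

```yaml
# Sketch only: approximating hour-level aggregates with standard
# Prometheus recording rules. `my_metric` is a hypothetical metric name.
# This creates new hourly series alongside the raw data; it does NOT
# delete the raw samples.
groups:
  - name: hourly_downsample
    interval: 1h # evaluate once per hour
    rules:
      - record: my_metric:min_over_time_1h
        expr: min_over_time(my_metric[1h])
      - record: my_metric:max_over_time_1h
        expr: max_over_time(my_metric[1h])
      - record: my_metric:avg_over_time_1h
        expr: avg_over_time(my_metric[1h])
```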
-
Hello! No, downsampling is not currently supported, and I'm not sure if there are plans to work on it in the future. Are you interested in this because long range queries are taking a long time? Or to save on storage costs? Or some other reason?
-
Thank you for your question. There are several problems with downsampling that need to be considered:

- When using an object store like GCS or S3, storage costs are typically only a small fraction of the cost of running Mimir, so the savings here may not be high.
- Downsampling is very IO intensive. Because it's not possible to modify TSDB blocks in place, downsampling requires that blocks are downloaded, rebuilt from scratch with downsampled series (possibly only some of them, based on configuration), and then uploaded back. The old blocks must then be deleted. All of this processing adds to the cost.
- Downsampling complicates querying. The PromQL query engine uses a single look-back period when looking for samples. Ty…

These are just random thoughts that we would need to take into consideration when designing a downsampling feature.
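(Editor's note: to make the look-back point in the third bullet concrete, here is a minimal sketch, assuming a hypothetical counter `http_requests_total` scraped every 10s. A fixed range selector that works fine against raw data selects nothing once samples are an hour apart.)

```yaml
# Illustration only: the same range selector behaves differently against
# raw vs. downsampled data. `http_requests_total` is hypothetical.
groups:
  - name: lookback_illustration
    rules:
      # Against raw 10s scrapes, a 5m window holds ~30 samples; rate() works.
      - record: job:http_requests:rate5m
        expr: rate(http_requests_total[5m])
      # Against series downsampled to 1 sample per hour, a 5m window is
      # empty and this rule records nothing; the window must be widened
      # to cover at least two samples, e.g. [2h].
      - record: job:http_requests:rate2h
        expr: rate(http_requests_total[2h])
```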
-
I don't care about the storage cost, but I do care about the user experience for long time-range queries. The main goal of downsampling is to provide fast results for long range queries spanning months or years.
-
As discussed on Slack, I would like to share my use cases. As users, we would like to query a wide range of series. The full resolution is not mandatory; however, when running a wide range query, the response time impacts the user experience.

```yaml
# cluster level configuration
compactor:
  downsampling:
    - 1d:1m # After 1d, apply downsampling and keep 1 sample per minute
    - 2d:5m
    - 2w:1h
```

```yaml
# runtime configuration
overrides:
  tenant1:
    downsampling:
      - 1d:1m # After 1d, apply downsampling and keep 1 sample per minute
      - 5d:5m
      - 4w:1h
```

We could even think of downsampling a subset of the series associated with a tenant differently, based on a regex:

```yaml
overrides:
  tenant1:
    downsampling:
      ".*":
        - 1d:1m # After 1d, apply downsampling and keep 1 sample per minute
        - 5d:5m
        - 4w:1h
      "cpu.*":
        - 1d:1m # After 1d, apply downsampling and keep 1 sample per minute
        - 4w:1h
```

In some cases, once downsampled, the full resolution might no longer be needed.

### Examples

**High frequency sampling**

End users want to sample at 1 Hz, but the full resolution is needed only for a short period of time.

**Capacity planning**

Keep low resolution data for capacity planning. Indeed, the more data we have from the past, the more accurate the forecasting is. The use case could be several years (2 to 5 years).

**Prune full resolution**

Only keep the downsampled data after a pre-defined period of time.

### Additional context

Side note: this would help users migrate from other backends without feature loss.
-
Hello, I must choose a new metric storage solution and I would like to use downsampling. Are there plans to add this to Mimir, or must I go with Thanos? Thank you.
-
I reckon performance on multi-month/multi-year queries is more important than the storage cost; object storage pricing is generally low compared to compute pricing. One alternative that tackles both performance and cost is to have downsampling plus different retention per series. That way you could keep the downsampled metrics longer than the raw metrics. This is not possible at the moment, as all metrics of a tenant are kept for the same amount of time.
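(Editor's note: for reference, per-tenant retention is configured through the runtime overrides today. Below is a minimal sketch; `compactor_blocks_retention_period` is, to the best of my knowledge, a real Mimir per-tenant limit, while the commented-out per-resolution split is purely hypothetical.)

```yaml
overrides:
  tenant1:
    # Existing per-tenant limit: a single retention period applies to
    # ALL of the tenant's series alike.
    compactor_blocks_retention_period: 1y
    # Hypothetical extension (does not exist in Mimir today): retain
    # downsampled resolutions longer than the raw samples.
    # retention_by_resolution:
    #   raw: 30d
    #   1h: 5y
```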
-
That's exactly what we need. In our case, we don't care about precise metrics over time; we need to see the trends. We would like to have high resolution metrics for a short range like 1 month, medium resolution metrics after that for around 6 months, and low resolution metrics for several years.
-
Is it maybe worth opening an issue for this, since there is appetite from the community? I too have this requirement, and for the same reasons as those above.
-
I've been thinking about a proposal for a long time. Here is what I've come up with. Be advised: it is not as easy as it looks.