💡 [Feature] Add recording rules to Prometheus configuration to store hourly/daily metrics #447

Open
huard opened this issue Apr 10, 2024 · 3 comments
Labels
enhancement New feature or request


@huard
Collaborator

huard commented Apr 10, 2024

... trying to split the metrics collection issue into small digestible bits.

Description

  1. Select, among all existing metrics recorded by Prometheus, those of interest to external stakeholders:
     • Number of active users
     • CPU usage
     • Bandwidth usage
     • ...
  2. Create recording rules in our current Prometheus server to store hourly/daily resolution metrics:
     • Total number of active users per day
     • Mean hourly CPU usage
     • Mean hourly bandwidth usage
     • ...

     Make sure the frequency is clearly indicated in the metric names.

  3. Display the hourly/daily data in a Grafana dashboard.
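As a rough sketch of step 2, the Python below builds the structure of a Prometheus rules file (`groups`, each with a `name`, an optional evaluation `interval`, and a list of `record`/`expr` rules) and puts the window in each recorded name, following Prometheus's `level:metric:operations` naming convention. It assumes node-exporter series (`node_cpu_seconds_total`, `node_network_receive_bytes_total`) are available; those metric names, the `longterm-metrics-*` group names and the `build_rule_group` helper are illustrative only. The active-users rule is left out because no source metric for it is identified here.

```python
import json


def build_rule_group(window: str = "1h") -> dict:
    """Return a Prometheus rule-file structure whose recorded metric names
    end in the aggregation window, e.g. instance:node_cpu_utilisation:rate1h.

    Assumption: the source series come from node-exporter
    (node_cpu_seconds_total, node_network_receive_bytes_total).
    """
    rules = [
        {
            # Mean CPU utilisation per instance over the window
            # (1 minus the average idle fraction across CPUs).
            "record": f"instance:node_cpu_utilisation:rate{window}",
            "expr": (
                "1 - avg by (instance) "
                f'(rate(node_cpu_seconds_total{{mode="idle"}}[{window}]))'
            ),
        },
        {
            # Mean inbound bandwidth (bytes/s) per instance over the window.
            "record": f"instance:node_network_receive_bytes:rate{window}",
            "expr": (
                "sum by (instance) "
                f"(rate(node_network_receive_bytes_total[{window}]))"
            ),
        },
    ]
    return {
        "groups": [
            {
                "name": f"longterm-metrics-{window}",
                # Evaluate the group once per window so consecutive
                # samples cover back-to-back windows.
                "interval": window,
                "rules": rules,
            }
        ]
    }


if __name__ == "__main__":
    # Printed as JSON for inspection; a Prometheus rules file holds the
    # same structure written in YAML.
    print(json.dumps(build_rule_group("1h"), indent=2))
```

`build_rule_group("1d")` gives the daily variant. In practice only the YAML rules file matters; the dict is just a compact way to show its shape.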

These steps don't solve the data retention issue, but they are a necessary step. A second Prometheus instance can then federate the first one and scrape only the metrics whose names match an hourly/daily regexp pattern.
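The federation step could look roughly like this. Prometheus serves a `/federate` endpoint that returns the latest samples of every series matching one or more `match[]` selectors, and the second instance scrapes it like any other target, usually with `honor_labels: true`. The naming pattern (`.+:.+(1h|1d)`), the job name, the target `prometheus:9090` and the helper functions are hypothetical.

```python
import re
from urllib.parse import urlencode

# Hypothetical naming scheme: recorded series contain a colon (recording-rule
# convention) and end with their window, e.g. "...:rate1h" or "...:rate1d".
NAME_PATTERN = ".+:.+(1h|1d)"
SELECTOR = '{__name__=~"' + NAME_PATTERN + '"}'


def federate_url(base_url: str, selectors: list) -> str:
    """URL the second Prometheus scrapes on the first one's /federate endpoint."""
    query = urlencode({"match[]": selectors}, doseq=True)
    return f"{base_url.rstrip('/')}/federate?{query}"


def federation_scrape_config(target: str) -> dict:
    """scrape_configs entry (as a dict) for the second Prometheus instance."""
    return {
        "job_name": "federate-hourly-daily",
        # Keep the labels exactly as the first Prometheus recorded them.
        "honor_labels": True,
        "metrics_path": "/federate",
        "params": {"match[]": [SELECTOR]},
        "static_configs": [{"targets": [target]}],
    }


def is_federated(metric_name: str) -> bool:
    """PromQL regex matchers are fully anchored, hence re.fullmatch."""
    return re.fullmatch(NAME_PATTERN, metric_name) is not None


if __name__ == "__main__":
    print(federate_url("http://prometheus:9090", [SELECTOR]))
```

Selecting by name keeps raw, high-resolution series out of the second instance, so only the hourly/daily aggregates are retained there.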

References

#277

Concerned Organizations

@mishaschwartz
Collaborator

@huard, you mentioned that there is an existing metric that records the total number of active users per day. Do you know which metric that is (its name and how it is collected)?

@huard
Collaborator Author

huard commented Jun 13, 2024

The only thing I know is that it's part of the vanilla config that came with the docker image. @tlvu would know more about this.

@tlvu
Collaborator

tlvu commented Jun 13, 2024

> The only thing I know is that it's part of the vanilla config that came with the docker image. @tlvu would know more about this.

No, we do not have that metric (total number of active users per day).

All the metrics that come with the "vanilla" config are listed here: https://github.com/bird-house/birdhouse-deploy/blob/9d9f46c497e2b00a6ad5be9f1e3ec322f85868a3/birdhouse/components/README.rst#grafana-dashboard

mishaschwartz added a commit that referenced this issue Feb 24, 2025
#461)

## Overview

The `prometheus-longterm-metrics` component collects longterm monitoring
metrics from the original prometheus instance (the one created by the
``components/monitoring`` component).

Longterm metrics are any prometheus rules that have the label ``group:
longterm-metrics``; in other words, they are selectable using prometheus's
``'{group="longterm-metrics"}'`` query filter. To see which longterm
metric rules are added by default, see the
``optional-components/prometheus-longterm-metrics/config/monitoring/prometheus.rules.template``
file.

To configure this component:

* update the ``PROMETHEUS_LONGTERM_RETENTION_TIME`` variable to set how
long the data will be kept by prometheus
* update the ``PROMETHEUS_LONGTERM_STORE_INTERVAL`` variable to set how
often the longterm metrics rules will be calculated. For example,
setting it to ``10h`` will calculate these metrics every 10 hours.

Enabling the `prometheus-longterm-metrics` component creates the
additional endpoint ``/prometheus-longterm-metrics``.
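Because the longterm rules carry the `group="longterm-metrics"` label, an instant query against the standard Prometheus HTTP API (`/api/v1/query`) can list their current values. A minimal sketch using only the standard library, assuming the API is reachable under the `/prometheus-longterm-metrics` endpoint of a hypothetical host `example.org`; the parser follows the documented response shape (`status`, `data.resultType`, `data.result`).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumptions: host name, and that this path proxies the Prometheus HTTP API.
BASE_URL = "https://example.org/prometheus-longterm-metrics"
LONGTERM_SELECTOR = '{group="longterm-metrics"}'


def query_url(base_url: str, promql: str) -> str:
    """Instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload: dict) -> list:
    """Return (labels, value) pairs from an instant-vector response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample's "value" is [unix_timestamp, "value as a string"].
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


def fetch_longterm_metrics(base_url: str = BASE_URL) -> list:
    """Query every series carrying the group="longterm-metrics" label."""
    with urlopen(query_url(base_url, LONGTERM_SELECTOR), timeout=30) as resp:
        return parse_vector(json.load(resp))
```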

The `thanos` component enables better storage of longterm metrics
collected by the ``optional-components/prometheus-longterm-metrics``
component: data is collected from the ``prometheus-longterm-metrics``
instance and stored indefinitely in an S3 object store.

When enabling this component, please change the default values of
``MINIO_ROOT_USER`` and ``MINIO_ROOT_PASSWORD`` in the ``env.local``
file. These set the login credentials for the root user of the
[minio](https://min.io/) object store.
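To generate non-default credentials, Python's `secrets` module is one option. This sketch prints two lines for `env.local`; the `export VAR=value` syntax is an assumption about how that file is written, the `admin-` prefix and the `minio_env_lines` helper are arbitrary, and the length checks mirror MinIO's minimums of 3 characters for the root user and 8 for the root password.

```python
import secrets
import shlex

MIN_USER_LEN = 3      # MinIO's minimum root user length
MIN_PASSWORD_LEN = 8  # MinIO's minimum root password length


def minio_env_lines(user: str = "", password_bytes: int = 32) -> list:
    """Return env.local lines with the given (or a random) user and a random password."""
    user = user or f"admin-{secrets.token_hex(4)}"
    password = secrets.token_urlsafe(password_bytes)
    if len(user) < MIN_USER_LEN or len(password) < MIN_PASSWORD_LEN:
        raise ValueError("credentials are too short for MinIO")
    # shlex.quote keeps the values safe when the file is sourced by a shell.
    return [
        f"export MINIO_ROOT_USER={shlex.quote(user)}",
        f"export MINIO_ROOT_PASSWORD={shlex.quote(password)}",
    ]


if __name__ == "__main__":
    print("\n".join(minio_env_lines()))
```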

Enabling the `thanos` component creates the additional endpoints:

* ``/thanos-query``: a prometheus-like query interface to inspect the
data stored by thanos
* ``/thanos-minio``: a minio web console to inspect the data stored by
minio.
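Thanos Query exposes the same HTTP API as Prometheus, so a range query (`/api/v1/query_range` with `start`, `end` and `step`) is a natural way to pull long histories back out of the object store. A minimal sketch, assuming `/thanos-query` proxies that API on a hypothetical host `example.org` and using a hypothetical recorded metric name:

```python
import time
from typing import Optional
from urllib.parse import urlencode

# Assumptions: host name, that /thanos-query proxies the Thanos Query HTTP API,
# and the recorded metric name.
THANOS_URL = "https://example.org/thanos-query"
METRIC = "instance:node_cpu_utilisation:rate1h"


def range_query_url(base_url: str, promql: str, days: int,
                    step: str = "1d", now: Optional[float] = None) -> str:
    """/api/v1/query_range URL covering the last `days` days at `step` resolution."""
    end = time.time() if now is None else now
    start = end - days * 86400
    params = {"query": promql, "start": f"{start:.0f}",
              "end": f"{end:.0f}", "step": step}
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


def parse_matrix(payload: dict) -> list:
    """Return (labels, [(timestamp, value), ...]) for each series in a range result."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "matrix":
        raise ValueError(f"expected a matrix, got {data['resultType']}")
    return [(s["metric"], [(float(t), float(v)) for t, v in s["values"]])
            for s in data["result"]]


if __name__ == "__main__":
    # One point per day over the last year.
    print(range_query_url(THANOS_URL, METRIC, days=365))
```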

This also updates the prometheus version from `v2.19.0` to `v2.52.0`,
the latest version at the time of writing. This update is required to
support the interaction between prometheus and thanos.

## Changes

**Non-breaking changes**
- New component version: prometheus:v2.52.0

## Related Issue / Discussion

- Resolves #277
- Adds some initial metrics as described in #447, but we should really add
more (either in this PR or in a future one) as additional rules in the
`birdhouse/optional-components/prometheus-longterm-metrics/config/monitoring/prometheus.rules.template`
file.

## Additional Information

- I tested upgrading the prometheus version and there were no issues (no
loss of data, no changed APIs etc.)
- Note that the thanos setup is pretty minimal but probably good enough
for our purposes. We can always add more thanos features/components in
the future if needed.

## CI Operations

<!--
The test suite can be run using a different DACCS config with
``birdhouse_daccs_configs_branch: branch_name`` in the PR description.
To globally skip the test suite regardless of the commit message use
``birdhouse_skip_ci`` set to ``true`` in the PR description.
Note that using ``[skip ci]``, ``[ci skip]`` or ``[no ci]`` in the
commit message will override ``birdhouse_skip_ci`` from the PR
description.
-->

birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false