Metrics is a library for collecting and exposing metrics of Tarantool-based applications.
The library includes:
- four base metric collectors: Counter, Gauge, Histogram, Summary
- ready-to-use Tarantool stats collectors built on top of the base collectors
- exporters to expose collected metrics in Prometheus, Graphite, and generic JSON formats
- a module to integrate into Tarantool Cartridge-based applications
cd ${PROJECT_ROOT}
tt rocks install metrics
To export metrics to a TSDB, you can use one of the supported export plugins (Prometheus, Graphite, or generic JSON), or you can write and use your own custom plugin. We hope to support plugins for other TSDBs soon.
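A custom plugin is essentially a function that turns collected observations into the wire format of the target system. The sketch below is a minimal illustration, not part of the library: `format_observations` is a hypothetical helper, and the observation fields (`metric_name`, `label_pairs`, `value`, `timestamp`) are an assumption about the shape of the collected data, so check the API reference before relying on them.

```lua
-- Hypothetical formatter for a custom export plugin.
-- Assumption: each observation is a table with the fields
-- metric_name, label_pairs, value and timestamp.
local function format_observations(observations)
    local lines = {}
    for _, obs in ipairs(observations) do
        -- Sort label names so the output is deterministic.
        local names = {}
        for name in pairs(obs.label_pairs or {}) do
            table.insert(names, name)
        end
        table.sort(names)

        local labels = {}
        for _, name in ipairs(names) do
            table.insert(labels,
                ('%s=%s'):format(name, tostring(obs.label_pairs[name])))
        end

        -- One line per observation: name{labels} value timestamp
        table.insert(lines, ('%s{%s} %s %s'):format(
            obs.metric_name,
            table.concat(labels, ','),
            tostring(obs.value),
            tostring(obs.timestamp)
        ))
    end
    return table.concat(lines, '\n')
end

-- Sample input built by hand for the example.
local sample = {
    {metric_name = 'http_requests_total', label_pairs = {method = 'GET'},
     value = 3, timestamp = 1700000000},
}
local output = format_observations(sample)
```

In a Tarantool application the input would presumably come from the library's own collection call rather than a hand-built table; the sketch takes the observations as a parameter so that it does not depend on any particular API.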
There are four basic metric collectors available: Counter, Gauge, Summary and Histogram. The exact semantics of each metric follow the Prometheus metric types.
Counter is a cumulative metric whose value can only be incremented or reset to zero on restart. Counters are useful for accumulating the number of events, e.g. requests processed or orders in an e-shop. Counter is exposed as a single numerical value.
local metrics = require('metrics')
-- create a counter
local http_requests_total_counter = metrics.counter('http_requests_total')
-- somewhere in HTTP requests middleware:
http_requests_total_counter:inc(1, {method = 'GET'})
Gauge is a metric that represents a single numerical value that can be changed arbitrarily. Gauges are useful for capturing a snapshot of the current state, e.g. CPU utilization, number of open connections. Gauge is exposed as a single numerical value.
local metrics = require('metrics')
-- create a gauge
local cpu_usage_gauge = metrics.gauge('cpu_usage', 'CPU usage')
-- register a lazy gauge value update
-- this will be called whenever the export is invoked in any plugin
metrics.register_callback(function()
    local current_cpu_usage = math.random()
    cpu_usage_gauge:set(current_cpu_usage, {app = 'tarantool'})
end)
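The lazy update pattern above can be modelled in plain Lua. The following sketch is not the library's implementation; `register_callback`, `invoke_callbacks`, and the `gauge` table are local stand-ins defined only for this example. It shows that a registered callback does no work until an export invokes it.

```lua
-- Stand-in callback registry; NOT the metrics library itself.
local callbacks = {}

local function register_callback(fn)
    table.insert(callbacks, fn)
end

-- An exporter would call this right before reading metric values.
local function invoke_callbacks()
    for _, fn in ipairs(callbacks) do
        fn()
    end
end

-- Stand-in gauge holding a single value.
local gauge = {value = nil}
function gauge:set(v)
    self.value = v
end

local calls = 0
register_callback(function()
    calls = calls + 1
    gauge:set(0.42) -- a real callback would sample the current state here
end)

local value_before_export = gauge.value -- nothing has run yet
invoke_callbacks()                       -- simulated export
local value_after_export = gauge.value  -- set by the callback
```

The benefit of this design is that expensive sampling happens only as often as metrics are actually exported.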
Histogram counts observed values into configurable buckets. Histograms are useful for tracking request latencies and processing times. Histogram is exposed as multiple numerical values:
- the total count of observed events
- the total sum of observed values
- counters of observed events per bucket
local metrics = require('metrics')
-- create a histogram
local http_requests_latency_hist = metrics.histogram(
    'http_requests_latency', 'HTTP requests latency', {2, 4, 6})
-- somewhere in the HTTP requests middleware:
local latency = math.random(1, 10)
http_requests_latency_hist:observe(latency)
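To make the three exposed values concrete, here is a plain-Lua model of a histogram with the same bucket bounds `{2, 4, 6}`. It assumes Prometheus-style cumulative buckets (each bucket counts every observation less than or equal to its upper bound, plus an implicit +Inf bucket). `new_histogram` is a hypothetical helper, not the library's implementation.

```lua
-- Illustrative model of a cumulative-bucket histogram.
local function new_histogram(bounds)
    local h = {count = 0, sum = 0, bounds = bounds, buckets = {}, inf = 0}
    for i = 1, #bounds do
        h.buckets[i] = 0
    end

    function h:observe(value)
        self.count = self.count + 1
        self.sum = self.sum + value
        -- Cumulative: a value increments every bucket whose bound covers it.
        for i, bound in ipairs(self.bounds) do
            if value <= bound then
                self.buckets[i] = self.buckets[i] + 1
            end
        end
        -- The +Inf bucket counts every observation.
        self.inf = self.inf + 1
    end

    return h
end

local hist = new_histogram({2, 4, 6})
for _, latency in ipairs({1, 3, 5, 7}) do
    hist:observe(latency)
end
-- hist.count == 4, hist.sum == 16
-- buckets: le=2 -> 1, le=4 -> 2, le=6 -> 3, le=+Inf -> 4
```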
Summary aggregates observed values into configurable quantiles. Summaries are useful as a service level indicator (e.g. SLAs, SLOs). Summary is exposed as multiple numerical values:
- the total count of observed events
- the total sum of observed values
- the observed value at each configured quantile
local metrics = require('metrics')
-- create a summary with a sliding window of 5 age buckets and 60s bucket lifetime
local http_requests_latency = metrics.summary(
    'http_requests_latency', 'HTTP requests latency',
    {[0.5]=0.01, [0.9]=0.01, [0.99]=0.01},
    {max_age_time = 60, age_buckets_count = 5}
)
-- somewhere in the HTTP requests middleware:
local latency = math.random(1, 10)
http_requests_latency:observe(latency)
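As an illustration of what a quantile over observed values means, the sketch below computes exact quantiles with the nearest-rank method over a stored list of observations. This is a deliberately naive stand-in: the summary above is configured with a sliding window and a tolerated error per quantile (the `0.01` values), so the library presumably approximates rather than storing and sorting every value. `quantile` is a hypothetical helper.

```lua
-- Exact nearest-rank quantile; NOT the library's algorithm.
local function quantile(observations, q)
    -- Copy first so the caller's table is not reordered.
    local sorted = {}
    for i, v in ipairs(observations) do
        sorted[i] = v
    end
    table.sort(sorted)

    -- Nearest-rank: the smallest value with at least q of the data at or below it.
    local rank = math.ceil(q * #sorted)
    if rank < 1 then
        rank = 1
    end
    return sorted[rank]
end

local latencies = {7, 2, 9, 4, 1, 10, 3, 8, 6, 5}
local p50 = quantile(latencies, 0.5)  -- 5
local p90 = quantile(latencies, 0.9)  -- 9
local p99 = quantile(latencies, 0.99) -- 10
```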
In production environments, a Tarantool cluster usually has a large number of so-called "routers": Tarantool instances that handle incoming load, which has to be distributed evenly among them. Various load balancers are used for this, but any load balancer has to know which routers are ready to accept load at a given moment. The Metrics library has a special plugin that creates an HTTP handler that a load balancer can use to check the current state of any Tarantool instance. If the instance is ready to accept load, the handler returns a response with a 200 status code; otherwise, it returns a response with a 500 status code.
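The status-code contract described above can be sketched as a small pure function. The readiness criterion used here (`info.status == 'running'`, mirroring the `status` field of `box.info`) is an assumption made for illustration; the text does not say which checks the plugin actually performs, and `health_response` is a hypothetical name.

```lua
-- Hypothetical health check: maps an instance info table to an HTTP status.
-- Assumption: the instance is "ready" when info.status == 'running'.
local function health_response(info)
    local ready = info ~= nil and info.status == 'running'
    return {status = ready and 200 or 500}
end

local ok_response = health_response({status = 'running'})
local loading_response = health_response({status = 'loading'})
local missing_response = health_response(nil)
```

Inside Tarantool, one would pass `box.info` to such a function and return the result from an HTTP route handler that the load balancer polls.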
See:
- A more detailed getting started guide
- Metrics API reference
- Detailed information on plugins
Feel free to send Pull Requests. To increase the chance of having your pull request accepted, make sure it follows these guidelines:
- The title and description match the implementation.
- The code follows the style guide.
- The pull request closes one or more related issues. If there is no related issue, please open one first.
- The pull request contains necessary tests that verify the intended behavior.
- The pull request contains a CHANGELOG note and documentation update if needed.
Your pull request will be reviewed within 3-5 days.
If you have questions, please ask them on Stack Overflow or contact us on Telegram.
We would like to thank Prometheus for a great API that we brusquely borrowed.