Metrics

Metrics is a library to collect and expose Tarantool-based applications metrics.

Library includes:

four base metric collectors: Counter, Gauge, Histogram, Summary
ready to use Tarantool stats collectors built on top of base collectors
exporters to expose collected metrics in Prometheus, Graphite and generic JSON format
module to integrate into Tarantool Cartridge based applications

Installation

cd ${PROJECT_ROOT}
tt rocks install metrics

Plugins export

In order to easily export metrics to any TSDB, you can use one of the supported export plugins:

Graphite
Prometheus
Json

or you can write your custom plugin and use it. Hopefully, plugins for other TSDBs will be supported soon.

Metric types

There are four basic metric collectors available: Counter, Gauge, Summary and Histogram. The exact semantics of each metric follows the prometheus metric types.

Counter

Counter is a cummulative metric which value can only be incremented or reset to zero on restart. Counters are useful for accumulating number of events, e.g. requests processed, orders in e-shop. Counter is exposed as a single numerical value.

local metrics = require('metrics')

-- create a counter
local http_requests_total_counter = metrics.counter('http_requests_total')

-- somewhere in HTTP requests middleware:
http_requests_total_counter:inc(1, {method = 'GET'})

Gauge

Gauge is a metric that represents a single numerical value that can be changed arbitrarily. Gauges are useful for capturing a snapshot of the current state, e.g. CPU utilization, number of open connections. Gauge is exposed as a single numerical value.

local metrics = require('metrics')

-- create a gauge
local cpu_usage_gauge = metrics.gauge('cpu_usage', 'CPU usage')

-- register a lazy gauge value update
-- this will be called whenever the export is invoked in any plugins
metrics.register_callback(function()
    local current_cpu_usage = math.random()
    cpu_usage_gauge:set(current_cpu_usage, {app = 'tarantool'})
end)

Histogram

Histogram counts observed values into configurable buckets. Histograms are useful for tracking request latencies, processing time. Histogram is exposed as multiple numerical values:

the total count of observed events
the total sum of observed values
counters of observed events per bucket

local metrics = require('metrics')

-- create a histogram
local http_requests_latency_hist = metrics.histogram(
    'http_requests_latency', 'HTTP requests total', {2, 4, 6})

-- somewhere in the HTTP requests middleware:
local latency = math.random(1, 10)
http_requests_latency_hist:observe(latency)

Summary

Summary aggregates observed values into configurable quantiles. Summaries are useful as a service level indicator (e.g. SLAs, SLOs). Summary is exposed as multiple numerical values:

the total count of observed events
the total sum of observed values
number of observed events per quantile

local metrics = require('metrics')

-- create a summary with a sliding window of 5 age buckets and 60s bucket lifetime
local http_requests_latency = metrics.summary(
    'http_requests_latency', 'HTTP requests total',
    {[0.5]=0.01, [0.9]=0.01, [0.99]=0.01},
    {max_age_time = 60, age_buckets_count = 5}
)

-- somewhere in the HTTP requests middleware:
local latency = math.random(1, 10)
http_requests_latency:observe(latency)

Instance health check

In production environments Tarantool Cluster usually has a large number of so called "routers", Tarantool instances that handle input load and it is required to evenly distribute the load. Various load-balancers are used for this, but any load-balancer have to know which "routers" are ready to accept the load at that very moment. Metrics library has a special plugin that creates an http handler that can be used by the load-balancer to check the current state of any Tarantool instance. If the instance is ready to accept the load, it will return a response with a 200 status code, if not, with a 500 status code.

Next steps

See:

A more detailed getting started guide
Metrics API reference
Detailed information on plugins

Contribution

Feel free to send Pull Requests. To increase the chance of having your pull request accepted, make sure it follows these guidelines:

Title and description matches the implementation.
Code follows styleguide.
The pull request closes one or more of related issues. If not, please add an issue first.
The pull request contains necessary tests that verify the intended behavior.
The pull request contains a CHANGELOG note and documentation update if needed.

Your pull request will be reviewed in 3-5 days.

Contacts

If you have questions, please ask it on StackOverflow or contact us in Telegram:

Russian-speaking chat
English-speaking chat

Credits

We would like to thank Prometheus for a great API that we brusquely borrowed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Metrics

Table of contents

Installation

Plugins export

Metric types

Counter

Gauge

Histogram

Summary

Instance health check

Next steps

Contribution

Contacts

Credits

Files

README.md

Latest commit

History

README.md

File metadata and controls

Metrics

Table of contents

Installation

Plugins export

Metric types

Counter

Gauge

Histogram

Summary

Instance health check

Next steps

Contribution

Contacts

Credits