
cloud: Binary-based ingestion #2954

Closed
codebien opened this issue Mar 6, 2023 · 1 comment
Comments

@codebien
Contributor

codebien commented Mar 6, 2023

What

The current Cloud ingestion service receives metrics on the CLOUD_URL/v1/metrics/<TEST_REF_ID> endpoint. Each HTTP request contains a JSON payload with an array of Sample objects at the root; each Sample contains a data field whose shape is one of the types defined below.

We want to replace it with a new HTTP body format based on a binary encoding.

Current JSON payload

[
  {
    "type": "<TYPE>",
    "metric": "<NAME>",
    "data": {
      ...
    }
  },
  {
    ...
  }
]

Single point

{
  "type": "Point",
  "metric": "vus",
  "data": {
    "time": "%d",
    "type": "gauge",
    "tags": {
      "aaa": "bbb",
      "ccc": "123"
    },
    "value": 999
  }
}

Multi points

{
  "type": "Points",
  "metric": "iter_li_all",
  "data": {
    "time": "%d",
    "type": "counter",
    "tags": {
      "test": "mest"
    },
    "values": {
      "data_received": 6789.1,
      "data_sent": 1234.5,
      "iteration_duration": 10000
    }
  }
}

Aggregated points

{
  "type": "AggregatedPoints",
  "metric": "http_req_li_all",
  "data": {
    "time": "%d",
    "type": "aggregated_trend",
    "count": 2,
    "tags": {
      "test": "mest"
    },
    "values": {
      "http_req_duration": {
        "min": 0.013,
        "max": 0.123,
        "avg": 0.068
      },
      "http_req_blocked": {
        "min": 0.001,
        "max": 0.003,
        "avg": 0.002
      },
      "http_req_connecting": {
        "min": 0.001,
        "max": 0.002,
        "avg": 0.0015
      },
      "http_req_tls_handshaking": {
        "min": 0.003,
        "max": 0.004,
        "avg": 0.0035
      },
      "http_req_sending": {
        "min": 0.004,
        "max": 0.005,
        "avg": 0.0045
      },
      "http_req_waiting": {
        "min": 0.005,
        "max": 0.008,
        "avg": 0.0065
      },
      "http_req_receiving": {
        "min": 0.006,
        "max": 0.008,
        "avg": 0.007
      }
    }
  }
}

Why

Better efficiency at scale is required. A binary encoding would reduce the payload size and the hardware cost of encoding/decoding operations, both in the cloud backend and in clients.

Non-Goals

  • An aggregation algorithm for reducing the volume of flushed data.

How / Proposals

Create a new Cloud output (v2) that flushes metrics via HTTP requests whose bodies are serialized with Protobuf.

In summary, an example of the HTTP request:

POST CLOUD_URL/v2/metrics/<TEST_REF_ID> HTTP/1.1
Host: www.example.com
User-Agent: k6
Content-Type: application/x-protobuf
Content-Encoding: snappy
K6-Metrics-Protocol-Version: 2.0

To stay closer to the Prometheus implementation, the output has to compress the body using the Snappy algorithm.

The code below contains a Protobuf proposal inspired by OpenMetrics to use for encoding the body request:


EDIT: the Protobuf definition after several iterations: https://github.com/grafana/k6/blob/0cddc417243fd152f0a2e532b1870fa6d8635d03/output/cloud/expv2/pbcloud/metric.proto

TODO: Use an HDR histogram implementation for mapping the Trend type.

It is a requirement to add the metric name and the test run ID as part of the tag set. The output has to add:

metrics.<metric>.tags["__name__"] = "<metric-name>"
metrics.<metric>.tags["test_run_id"] = "<test-ref-id>"

Additional implementation details

Gate the startup of the new Cloud output behind a config option, keeping it separate from the current Cloud output. This way, we can switch the used output at runtime and fall back to the previous logic when required.

Action Plan

  1. Quick and dirty implementation of a basic Cloud output v2
    • Config option for enabling v2 and the related fallback logic in v1
    • Ability to flush metric samples encoded as defined by the new protocol
    • No Trend implementation
  2. Trend implementation as an HDR histogram
  3. Iterate for polish and stability

Future

  • Better metrics aggregation (e.g. Cloud Aggregation for Counter, Gauge and Rate #1700)
  • Consider a full Prometheus Remote-write implementation
  • Track the origin (e.g. Builtin or Custom) of the metrics (at the moment the cloud backend has a fixed list of the Builtin metrics).

Open Questions

  • HDR format in protobuf (done)

Work log

This collects all the tasks required for the new cloud output. The new output includes a substantial refactor, a new binary format for the metrics requests' payload, and sample aggregation with HDR histogram generation on the client.

It depends on the following PRs as a prerequisite:

The following PRs are expected to be merged to have the final working output:

@codebien
Contributor Author
codebien commented Jun 8, 2023

Most of the work planned here has been merged; I will close this issue and continue the remaining performance optimizations in a new dedicated issue, #3117.
