Cloud output v2 #3117

Closed
13 of 17 tasks
codebien opened this issue Jun 6, 2023 · 1 comment


codebien commented Jun 6, 2023

Context

#2954 introduces the new experimental Cloud output with a Protobuf-based protocol.

Memory usage

After the first iteration, memory usage is higher than required. With Trend metrics in particular, it is very easy to saturate the bandwidth, with request sizes ranging from many kilobytes up to the remote limit (1 MB).

We also decided to denormalize some fields to reduce the workload and keep the implementation simple on the remote server. However, the load this generates on the client is high, so we should revisit this decision.

Fault tolerance

The current flush process could be more fault tolerant: it doesn't retry on failures.
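As a rough illustration of what retrying on failure could look like, here is a minimal sketch of a retry loop with exponential backoff. It is not the output's actual code: `flushWithRetry`, `errTemporary`, the attempt count and the delays are all hypothetical names and values chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTemporary is a hypothetical error used only to simulate a failing flush.
var errTemporary = errors.New("temporary failure")

// flushWithRetry calls flush up to maxAttempts times, doubling the wait
// between attempts. It is an illustrative helper, not part of k6.
func flushWithRetry(flush func() error, maxAttempts int, baseDelay time.Duration) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	delay := baseDelay
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = flush(); err == nil {
			return nil
		}
		// Wait only if another attempt will follow.
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("flush failed after %d attempts: %w", maxAttempts, err)
}

func main() {
	calls := 0
	// Simulated flush that fails twice before succeeding.
	err := flushWithRetry(func() error {
		calls++
		if calls < 3 {
			return errTemporary
		}
		return nil
	}, 5, 10*time.Millisecond)
	fmt.Println("calls:", calls, "err:", err)
}
```

A real implementation would probably also need to distinguish retryable failures (timeouts, 5xx responses) from permanent ones, and to cap the total time spent retrying so that flushes do not pile up.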

Validation

__name__ and test_run_id are reserved labels for the remote service. If a test also sets them, the resulting conflicts cause unexpected behavior for the user. A more developer-friendly UX should be implemented.
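One possible shape for such a check, sketched under assumptions: reject the reserved names up front with a clear error instead of letting them conflict on the remote side. `validateTags` and `reservedTagNames` are hypothetical names for this example, and where the check would run in the real output is not specified here.

```go
package main

import "fmt"

// reservedTagNames holds the two labels this issue names as reserved
// by the remote service.
var reservedTagNames = []string{"__name__", "test_run_id"}

// validateTags returns an error if the user-defined tags contain a
// reserved name. Illustrative helper, not part of k6.
func validateTags(tags map[string]string) error {
	for _, name := range reservedTagNames {
		if _, found := tags[name]; found {
			return fmt.Errorf("tag name %q is reserved by the Cloud output and cannot be set by the test", name)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateTags(map[string]string{"scenario": "checkout"}))
	fmt.Println(validateTags(map[string]string{"test_run_id": "123"}))
}
```

Failing early with a message that names the offending tag is one option; silently dropping or renaming the tag would be another, at the cost of surprising the user.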

Proposal

We identified some actions that should drive us to the goal:

  • A more compact Protobuf representation for Histogram.
  • Split the flush into multiple requests when it receives more time series than the MaxMetricSamplesPerPackage variable allows.
  • Normalize the fields that are common across time series into MetricSet fields.
  • Fault-tolerant flush operation.
  • Exclude __name__ and test_run_id from the allowed tag names.
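The request-splitting item above can be sketched as follows. This is a minimal example, assuming a hypothetical `TimeSeries` type and a `splitIntoChunks` helper; the `maxPerRequest` parameter stands in for the MaxMetricSamplesPerPackage value, and the real Protobuf types are not shown.

```go
package main

import "fmt"

// TimeSeries is a hypothetical stand-in for the Protobuf time series type.
type TimeSeries struct {
	Name string
}

// splitIntoChunks splits series into slices of at most maxPerRequest
// items, so that each chunk can be sent as its own request.
// A maxPerRequest below 1 is treated as "no limit".
func splitIntoChunks(series []TimeSeries, maxPerRequest int) [][]TimeSeries {
	if len(series) == 0 {
		return nil
	}
	if maxPerRequest < 1 {
		return [][]TimeSeries{series}
	}
	var chunks [][]TimeSeries
	for start := 0; start < len(series); start += maxPerRequest {
		end := start + maxPerRequest
		if end > len(series) {
			end = len(series)
		}
		// Chunks share the backing array of series; no copy is made.
		chunks = append(chunks, series[start:end])
	}
	return chunks
}

func main() {
	series := make([]TimeSeries, 5)
	for i, chunk := range splitIntoChunks(series, 2) {
		fmt.Printf("request %d carries %d time series\n", i+1, len(chunk))
	}
}
```

Splitting by count keeps the logic simple, but it only bounds the request size indirectly. If the 1 MB remote limit is the real constraint, splitting by encoded size may be worth considering as well.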

Acceptance criteria

Change the Cloud output default version to 2.

Worklog

Nice to have (in case we need to reduce the scope)

  1. codebien
  2. codebien
  3. cloud lower prio tests
    olegbespalov
  4. codebien
  5. cloud enhancement performance

codebien commented Aug 8, 2023

Most of the critical work expected here has been merged and will be released in v0.46.0. The remaining work will continue in #3258.

codebien closed this as completed Aug 8, 2023