
Roadmap

Peter Wilcsinszky edited this page Feb 14, 2024 · 35 revisions

Support other sources

  • host logs
  • file based logs through a managed sidecar container
  • logs sent to a network/otlp endpoint directly
  • kubernetes event log

Currently only one collector can manage a tenant, which we enforce through the tenant status. However, we want to allow multiple different external or internal sources to implement the same tenancy rules. The idea is to dedicate the current Controller resource to the Kubernetes log collection use case and introduce separate CRDs for use cases such as receiving telemetry from external sources (where we process not just logs but metrics and traces as well). Even for the Kubernetes collector there is a use case where the one-to-many relationship implemented currently is too limited, because we would need multiple connectors to implement the global tenant configuration (the use case is multiple isolated node groups with a single global infra tenant).

Docker container runtime support

Currently the receiver configuration is tuned to support containerd only.

Persistent buffering and file position

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/filelogreceiver/README.md

Buffering and retry helper: https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/README.md

Bindplane example: https://github.com/search?q=repo%3AobservIQ%2Fbindplane-agent+sending_queue&type=code

Please note that the collector does not yet fully support graceful shutdown, but we plan to add it.

Qs:

Metrics

We lack a complete solution for collecting byte metrics, although we already plan to use the count connector. There is another approach that doesn't involve duplicating logs, which is implemented in Bindplane: https://github.com/observIQ/bindplane-agent/tree/release/v1.43.0/processor/metricextractprocessor

We have to keep considering both approaches until we have good measurements to compare them.

(Discussion, for the record: https://axoflow.slack.com/archives/C04HPRT4JH3/p1705938608863419)

Qs

  • understand what OpAMP provides in terms of metrics

Verify and fix the backpressure problem

When we deal with lots of outputs, one slow output can fill up the queues. If the queues are limited, a full queue causes backpressure, and under backpressure the source stops. The idea here is to use separate receivers per tenant, but this needs to be verified.
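The stall described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not collector code: the `simulate` function, the tick-based timing, and the tenant names and drain rates are all hypothetical. It compares a shared receiver, where any full queue stops intake for everyone, with per-tenant receivers, where only the tenant with the full queue stalls.

```python
from collections import deque


def simulate(ticks, queue_size, drain_rates, shared_receiver):
    """Count how many records each tenant's bounded queue accepts.

    Every tick the source offers one record per tenant. drain_rates maps a
    tenant to the number of records its output sends per tick (made-up
    numbers for illustration only).
    """
    queues = {tenant: deque() for tenant in drain_rates}
    accepted = {tenant: 0 for tenant in drain_rates}
    for _tick in range(ticks):
        # Outputs drain their queues first.
        for tenant, rate in drain_rates.items():
            for _sent in range(min(rate, len(queues[tenant]))):
                queues[tenant].popleft()
        # A queue at its limit signals backpressure.
        full = {t for t, q in queues.items() if len(q) >= queue_size}
        for tenant, queue in queues.items():
            # Shared receiver: any full queue blocks every tenant.
            # Per-tenant receivers: only the tenant with the full queue stalls.
            blocked = bool(full) if shared_receiver else tenant in full
            if not blocked:
                queue.append(object())
                accepted[tenant] += 1
    return accepted


rates = {"fast": 1, "slow": 0}  # hypothetical tenants; "slow" never drains
shared = simulate(100, 5, rates, shared_receiver=True)
isolated = simulate(100, 5, rates, shared_receiver=False)
```

In this model the healthy tenant stops receiving data as soon as the slow tenant's queue fills when the receiver is shared, while with per-tenant receivers it keeps flowing. Whether the real collector behaves the same way is exactly what still has to be verified.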

Optimization by merging subscriptions

We could possibly optimize for the case where subscriptions overlap heavily in their label selectors and therefore might send the same data multiple times to the same destination. Instead of using a routerconnector for subscriptions, we could use a single pipeline that applies all the subscriptions as subsequent processors, and then use a routerconnector to route the messages, already labeled with the subscription ID, to the right output.
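A minimal sketch of the label-then-route idea, written as plain Python rather than collector configuration. The subscription layout (`id`, `selector`, `output`), the equality-only selector matching, and the helper names are assumptions made for illustration. Collapsing the matched subscription IDs into a set of destinations is one possible way to avoid duplicate sends, not something the roadmap item prescribes.

```python
def matches(selector, labels):
    """True if every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def label_record(labels, subscriptions):
    """Single pass: collect the IDs of all subscriptions the record matches."""
    return [s["id"] for s in subscriptions if matches(s["selector"], labels)]


def destinations(subscription_ids, subscriptions):
    """Routing step: map subscription IDs to the distinct outputs behind them."""
    by_id = {s["id"]: s for s in subscriptions}
    return {by_id[sub_id]["output"] for sub_id in subscription_ids}


# Hypothetical subscriptions: "a" and "b" overlap and share one output.
subscriptions = [
    {"id": "a", "selector": {"app": "web"}, "output": "loki"},
    {"id": "b", "selector": {"app": "web", "env": "prod"}, "output": "loki"},
    {"id": "c", "selector": {"env": "prod"}, "output": "s3"},
]

record_labels = {"app": "web", "env": "prod"}
matched = label_record(record_labels, subscriptions)
targets = destinations(matched, subscriptions)
```

Here the record matches all three subscriptions but resolves to only two destinations, so the shared output would receive it once instead of twice.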

Hot reload

Look at how hot reload could improve the configuration update flow.

Configcheck

Go with the simplest possible solution.

Existing alternatives currently (and possible improvement ideas)

  • silly config check is available by default
  • there is an option in the collector for syntax check, not implemented for the operator
  • implementing a full config check by running an isolated job (probably not needed for our scenario, more for an aggregator where custom configs are applied by the user)

See the following issue: https://github.com/open-telemetry/opentelemetry-collector/issues/4205
