Roadmap
- host logs
- file based logs through a managed sidecar container
- logs sent directly to a network/OTLP endpoint
- Kubernetes event log
Currently only one collector can manage a tenant, which we enforce through the tenant status. However, we want to allow multiple different external or internal sources to implement the same tenancy rules. The idea is to dedicate the current Controller resource to the Kubernetes log collection use case and introduce separate CRDs for use cases such as receiving telemetry from external sources (where we process not just logs but metrics and traces as well). Even for the Kubernetes collector there is a use case where the one-to-many relationship implemented currently is too limited, because we would need multiple connectors to implement the global tenant configuration: multiple isolated node groups with a single global infra tenant.
Currently the receiver configuration is tuned to support containerd only.
Buffering and retry helper: https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/README.md
Bindplane example: https://github.com/search?q=repo%3AobservIQ%2Fbindplane-agent+sending_queue&type=code
Please note that the collector does not yet fully support graceful shutdown, although adding it is planned.
Qs:
- buffer metrics?
- PVCs (or any alternative) with daemonsets: https://kubernetes.io/docs/concepts/storage/volumes/#local
We lack a complete solution for collecting byte metrics, although we already plan to use the count connector. There is another approach that doesn't involve duplicating logs, which is implemented in Bindplane: https://github.com/observIQ/bindplane-agent/tree/release/v1.43.0/processor/metricextractprocessor
We have to keep considering both approaches until we have good measurements to compare them.
(Discussion, for the record: https://axoflow.slack.com/archives/C04HPRT4JH3/p1705938608863419)
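To make the "no duplication" idea concrete, here is a minimal sketch of measuring bytes in line with the log stream. It is only an illustration under assumptions: the record shape (a dict with `tenant` and `body` keys) and the `measure_bytes` helper are hypothetical, and this is not how the Bindplane processor or the count connector is implemented.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical record shape: a dict with a "tenant" label and a "body" string.
Record = dict


def measure_bytes(records: Iterable[Record], totals: Counter) -> Iterator[Record]:
    """Pass records through unchanged while accumulating body bytes per tenant."""
    for record in records:
        totals[record["tenant"]] += len(record["body"].encode("utf-8"))
        yield record  # the same object is forwarded, nothing is copied


totals: Counter = Counter()
logs = [
    {"tenant": "infra", "body": "node ready"},
    {"tenant": "shop", "body": "order placed"},
    {"tenant": "infra", "body": "disk ok"},
]
forwarded = list(measure_bytes(logs, totals))
```

The point of the sketch is that the metric is a side effect of the existing pipeline, so the log records are never fanned out to a second pipeline just to be counted.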
Qs:
- understand what OpAMP provides in terms of metrics
When we deal with lots of outputs, one slow output can fill up the queues. If the queues are bounded, this creates backpressure, and under backpressure the source stops reading. The idea here is to use separate receivers per tenant, but this needs to be verified.
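The effect described above can be illustrated with a toy, tick-based model. This is a sketch under simplifying assumptions, not the collector's actual queueing logic: `simulate`, its parameters, and the tenant names are all hypothetical. Each tick the source offers one record per tenant, and each output drains at its own rate.

```python
def simulate(ticks, queue_size, drain_rates, shared_receiver):
    """Return records delivered per tenant after the given number of ticks."""
    queues = {tenant: 0 for tenant in drain_rates}
    delivered = {tenant: 0 for tenant in drain_rates}
    for _ in range(ticks):
        # A shared receiver stalls for everyone as soon as any queue is full.
        any_full = any(depth >= queue_size for depth in queues.values())
        for tenant in queues:
            if shared_receiver:
                blocked = any_full
            else:
                # A dedicated receiver is blocked only by its own queue.
                blocked = queues[tenant] >= queue_size
            if not blocked:
                queues[tenant] += 1
        # Each output drains up to its rate from its own queue.
        for tenant, rate in drain_rates.items():
            sent = min(rate, queues[tenant])
            queues[tenant] -= sent
            delivered[tenant] += sent
    return delivered


rates = {"fast": 1, "slow": 0}  # the "slow" output is fully stalled
shared = simulate(10, 3, rates, shared_receiver=True)
separate = simulate(10, 3, rates, shared_receiver=False)
```

In this model the shared receiver stops delivering to the healthy output once the stalled output's queue fills, while per-tenant receivers keep the healthy output flowing. Whether the real collector behaves the same way is exactly what still needs to be verified.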
We could possibly optimize for the case where subscriptions have a lot of overlap in their label selectors and might therefore send the same data multiple times to the same destination. Instead of using a routing connector for subscriptions, we could use a single pipeline that applies all the subscriptions as subsequent processors, each labeling matching messages with its subscription ID, and then use a routing connector to route the already labeled messages to the right output.
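A rough sketch of the label-then-route idea follows. The data shapes, subscription names, and output names are made up for illustration, and the functions only stand in for what the processors and the connector would do. Whether the collector components can express this directly is still open.

```python
def matches(selector: dict, labels: dict) -> bool:
    """Equality-based label selector: every selector key must match."""
    return all(labels.get(key) == value for key, value in selector.items())


def label_subscriptions(record: dict, subscriptions: dict) -> dict:
    """Single pass standing in for the chain of per-subscription processors."""
    ids = [
        sub_id
        for sub_id, sub in subscriptions.items()
        if matches(sub["selector"], record["labels"])
    ]
    return {**record, "subscriptions": ids}


def route(record: dict, subscriptions: dict) -> dict:
    """Group matched subscription IDs by output: one delivery per output."""
    deliveries: dict = {}
    for sub_id in record["subscriptions"]:
        output = subscriptions[sub_id]["output"]
        deliveries.setdefault(output, []).append(sub_id)
    return deliveries


# Hypothetical subscriptions: two overlap and share the same output.
subscriptions = {
    "all-nginx": {"selector": {"app": "nginx"}, "output": "loki"},
    "prod-nginx": {"selector": {"app": "nginx", "env": "prod"}, "output": "loki"},
    "prod-audit": {"selector": {"env": "prod"}, "output": "s3"},
}
record = {"labels": {"app": "nginx", "env": "prod"}, "body": "GET /"}
labeled = label_subscriptions(record, subscriptions)
deliveries = route(labeled, subscriptions)
```

Here the record matches three subscriptions but is delivered only twice, once per distinct output, instead of once per matching subscription.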
Look at how hot reload could improve the configuration update flow.
Go with the simplest possible solution.
Currently existing alternatives (and possible improvement ideas):
- silly config check is available by default
- there is an option in the collector for a syntax check, but it is not implemented in the operator
- implementing a full config check by running an isolated job (probably not needed for our scenario, more for an aggregator where custom configs are applied by the user)
See the following issue: https://github.com/open-telemetry/opentelemetry-collector/issues/4205