Proposal to improve internal error logging #1146

Closed
TommyCpp opened this issue Jul 10, 2023 · 6 comments · Fixed by #2128
Labels
A-common (Area: common issues not related to a specific pillar), enhancement (New feature or request), priority:p3 (Lowest priority issues and bugs), release:required-for-stable (Must be resolved before GA release, or nice to have before GA)

Comments

@TommyCpp
Contributor

Presently, an internal logging mechanism is in place that leverages eprintln! to alert users of errors within the OpenTelemetry pipeline. Users have the option to customize this logging by installing their own error handler.
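For reference, a minimal sketch of what that customization looks like today, assuming the global::set_error_handler API the crate exposed around the time of this issue (the exact signature may differ between versions):

use opentelemetry::global;

fn main() {
    // Replace the default eprintln!-based handling with a custom handler that
    // receives every internal OpenTelemetry error.
    let _ = global::set_error_handler(|err| {
        eprintln!("[otel internal] {err}");
    });
}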

However, this existing approach exhibits a couple of issues:

  • Absence of granularity in internal logs, making it challenging for users to isolate valuable information.
  • Insufficient metadata, which makes error messages less informative for users.

To improve the user-friendliness of our internal error handling, I am proposing the development of an enhanced internal logging tool with the following attributes:

  • Minimal reliance on opentelemetry. Given that opentelemetry is typically used for diagnosing problems in applications and other libraries, it's important that we avoid using its log pillar for error reporting.
  • Different levels for the internal logs. As telemetry should remain stable and never panic, the maximum error level should be capped at ERROR. Additionally, levels such as WARNING and DEBUG can be introduced.
  • Inclusion of metadata to aid users in issue identification and resolution. Key metadata, in my view, includes:
    • Pillars (traces vs metrics vs logs)
    • Components, which could differ across various pillars. For instance, trace should contain SpanProvider, SpanProcessor, SpanExporter.
  • To help users filter messages of different levels, we should provide some configuration to set the maximum log level. The log crate's compile-time filters are a good example, and we can provide something similar.

To implement these enhancements, I think we should convert the current Error enum into an Error structure that will include all the aforementioned metadata and level information. So something like

pub struct Error {
    level: LogLevel,
    pillar: Pillar,
    component: Component,
    error: String,
}
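The LogLevel, Pillar, and Component types referenced in this sketch are not spelled out above; one possible shape for them, with purely illustrative variant names, could be:

// Illustrative supporting types for the proposed Error struct; the exact
// variants would need to be agreed on per pillar and per component.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum LogLevel {
    Error, // capped maximum severity, since OpenTelemetry itself must never panic
    Warn,
    Info,
    Debug,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Pillar {
    Trace,
    Metrics,
    Logs,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Component {
    // Trace-side components mentioned above; other pillars would add their own.
    SpanProvider,
    SpanProcessor,
    SpanExporter,
}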
TommyCpp added the enhancement and A-common labels on Jul 10, 2023
@shaun-cox
Contributor

It might be worth seeing if we can leverage the tokio tracing crate for this implementation. It already defines concepts of log levels and components (which it calls targets) and has a rich facility for filtering which logs go where. I realize there might appear to be a circular dependency between their crate and opentelemetry, but I suspect it might be possible to avoid if we (and they) have api vs. sdk crates and our sdk only depends on their api and vice versa.
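To make that concrete, here is a rough sketch (not an established convention of the crate) of what reporting an internal error through tracing could look like, using the event's target as the component identifier:

use tracing::error;

// Hedged sketch: an internal SDK failure reported through `tracing`.
// The target string is illustrative; a real convention would need to be
// agreed on so subscribers can filter on it reliably.
fn report_export_failure(reason: &str) {
    error!(
        target: "opentelemetry_sdk::trace::span_exporter",
        error = reason,
        "failed to export span batch"
    );
}

Subscribers could then raise, lower, or silence these per component with tracing_subscriber's target-based filtering.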

@lalitb
Member

lalitb commented Sep 17, 2024

One option could be to extend the existing error_handler to support different log levels, and rename it to log_handler.rs. The log level could be configured both at compile time and at run time. I took a stab at this here:

main...lalitb:opentelemetry-rust:log-handler

So, the existing error handling code:

        if let Some(ref inner) = self.0.inner {
            match inner.lock() {
                Ok(mut locked) => f(&mut locked),
                Err(err) => global::handle_error(err),
            }
        }

to

        if let Some(ref inner) = self.0.inner {
            match inner.lock() {
                Ok(mut locked) => f(&mut locked),
                Err(err) => otel_log_error!(err, Pillar::Trace, Component::Context),
            }
        }

The user can add a custom log handler, which can use log, tracing, (e)println!, or anything else.

The macro to emit an INFO-level log:

    #[macro_export]
    macro_rules! otel_log_info {
        ($message:expr, $pillar:expr, $component:expr) => {
            if is_compile_time_log_level_enabled!(LogLevel::Info) && is_runtime_log_level_enabled(LogLevel::Info) {
                handle_log(Error {
                    level: LogLevel::Info,
                    pillar: $pillar,
                    component: $component,
                    message: $message.to_string(),
                });
            }
        };
    }
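The snippet relies on a few helpers from the linked branch that are not shown here (is_compile_time_log_level_enabled!, is_runtime_log_level_enabled, handle_log). A hypothetical sketch of the two level checks, just to illustrate the intent; names and semantics are guessed from the macro above, not taken from the actual branch:

use std::sync::atomic::{AtomicU8, Ordering};

// Hypothetical sketch only.
#[derive(Clone, Copy, Debug)]
pub enum LogLevel {
    Error = 0,
    Warn = 1,
    Info = 2,
    Debug = 3,
}

// Compile-time ceiling, in the spirit of the `log` crate's max_level features.
pub const COMPILE_TIME_MAX_LEVEL: u8 = LogLevel::Info as u8;

#[macro_export]
macro_rules! is_compile_time_log_level_enabled {
    ($level:expr) => {
        ($level as u8) <= $crate::COMPILE_TIME_MAX_LEVEL
    };
}

// Run-time ceiling, adjustable by the application (e.g. at startup).
static RUNTIME_MAX_LEVEL: AtomicU8 = AtomicU8::new(LogLevel::Warn as u8);

pub fn set_runtime_log_level(level: LogLevel) {
    RUNTIME_MAX_LEVEL.store(level as u8, Ordering::Relaxed);
}

pub fn is_runtime_log_level_enabled(level: LogLevel) -> bool {
    (level as u8) <= RUNTIME_MAX_LEVEL.load(Ordering::Relaxed)
}

handle_log would then forward the Error value to whatever handler the user registered, which in turn could be backed by log, tracing, or (e)println!.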

@lalitb
Member

lalitb commented Sep 18, 2024

Also, as per @shaun-cox's suggestion, we can directly use the tracing crate for internal logging. The POC for the same:
https://github.com/open-telemetry/opentelemetry-rust/compare/main...lalitb:log-handler-using-tracing?expand=1

  • This will cause an infinite logging scenario, so we had to ignore all internally generated logs at the tracing-appender level.
  • Updated the tracing-appender example to showcase this, with a separate layer to process the otel internal logs while they are ignored by the tracing-appender layer.
  • The tracing macros are wrapped in our own macros and called only when the feature flag is enabled. The otel code only invokes the wrapper, which lets us swap the tracing crate for (say) the log crate in the future if we want to; a sketch of such a wrapper follows this list.
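A minimal sketch of such a wrapper, assuming a crate feature named internal-logs (both the feature name and the macro name are illustrative):

// Hypothetical wrapper: SDK code calls otel_warn!(...) and never touches
// `tracing` directly, so the backend can later be swapped (e.g. for the `log`
// crate) or compiled out entirely when the feature is disabled.
#[macro_export]
macro_rules! otel_warn {
    ($($arg:tt)+) => {
        #[cfg(feature = "internal-logs")]
        {
            tracing::warn!(target: "opentelemetry", $($arg)+);
        }
    };
}

SDK code would then write, for example, otel_warn!(dropped_spans = 5, "span buffer full"), which forwards the fields and message to tracing::warn! only when the feature is enabled.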

@cijothomas
Member

cijothomas commented Sep 18, 2024

Thanks! Can you promote the branch to a PR so we can start reviewing? I don't see any circular dependency issues, and the infinite logging issue looks trivial to fix, either by OTel or by the user. We need to develop some conventions about target and event name that must be followed by OTel components.

The otel code only invokes the wrapper, which lets us swap the tracing crate for (say) the log crate in the future if we want to.

This is a good plan!
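For the user-side fix, a sketch of keeping OTel's own events out of the layer that feeds telemetry back into the pipeline, assuming internal events use a target prefixed with "opentelemetry" (the prefix is an assumption, pending the target/event-name conventions mentioned above):

use tracing_subscriber::{filter::filter_fn, layer::SubscriberExt, util::SubscriberInitExt, Layer};

fn main() {
    // Stand-in for the layer that feeds application events into the OTel
    // appender/exporter; a fmt layer is used here for brevity.
    let app_layer = tracing_subscriber::fmt::layer().with_filter(filter_fn(|meta| {
        // Drop events emitted by OpenTelemetry itself so they never loop back
        // through the export pipeline (the "infinite logging" scenario).
        !meta.target().starts_with("opentelemetry")
    }));

    tracing_subscriber::registry().with(app_layer).init();
}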

@lalitb
Member

lalitb commented Sep 19, 2024

Can you promote the branch to a PR so we can start reviewing?

I have added #2128 for further discussion.

cijothomas changed the milestone from Logging SDK Stable to 0.27 on Oct 2, 2024
@cijothomas
Member

Moving this to the 0.27 milestone. That milestone is expected to upgrade several components to RC status, and quality internal logs are a must-have for it.
