Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SetErrorStatusOnException with async Exceptions works incorrectly #2753

Open
QBens opened this issue Dec 17, 2021 · 9 comments
Open

SetErrorStatusOnException with async Exceptions works incorrectly #2753

QBens opened this issue Dec 17, 2021 · 9 comments
Labels
bug Something isn't working

Comments

@QBens
Copy link

QBens commented Dec 17, 2021

Bug Report

List of nugets

  • OpenTelemetry 1.1.0
  • OpenTelemetry.Exporter.Console 1.1.0

Runtime version:

  • net6.0

Symptom

When SetErrorStatusOnException is enabled and exception occurred and was caught in nested async methed, the outer activities are incorrectly tagged ar Error.

What is the expected behavior?

When all exceptions were caught in async methods, outer activities should not be tagged ar Error.

What is the actual behavior?

Currently when exception was thrown and caught in async method, outer activities are tagged as Error.

Reproduce

Code to reproduce the problem:

using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Trace;

public class Program
{
    private static readonly ActivitySource MyActivitySource = new ActivitySource(
        "MyCompany.MyProduct.MyLibrary");

    public static async Task Main()
    {
        using var tracerProvider = Sdk.CreateTracerProviderBuilder()
            .AddSource("MyCompany.MyProduct.MyLibrary")
            .SetSampler(new AlwaysOnSampler())
            .SetErrorStatusOnException()
            .AddConsoleExporter()
            .Build();

        try
        {
            using (var activity1 = MyActivitySource.StartActivity("Foo"))
            {
                using (var activity2 = MyActivitySource.StartActivity("Bar"))
                {
                    await Bar();
                    // no exception here
                }
            }
        }
        catch (Exception)
        {
            // swallow the exception
        }
    }

    private static async Task Bar()
    {
        await Task.Delay(100);
        try
        {
            await Bar2();
        }
        catch
        {
            // do nothing
        }
    }

    private static async Task Bar2()
    {
        await Task.Delay(100);
        throw new Exception("Oops!");
    }
}

Current result:

Activity.Id:          00-a74738702a69ac229573668ac075e01c-456b2d43151b21aa-01
Activity.ParentId:    00-a74738702a69ac229573668ac075e01c-bb28a8612b992338-01
Activity.ActivitySourceName: MyCompany.MyProduct.MyLibrary
Activity.DisplayName: Bar
Activity.Kind:        Internal
Activity.StartTime:   2021-12-17T10:18:34.8695981Z
Activity.Duration:    00:00:00.2853080
Activity.TagObjects:
    otel.status_code: ERROR
Resource associated with Activity:
    service.name: unknown_service:OpenTelemetryPoligon

Activity.Id:          00-a74738702a69ac229573668ac075e01c-bb28a8612b992338-01
Activity.ActivitySourceName: MyCompany.MyProduct.MyLibrary
Activity.DisplayName: Foo
Activity.Kind:        Internal
Activity.StartTime:   2021-12-17T10:18:34.8683046Z
Activity.Duration:    00:00:00.3186132
Activity.TagObjects:
    otel.status_code: ERROR
Resource associated with Activity:
    service.name: unknown_service:OpenTelemetryPoligon

Both activites are tagged with otel.status_code: ERROR

@QBens QBens added the bug Something isn't working label Dec 17, 2021
@github-actions github-actions bot added the Stale Issues and pull requests which have been flagged for closing due to inactivity label Jan 26, 2022
@cijothomas cijothomas removed the Stale Issues and pull requests which have been flagged for closing due to inactivity label Jan 27, 2022
@nathansoz
Copy link

I also hit this today. Are there eyes on this?

@nathansoz
Copy link

Bump

@TimothyMothra
Copy link
Contributor

I spent some time looking at this today and was not able to reproduce the issue with the provided example.

I tested a few other combinations inside nested activities:

  • exception - both activities have tag.
  • exception within try/catch - no tag.
  • async exception - both activities have tag.
  • async exception within try/catch (provided example) - no tag.

I went looking for the specification on "otel.status_code" and found the definition of "ERROR":
https://github.com/open-telemetry/opentelemetry-specification/blob/main/semantic_conventions/trace/exporter/exporter.yaml

id: error
value: ERROR
brief: 'The operation contains an error.'

Based on this, I think the current version of the SDK is working as intended.
That is, if the inner activity contains an error, the outer activity also contains an error.

@BenHewins
Copy link

The reproduction in the original issue worked for me. Here's a variant of it:

using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Trace;

public class Program
{
    private static readonly ActivitySource MyActivitySource = new ActivitySource(
        "MyCompany.MyProduct.MyLibrary");

    public static async Task Main()
    {
        using var tracerProvider = Sdk.CreateTracerProviderBuilder()
            .AddSource("MyCompany.MyProduct.MyLibrary")
            .SetSampler(new AlwaysOnSampler())
            .SetErrorStatusOnException()
            .AddConsoleExporter()
            .Build();


        using (var activity1 = MyActivitySource.StartActivity("OutsideBar"))
        {
           
                await Bar();
                // no exception here
        }

    }

    private static async Task Bar()
    {
        await Task.Delay(100);
        try
        {
            using (var activity2 = MyActivitySource.StartActivity("InsideBar"))
            {
                await Bar2();
            }
        }
        catch
        {
            // do nothing
        }
    }

    private static async Task Bar2()
    {
        await Task.Delay(100);
        throw new Exception("Oops!");
    }
}

Gives me these outputs:

Activity.TraceId:          0155f5a9391759baeb40766f955631a5
Activity.SpanId:           b8a0ee2d794ad634
Activity.TraceFlags:           Recorded
Activity.ParentSpanId:    04f4e22a7159253a
Activity.ActivitySourceName: MyCompany.MyProduct.MyLibrary
Activity.DisplayName: InsideBar
Activity.Kind:        Internal
Activity.StartTime:   2023-01-30T15:47:04.4927410Z
Activity.Duration:    00:00:00.1042580
Activity.Tags:
   StatusCode : ERROR
Resource associated with Activity:
    service.name: unknown_service:otelrepro

Activity.TraceId:          0155f5a9391759baeb40766f955631a5
Activity.SpanId:           04f4e22a7159253a
Activity.TraceFlags:           Recorded
Activity.ActivitySourceName: MyCompany.MyProduct.MyLibrary
Activity.DisplayName: OutsideBar
Activity.Kind:        Internal
Activity.StartTime:   2023-01-30T15:47:04.3857490Z
Activity.Duration:    00:00:00.2627850
Activity.Tags:
   StatusCode : ERROR
Resource associated with Activity:
    service.name: unknown_service:otelrepro

I'd expect InsideBar to have an error status code, but not OutsideBar. If I remove all the async/awaits and make everything sync, then it works as expected (InsideBar has an error, OutsideBar doesn't)

dotnet 7.0 with these OTEL versions:

  <ItemGroup>
    <PackageReference Include="OpenTelemetry" Version="1.3.2" />
    <PackageReference Include="OpenTelemetry.Exporter.Console" Version="1.3.2" />
  </ItemGroup>

@olane
Copy link

olane commented Jul 17, 2023

Fairly sure I've just hit this problem in our code. Unfortunately this setting isn't useful to us without fixing this issue - if an exception is caught I don't want higher up activities being marked as errors.

Copy link
Contributor

This issue was marked stale due to lack of activity and will be closed in 7 days. Commenting will instruct the bot to automatically remove the label. This bot runs once per day.

@github-actions github-actions bot added the Stale Issues and pull requests which have been flagged for closing due to inactivity label Oct 10, 2024
@nathansoz
Copy link

Please do not close this. I believe it is still an issue.

@github-actions github-actions bot removed the Stale Issues and pull requests which have been flagged for closing due to inactivity label Oct 11, 2024
@Ayush-Kaushik
Copy link

Ayush-Kaushik commented Jan 31, 2025

@BenHewins , I built upon the sample code you provided and added a Task.Delay of 1 millisecond in catch block to prevent all the outer activities from being marked as error state.

The idea is that error status is dependent on the outcome of last asynchronous call. When Task.Delay finishes successfully after exception has been caught, it sets the status to non-error state. It would be similar to any other asynchronous actions that happened after exception is caught (not just Task.Delay).

This is a temporary workaround but not a viable solution for larger code-bases because we'll need to exhaustively find all places where this scenario could occur.

using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Trace;

public class Program
{
    private static readonly ActivitySource MyActivitySource = new ActivitySource(
        "MyCompany.MyProduct.MyLibrary");

    public static async Task Main()
    {
        using var tracerProvider = Sdk.CreateTracerProviderBuilder()
            .AddSource("MyCompany.MyProduct.MyLibrary")
            .SetSampler(new AlwaysOnSampler())
            .SetErrorStatusOnException()
            .AddConsoleExporter()
            .Build();


        using (var activity1 = MyActivitySource.StartActivity("OutsideBar"))
        {

            await Bar();
            // no exception here
        }

    }

    private static async Task Bar()
    {
        await Task.Delay(100);
        try
        {
            using (var activity2 = MyActivitySource.StartActivity("InsideBar"))
            {
                await Bar2();
            }
        }
        catch
        {
            await Task.Delay(1); // successful completion of this task resets the status of the outer activitie 
        }
    }

    private static async Task Bar2()
    {
        await Task.Delay(100);
        throw new Exception("Oops!");
    }
}

Here is the output with outer activity not marked as error anymore:

Activity.TraceId:            7e309af36bbf1b8518c2f2ea83b15a78
Activity.SpanId:             19950155cdd248b5
Activity.TraceFlags:         Recorded
Activity.ParentSpanId:       a8147dc832edee1e
Activity.DisplayName:        InsideBar
Activity.Kind:               Internal
Activity.StartTime:          2025-01-31T14:15:32.4385177Z
Activity.Duration:           00:00:00.1515947
Activity.Tags:
StatusCode: Error
Instrumentation scope (ActivitySource):
    Name: MyCompany.MyProduct.MyLibrary
Resource associated with Activity:
    telemetry.sdk.name: opentelemetry
    telemetry.sdk.language: dotnet
    telemetry.sdk.version: 1.11.1
    service.name: unknown_service:ConsoleApp1

Activity.TraceId:            7e309af36bbf1b8518c2f2ea83b15a78
Activity.SpanId:             a8147dc832edee1e
Activity.TraceFlags:         Recorded
Activity.DisplayName:        OutsideBar
Activity.Kind:               Internal
Activity.StartTime:          2025-01-31T14:15:32.3214849Z
Activity.Duration:           00:00:00.3201113
Instrumentation scope (ActivitySource):
    Name: MyCompany.MyProduct.MyLibrary
Resource associated with Activity:
    telemetry.sdk.name: opentelemetry
    telemetry.sdk.language: dotnet
    telemetry.sdk.version: 1.11.1
    service.name: unknown_service:ConsoleApp1

I used .net8.0

<PackageReference Include="OpenTelemetry" Version="1.11.1" />
<PackageReference Include="OpenTelemetry.Exporter.Console" Version="1.11.1" />

@BenHewins
Copy link

Wow nice workaround, that isn't at all how I would have expected the error identification to work! We worked around this by manually setting the status on the spans that we care about, had to do it twice though:

activity.SetStatus(Status.Ok);
activity.SetStatus(ActivityStatusCode.Ok);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

7 participants