Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error label & SpanKind #173

Open
fzakaria opened this issue May 14, 2020 · 10 comments
Open

Error label & SpanKind #173

fzakaria opened this issue May 14, 2020 · 10 comments

Comments

@fzakaria
Copy link

fzakaria commented May 14, 2020

I'm seeking guidance on the best way to find associated mappings for SpanKind & errors.

SpanKind

The current codebase has the following snippet:

// Add Kind as a tag for now since there is no structured way of sending it with Stackdriver
// Trace API V2
if (zipkinSpan.kind() != null) {
  attributes.putAttributeMap(kKindLabelKey, toAttributeValue(kindLabel(zipkinSpan.kind())));
}

According to the documentation there seems to be structured field for reporting the kind.
https://cloud.google.com/trace/docs/reference/v2/rest/v2/SpanKind

Does this matter for the purpose of the CloudTrace UI?

Error

I believe there to be a semantic gap between Zipkin's, specifically Brave's, concept of an error tag and the corresponding error value in CloudTrace.

It is conventional in Zipkin to add a possibly empty "error" tag when a span failed. In Brave the backfilling of this tag with an Exception type name (or message) is deferred until it is known to be Zipkin2 format.

CloudTrace according to their documentation supports the following:

/error/message: An error message.
/error/name: Display name for the error.
(https://cloud.google.com/trace/docs/trace-labels)

StackTrace
https://cloud.google.com/trace/docs/reference/v2/rest/v2/StackTrace

It would be great if the AttributesExtractor could translate the error tag accordingly.

From my investigation of the code, it seems like the Span translation happens from Zipkin2 format rather than brave's Span which may unfortunate since the Throwable information is then lost.
(Would possibly allow creation of the StackTrace attribute)

@elefeint
Copy link
Contributor

elefeint commented Oct 6, 2020

Sorry we missed this, @fzakaria. Have you been able to get error messages into Trace?

@fzakaria
Copy link
Author

fzakaria commented Oct 6, 2020

I don't think so although I've lost context on it.
I've been very happy over all with the GCP sender so this was more of a cherry on top.

@elefeint
Copy link
Contributor

elefeint commented Oct 6, 2020

Thank you!

@elefeint elefeint closed this as completed Oct 6, 2020
@fzakaria
Copy link
Author

fzakaria commented Oct 6, 2020

for future readers I still believe it to be a gap using GCPSender but achieving 100% is likely out of scope for this as a compatability layer, that case better suited with using a library more native to CloudTrace

@elefeint
Copy link
Contributor

elefeint commented Oct 6, 2020

That's fair -- if this requirement comes up again, this issue should be reopened.

@codefromthecrypt
Copy link
Member

FWIW the throwable isn't lost in Brave. It is just that this implementation loses it. There's a SpanHandler in brave made specifically for the case of parsing throwables.

That no one yet has implemented SpanHandler here is a side-effect of a volunteer-based project. No one is actively doing things unless they feel like it. That doesn't mean it can't be done, but I agree "sender" is not the way out.

I'm re-opening as this issue isn't solved and the data model is a generic, not a personal requirement.

@codefromthecrypt
Copy link
Member

there are two issues here, btw.

The SpanKind maps exactly to Brave, this can be fixed
The error labels do not indicate they must be a Throwable, so this is a bit misclassified.

The service is not just java anyway, and literally GRPC examples are used. I recommend simply falling back to setting both to the same value as "error" when non-empty and other tags don't exist https://cloud.google.com/trace/docs/trace-labels

Later, when a SpanHandler is implemented, a Throwable can be considered when parsing (along with the "error" tag)
http://zipkin.io/brave/5.12.6/brave/brave/handler/SpanHandler.html

@codefromthecrypt
Copy link
Member

Some other things that can be done..

based on the description of error, one of them can be the RPC status, which we have a field for

"rpc.error_code", eg "CANCELLED"
"error" the RPC error code if there is no exception

https://github.com/openzipkin/brave/tree/master/instrumentation/rpc#span-data-policy

Almost the same on http

"http.status_code" when the status is not success.
"error", when there is an exception or status is >=400

https://github.com/openzipkin/brave/tree/master/instrumentation/http#span-data-policy

Anyone free to fix this one? looks easy peasy

@fzakaria
Copy link
Author

fzakaria commented Oct 7, 2020

The missing compatibility for kind does seem trivial and maybe should be broken into a separate issue. It seems pretty straightforward.

I think you lost me a little with the error codes; I am a bit out of touch with the codebase as I am newly back from paternity leave.

I recommend simply falling back to setting both to the same value as "error" when non-empty and other tags don't exist

I think you mean:
"I recommend simply falling back and to setting /error/message and /error/name to the Span's error value when non-empty.

If so, that seems like a straightforward change for the attribute extractor as well.

based on the description of error, one of them can be the RPC status, which we have a field for

The values look to be the same for "error" and either "http.status_code" & "rpc.error_code". I'm not clear on your recommendation that one of the CloudTrace tags (/error/message or /error/name) can that value.

@codefromthecrypt
Copy link
Member

Thanks @fzakaria. honestly the mapping isn't that tight it says literally:

/error/message | An error message. "Rendezvous of RPC that terminated with: status = StatusCode.UNAVAILABLE details = OS Error."
/error/name | Display name for the error.

So, my guess is that this is a long name (/error/message) short name (/error/name) thing until someone says otherwise.

In that case...

/error/message should be span.tags[error] if not empty
/error/name should be prioritize span.tags[rpc.error_code], span.tags[http.status_code], then span.tags[error] if not empty

sometimes having some rules is better than none, as at least it can get feedback and it is better than doing nothing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants