Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop invalid array elements in OTLP payload #342

Closed
wants to merge 3 commits into from

Conversation

rubvs
Copy link
Contributor

@rubvs rubvs commented Aug 13, 2024

Fixes elastic/apm-server#13703

Context

The issue occurs due to an invalid OTLP payload. Specifically, an array containing heterogeneous elements.

APM Server Setup

  1. Apply this PR apm-data changes to your apm-server package:
cd apm-server
go mod edit -replace=github.com/elastic/apm-data=../apm-data
  1. Spin up your local apm-server instance

Example Setup

A Python example had to be used, since the OpenTelemetry Go SDK does not natively support heterogeneous arrays.

  • Pull the OTEL Python package that includes a set of examples
git clone https://github.com/open-telemetry/opentelemetry-python
cd opentelemetry-python
  • Save the following patch in changes.diff
diff --git a/docs/examples/basic_tracer/basic_trace.py b/docs/examples/basic_tracer/basic_trace.py
index bb1e341a..71e84224 100644
--- a/docs/examples/basic_tracer/basic_trace.py
+++ b/docs/examples/basic_tracer/basic_trace.py
@@ -13,16 +13,16 @@
 # limitations under the License.
 
 from opentelemetry import trace
+from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
 from opentelemetry.sdk.trace import TracerProvider
 from opentelemetry.sdk.trace.export import (
     BatchSpanProcessor,
-    ConsoleSpanExporter,
 )
 
 trace.set_tracer_provider(TracerProvider())
 trace.get_tracer_provider().add_span_processor(
-    BatchSpanProcessor(ConsoleSpanExporter())
+    BatchSpanProcessor(OTLPSpanExporter("http://localhost:8200/v1/traces"))
 )
 tracer = trace.get_tracer(__name__)
-with tracer.start_as_current_span("foo"):
+with tracer.start_as_current_span("foo", attributes={"foo": ["bar",None]}) as span:
     print("Hello world!")
diff --git a/exporter/opentelemetry-exporter-otlp-proto-common/src/opentelemetry/exporter/otlp/proto/common/_internal/__init__.py b/exporter/opentelemetry-exporter-otlp-proto-common/src/opentelemetry/exporter/otlp/proto/common/_internal/__init__.py
index 6593d89f..75d15b52 100644
--- a/exporter/opentelemetry-exporter-otlp-proto-common/src/opentelemetry/exporter/otlp/proto/common/_internal/__init__.py
+++ b/exporter/opentelemetry-exporter-otlp-proto-common/src/opentelemetry/exporter/otlp/proto/common/_internal/__init__.py
@@ -85,6 +85,8 @@ def _encode_value(value: Any) -> PB2AnyValue:
                 values=[_encode_key_value(str(k), v) for k, v in value.items()]
             )
         )
+    elif value is None:
+        return PB2AnyValue()
     raise Exception(f"Invalid type {type(value)} of value {value}")
  • Apply the patch
patch < changes.diff
  • Create a Python venv
python -m venv env
pip install --upgrade pip
  • Install the opentelemetry-exporter-otlp* packages locally from opentelemetry-python/exporter folder. For example,
cd exporter/opentelemetry-exporter-otlp-proto-grpc
pip install -e .
  • Run the example
cd docs/examples/basic_tracer
python basic_trace.py
  • Result
> Hello world!
  • Running the Python example without the Patch
panic: interface conversion: interface {} is nil, not string
goroutine 3684 [running]:
github.com/elastic/apm-data/input/otlp.setLabel({0xc00502e960, 0x1b}, 0xc003ccef20, {0x1c9da00?, 0xc000ca7710?})
        github.com/elastic/apm-data@v1.1.0/input/otlp/metadata.go:508 +0x5f1
github.com/elastic/apm-data/input/otlp.TranslateTransaction.func1({0xc004fcc600, 0x1b}, {0xc0012a8e50?, 0xc0050564dc?})
        github.com/elastic/apm-data@v1.1.0/input/otlp/traces.go:290 +0x8be
go.opentelemetry.io/collector/pdata/pcommon.Map.Range({0xc0041ebaa0?, 0xc0050564dc?}, 0xc000b7a978)
        go.opentelemetry.io/collector/pdata@v1.5.0/pcommon/map.go:222 +0x97

Notes

  1. I do not see any proper way to log a message or propagate an error.
  2. Therefore, deductively we are only left with dropping invalid array elements.
  3. I prefer not to use reflect, but please comment if you think using reflect will be better and why.
  4. Any ideas on Unit Testing?

@rubvs rubvs requested review from simitt, carsonip and 1pkg August 13, 2024 18:24
@obltmachine obltmachine added the safe-to-test Changes are safe to run in the CI label Aug 13, 2024
carsonip
carsonip previously approved these changes Aug 14, 2024
Copy link
Member

@carsonip carsonip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation in this PR will essentially leave the array element corresponding to invalid element as zero value (i.e. "" or 0.0). I'd prefer actually dropping the element such that the resulting array is smaller, but I don't have a strong opinion as long as we avoid the panic.

@carsonip
Copy link
Member

I do not see any proper way to log a message or propagate an error.
Therefore, deductively we are only left with dropping invalid array elements.

I'm fine with not propagating an error / logging, as this is such an edge case on invalid input that is not spec-compliant.

I prefer not to use reflect, but please comment if you think using reflect will be better and why.

There is no need to use reflect imo.

Any ideas on Unit Testing?

I believe this is not possible to craft a problematic payload with Go otel sdk. But we can add unit test for setLabel function in this instance, although we don't usually unit test every unexported function.

@carsonip carsonip self-requested a review August 14, 2024 13:35
@rubvs
Copy link
Contributor Author

rubvs commented Aug 14, 2024

@carsonip I'll skip the unit test, since I'll have to change package otlp_test to package otlp otherwise.

@rubvs rubvs force-pushed the fix-invalid-otlp-payload-panic branch from 05b775c to d03a251 Compare August 14, 2024 22:10
@rubvs rubvs marked this pull request as ready for review August 14, 2024 22:10
@rubvs rubvs requested a review from a team as a code owner August 14, 2024 22:10
input/otlp/metadata.go Outdated Show resolved Hide resolved
input/otlp/metadata.go Outdated Show resolved Hide resolved
Copy link
Member

@carsonip carsonip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found a way to reproduce/test this in Go, not sure how I missed this the last time I tried.

func TestTranslateSpan(t *testing.T) {
	m := pcommon.NewMap()
	s := m.PutEmptySlice("f")
	s.FromRaw([]any{"f", 1})
	e := &modelpb.APMEvent{}
	otlp.TranslateSpan(ptrace.SpanKindInternal, m, e)
}

Can you add a test based on this?

Comment on lines 541 to 555
// Ensure that the array is homogeneous
// Drop value that is not OTEL supported: https://opentelemetry.io/docs/specs/otel/common/
if r, ok := v[i].(string); ok {
value[i] = r
}
}
modelpb.Labels(event.Labels).SetSlice(key, value)
case float64:
value := make([]float64, len(v))
for i := range v {
value[i] = v[i].(float64)
// Ensure that the array is homogeneous
// Drop value that is not OTEL supported: https://opentelemetry.io/docs/specs/otel/common/
if r, ok := v[i].(float64); ok {
value[i] = r
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test please?

Copy link
Member

@carsonip carsonip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for adding the test

input/otlp/metadata_test.go Show resolved Hide resolved

testCases := []struct {
desc string
input []any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
input []any
input func() pcommon.Map

Would it be possible to consolidate TestTranslateSpanValue into this test by having input as a function that returns a map?

testCases := []struct {
desc string
input []any
expected int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to assert both expected label value and expected numeric label value instead of just checking the length. There will be minimal increase to the line of code, but it will provide real value in checking if the elements are really correct. Also, if the test really fails, the assertion output will be less useful when it just compares length.

assert.Equal(t, float64(1), e.GetNumericLabels()["f"].Value)
}

func TestTranslateSpanSlice(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I look at this test without the context that this actually tests setLabel, I'd wonder why we test TranslateSpan only, but not TranslateTransaction. I think it makes sense to add a dimension to this table driven test so that it tests both TranslateSpan and TranslateTransaction. Span events also use setLabel, but convertSpanEvent is slightly complex, so I'm fine with leaving that out unless others feel strongly about it.

@rubvs
Copy link
Contributor Author

rubvs commented Aug 28, 2024

Superseded by #346

@rubvs rubvs closed this Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
safe-to-test Changes are safe to run in the CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

otlp.setLabel panic on non-compliant attribute array values
5 participants