Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No Redaction of Numeric Attribute Values in the OpenTelemetry Collector #36684

Open
Tiberius202 opened this issue Dec 4, 2024 · 10 comments
Open
Labels
enhancement New feature or request help wanted Extra attention is needed priority:p2 Medium processor/redaction Redaction processor

Comments

@Tiberius202
Copy link

Tiberius202 commented Dec 4, 2024

Component(s)

processor/redaction

What happened?

Description

The redaction processor does not redact attributes that are numbers.

For instance, consider the following example where two spans are sent for processing:

./otel-cli span -s my-service --attrs app.dummy="4111111111111"
./otel-cli span -s my-service --attrs app.dummy="VISA 4111111111111"

In this case, the redaction processor only redacts the second span's attribute value, leaving the first one untouched.

This is a follow up to this stale issue
#26348

Steps to Reproduce

  1. Set up the OpenTelemetry Collector with the provided configuration for the redaction processor.
  redaction:
    allow_all_keys: true
    blocked_values:
      - "4[0-9]{12}(?:[0-9]{3})?" ## Visa credit card number
  1. Generate spans with attributes containing numeric values, such as credit card numbers.
otel-cli span --endpoint 127.0.0.1:4317 --protocol grpc --insecure true -s my-service --attrs app.dummy="4111111111111"
otel-cli span --endpoint 127.0.0.1:4317 --protocol grpc --insecure true -s my-service --attrs app.dummy="VISA 4111111111111"

  1. Observe that the redaction regex does not properly match and redact numeric attribute values.
2024-12-04T15:26:25.089-0500    info    service@v0.115.0/service.go:166 Setting up own telemetry...
2024-12-04T15:26:25.092-0500    info    telemetry/metrics.go:70 Serving metrics {"address": "localhost:8888", "metrics level": "Normal"}
2024-12-04T15:26:25.096-0500    info    builders/builders.go:26 Development component. May change in the future.        {"kind": "exporter", "data_type": "traces", "name": "debug"}
2024-12-04T15:26:25.099-0500    info    service@v0.115.0/service.go:238 Starting otelcol-contrib...     {"Version": "0.115.0", "NumCPU": 8}
2024-12-04T15:26:25.100-0500    info    extensions/extensions.go:39     Starting extensions...
2024-12-04T15:26:25.107-0500    warn    internal@v0.115.0/warning.go:40 Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks.       {"kind": 
"receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-att
acks"}
2024-12-04T15:26:25.107-0500    info    otlpreceiver@v0.115.0/otlp.go:112       Starting GRPC server    {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:4317"}
2024-12-04T15:26:25.107-0500    info    service@v0.115.0/service.go:261 Everything is ready. Begin running and processing data.
2024-12-04T15:26:30.121-0500    info    Traces  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 1}
2024-12-04T15:26:30.121-0500    info    todo-generate-default-span-names a4ba89b46bd31085a1485d43d41e78de 64826c6a258df964 app.dummy=4111111111111
        {"kind": "exporter", "data_type": "traces", "name": "debug"}
2024-12-04T15:26:30.191-0500    info    Traces  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 1}
2024-12-04T15:26:30.191-0500    info    todo-generate-default-span-names 9f0f6a6039496915add1df61ae4d0341 230fa53949041e37 app.dummy=VISA ****
        {"kind": "exporter", "data_type": "traces", "name": "debug"}

Expected Result

The redaction processor should consistently match and redact attribute values that fit the specified regex patterns, including numeric values. The attribute app.dummy in the first span should be redacted as ****.

Actual Result

Currently, the redaction processor does not process numeric attribute values. The attribute app.dummy in the first span is revealed as 4111111111111.

Collector version

v0.115.1

Environment information

Environment

OS: Darwin_amd64
Compiler(if manually compiled): N/A. Pulled from https://github.com/open-telemetry/opentelemetry-collector-releases/releases/tag/v0.115.1

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  redaction:
    allow_all_keys: true
    blocked_values:
      - "4[0-9]{12}(?:[0-9]{3})?" ## Visa credit card number

exporters:
  debug:
    verbosity: normal
  otlp:
    endpoint: 0.0.0.0:18443
    tls:
      insecure: true
    compression: none
    sending_queue:
      queue_size: 50

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [redaction]
      exporters: [otlp, debug]

Log output

2024-12-04T15:26:25.089-0500    info    service@v0.115.0/service.go:166 Setting up own telemetry...
2024-12-04T15:26:25.092-0500    info    telemetry/metrics.go:70 Serving metrics {"address": "localhost:8888", "metrics level": "Normal"}
2024-12-04T15:26:25.096-0500    info    builders/builders.go:26 Development component. May change in the future.        {"kind": "exporter", "data_type": "traces", "name": "debug"}
2024-12-04T15:26:25.099-0500    info    service@v0.115.0/service.go:238 Starting otelcol-contrib...     {"Version": "0.115.0", "NumCPU": 8}
2024-12-04T15:26:25.100-0500    info    extensions/extensions.go:39     Starting extensions...
2024-12-04T15:26:25.107-0500    warn    internal@v0.115.0/warning.go:40 Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks.       {"kind": 
"receiver", "name": "otlp", "data_type": "traces", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-att
acks"}
2024-12-04T15:26:25.107-0500    info    otlpreceiver@v0.115.0/otlp.go:112       Starting GRPC server    {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "0.0.0.0:4317"}
2024-12-04T15:26:25.107-0500    info    service@v0.115.0/service.go:261 Everything is ready. Begin running and processing data.
2024-12-04T15:26:30.121-0500    info    Traces  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 1}
2024-12-04T15:26:30.121-0500    info    todo-generate-default-span-names a4ba89b46bd31085a1485d43d41e78de 64826c6a258df964 app.dummy=4111111111111
        {"kind": "exporter", "data_type": "traces", "name": "debug"}
2024-12-04T15:26:30.191-0500    info    Traces  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 1}
2024-12-04T15:26:30.191-0500    info    todo-generate-default-span-names 9f0f6a6039496915add1df61ae4d0341 230fa53949041e37 app.dummy=VISA ****
        {"kind": "exporter", "data_type": "traces", "name": "debug"}

Additional context

#26348

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/redactionprocessor/processor.go#L190

This line uses the Str, but perhaps should use AsString in order to detect when a number should be redacted. There was a worry that this would change the type of the data, but this could be mitagited by replacing the number with 0.

@Tiberius202 Tiberius202 added bug Something isn't working needs triage New item requiring triage labels Dec 4, 2024
@github-actions github-actions bot added the processor/redaction Redaction processor label Dec 4, 2024
Copy link
Contributor

github-actions bot commented Dec 4, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@mx-psi
Copy link
Member

mx-psi commented Dec 5, 2024

Similar previous issue: #26348. We ended up implementing #27867 to support #26348 (comment). I think we can close this issue after documenting this.

@mx-psi mx-psi added documentation Improvements or additions to documentation priority:p2 Medium and removed bug Something isn't working needs triage New item requiring triage labels Dec 5, 2024
@Tiberius202
Copy link
Author

I don't think transform processor fits our use case. We want to perform redaction on all key-value entries in the attributes map. While we could transform all values into strings, it would be better to only transform values that fit the regex.
In order to use transform processor, we would need something like: set(attributes[.], "***") where IsInt(value) and value > 10000.
Can we get around the type issue by making default redacted values for other types? Something like 0 for ints

@TylerHelmuth
Copy link
Member

Until we solve the looping problem OTTL has no way to convert all attribute values to a new type without introducing a new function. If the attribute keys whose values need changed are not known then String() can't be used. We could add a new function that can handle taking in a map and converting all the values, but then we'd end up changing values that don't need their type changed.

@Tiberius202
Copy link
Author

@mx-psi sorry for the ping, but you seem knowledgeable on this.
Since OTTL has no way to support this security flaw, can we move forward with redacting integers and replacing them with another value (probably either 0 or negative 1). If unexpected behavior is an issue, we could provide a flag in the config file with something like "redactints". I am willing to work on a PR for this as it shouldn't be too difficult by leveraging AsString()

@mx-psi
Copy link
Member

mx-psi commented Dec 17, 2024

@Tiberius202 Before we do that, I have three comments.

First I still don't understand what is a valid use case where you would want to redact a number. Passing a credit card as a number feels like a bad idea (see here for an argument) and I can't think of any situation where you would want to pass sensitive data as a number.

Second, related to the above, I feel like there is a bug in the otel-cli tool you are using (or at least a point of major confusion), in that it is being loose with the typing of attributes.

Third and last: OTTL does support changing a number to 0/-1. If you want to use it for that, you can. The issue Tyler is talking about concerns type conversions, which wouldn't happen in your case.

@Tiberius202
Copy link
Author

Our application is hoping to never pass sensitive data. Our goal is to prevent dataleaks from misconfigured applications as well as possible hackers to promote defense in depth.
To your first comment, it seems like an easy enough mistake or possible vector of attack to remove the spaces from credit card numbers and send them as a number. The otel-cli is just one example of where this naive casting occurs. Telemetrygen also has the same casting behavior. I believe that the OTEL_RESOURCE_ATTRIBUTES environment variable has the naive casting, but I have had some trouble testing it (will get back to you with results).
While ideally all of these sources would have mandatory type information required just like the language APIs, changing all of them and coming up with a CLI friendly way to include type information may be challenging. Not to mention requiring multiple breaking change PR's.
OTTL does not fit our use case as we need to know and transform all keys and do not know them all when we ship the collector. We are using the "allow_all_keys" flag, so we want redaction that works on all keys. A single collector config may receive traces from many apps from many developers and coming up with a comprehensive list of every possible key is impossible.
If we had complete control over our app, we would simply not send sensitive data in the first place. I really appreciate the insight you have provided and hope you can understand our use case.

@TylerHelmuth
Copy link
Member

Until the looping problem is solved in OTTL it won't be able to handle this use case unless we add a function to do the conditional casting and redaction and I don't think we should do that since we've decided the redactionprocess is an important enough processor to be separate from the transformprocessor.

I think we should move forward with this idea with these requirements:

  • the feature is optional and off by default
  • while all values will be casted to a string to do the pattern check, only values that match should actually save the change/redaction.
  • We document that this feature causes the processor to be slightly less performant. (or if benchmarks show it is negligible we can omit this warning)

@TylerHelmuth TylerHelmuth removed the documentation Improvements or additions to documentation label Jan 10, 2025
@mx-psi mx-psi added enhancement New feature or request help wanted Extra attention is needed labels Jan 10, 2025
@mx-psi
Copy link
Member

mx-psi commented Jan 10, 2025

Agreed with Tyler, we just discussed this offline. Would also really appreciate it if you are willing to report the 'type confusion' usability issues to otel-cli, @Tiberius202

Telemetrygen also has the same casting behavior

I'll point out that I don't think this is the case, telemetrygen only supports string and boolean valued attributes for now.

I believe that the OTEL_RESOURCE_ATTRIBUTES environment variable has the naive casting, but I have had some trouble testing it (will get back to you with results).

Not needed to solve this issue now but I would be interested in the results.

Tiberius202 pushed a commit to Tiberius202/opentelemetry-collector-contrib that referenced this issue Jan 13, 2025
The redaction is done based on its AsString value
rather than SDR. This is based on this issue:
open-telemetry#36684
@Tiberius202
Copy link
Author

I made a draft PR with the change to use AsString() instead of Str(). This means the setting will redact types other than Ints if their AsString representation matches the Regex. Is this acceptable / Intended behavior?

Also, I was wondering if you have any guide on how to do performance testing for this library?
Tiberius202#1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed priority:p2 Medium processor/redaction Redaction processor
Projects
None yet
Development

No branches or pull requests

3 participants