-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[receiver/awsfirehose] Implement encoding extension framework #37113
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
If this direction is acceptable, I would be happy to polish and contribute the branch I referenced above. |
This sounds like a valid request to me. FWIW, there are many components that use custom unmarshalers/marshalers. Now that we have encoding extension, we should do this for all such components. |
/label waiting-for-code-owners |
Not sure for Firehose strictly, however the extension paths makes sense to me and embracing that adoption. |
This seems reasonable to me. I might have some concerns about the implementation, but I'll defer those until it's out of draft. |
Thanks for the feedback, I'll clean up the branch and open a PR soon. |
/label -waiting-for-code-owners |
@Aneurysm9 seeing as the receiver has alpha stability, there's technically an option to break config straight away. Would you prefer:
|
Regardless of stability level my preference is to not make unnecessary breaking changes and to do as much as possible to both ease the transition and provide as much notice and time for users to adapt as possible. I don't think there's any reason not to follow the standard breaking changes process here. |
@Aneurysm9 sounds good, thanks - that's my preference too. #37262 maintains backwards compatibility but deprecates the old config, so it's ready for review. |
…37361) #### Description Refactor unmarshallers to fit into the encoding framework. The internal unmarshallers now implement `plog.Unmarshaler` and `pmetric.Unmarshaler`. This will enable us to use encoding extensions in a followup, and will enable us to extract the unmarshallers later as encoding extensions. As a result of the interface change, the unmarshallers now unmarshal a single record at a time, which means we cannot merge resources/metrics as we go, and only within each record. This impacts performance, so to offset that we implement various optimisations: - Use json-iterator for decoding JSON - Use klauspost/compress for decompressing gzip - Pool gzip readers - Remove pointer type from cwMetricValue to avoid allocation - Don't read the whole request body into memory - Reuse buffer for decoding base64; decode as we go There are more optimisations we can make to reduce memory allocations, e.g. avoid reflection when decoding JSON. There's a fix for a subtle bug in the cwmetrics unmarshaller where the unit of a metric was not considered part of its identity, and so two metrics that differed only by unit would be merged. #### Link to tracking issue Preparation for #37113 #### Testing - Added tests for consuming requests containing multiple records. - Added benchmarks for the full request/response flow, for cwlogs and cwmetrics record types. There's an increase in memory usage for cwmetrics unmarshalling, and decrease for cwlogs unmarshalling. CPU usage is better across the board. ``` goos: linux goarch: amd64 pkg: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsfirehosereceiver cpu: AMD Ryzen 7 PRO 5850U with Radeon Graphics │ /tmp/old.txt │ /tmp/new.txt │ │ sec/op │ sec/op vs base │ LogsConsumer_cwlogs/10resources_10records_1logs-16 142.03µ ± 8% 75.85µ ± 16% -46.60% (p=0.002 n=6) LogsConsumer_cwlogs/10resources_10records_10logs-16 413.1µ ± 5% 208.0µ ± 23% -49.65% (p=0.002 n=6) LogsConsumer_cwlogs/10resources_100records_1logs-16 1435.8µ ± 8% 728.0µ ± 11% -49.30% (p=0.002 n=6) LogsConsumer_cwlogs/10resources_100records_10logs-16 4.200m ± 2% 1.875m ± 4% -55.34% (p=0.002 n=6) MetricsConsumer_cwmetrics/10resources_10records_1metrics-16 104.30µ ± 16% 70.26µ ± 12% -32.64% (p=0.002 n=6) MetricsConsumer_cwmetrics/10resources_10records_10metrics-16 821.7µ ± 7% 505.5µ ± 5% -38.48% (p=0.002 n=6) MetricsConsumer_cwmetrics/10resources_100records_1metrics-16 780.2µ ± 5% 647.0µ ± 8% -17.07% (p=0.002 n=6) MetricsConsumer_cwmetrics/10resources_100records_10metrics-16 7.823m ± 5% 5.297m ± 11% -32.29% (p=0.002 n=6) geomean 809.9µ 475.7µ -41.26% │ /tmp/old.txt │ /tmp/new.txt │ │ B/op │ B/op vs base │ LogsConsumer_cwlogs/10resources_10records_1logs-16 437.78Ki ± 0% 61.70Ki ± 1% -85.91% (p=0.002 n=6) LogsConsumer_cwlogs/10resources_10records_10logs-16 533.7Ki ± 0% 124.9Ki ± 2% -76.60% (p=0.002 n=6) LogsConsumer_cwlogs/10resources_100records_1logs-16 4319.1Ki ± 0% 550.1Ki ± 0% -87.26% (p=0.002 n=6) LogsConsumer_cwlogs/10resources_100records_10logs-16 5.167Mi ± 0% 1.172Mi ± 1% -77.31% (p=0.002 n=6) MetricsConsumer_cwmetrics/10resources_10records_1metrics-16 47.88Ki ± 5% 83.71Ki ± 0% +74.84% (p=0.002 n=6) MetricsConsumer_cwmetrics/10resources_10records_10metrics-16 390.0Ki ± 1% 415.4Ki ± 1% +6.51% (p=0.002 n=6) MetricsConsumer_cwmetrics/10resources_100records_1metrics-16 358.9Ki ± 0% 772.9Ki ± 0% +115.37% (p=0.002 n=6) MetricsConsumer_cwmetrics/10resources_100records_10metrics-16 3.730Mi ± 0% 3.962Mi ± 0% +6.23% (p=0.002 n=6) geomean 779.7Ki 391.8Ki -49.76% │ /tmp/old.txt │ /tmp/new.txt │ │ allocs/op │ allocs/op vs base │ LogsConsumer_cwlogs/10resources_10records_1logs-16 406.0 ± 2% 348.0 ± 2% -14.29% (p=0.002 n=6) LogsConsumer_cwlogs/10resources_10records_10logs-16 2.025k ± 0% 1.971k ± 1% -2.64% (p=0.002 n=6) LogsConsumer_cwlogs/10resources_100records_1logs-16 2.922k ± 0% 3.094k ± 0% +5.89% (p=0.002 n=6) LogsConsumer_cwlogs/10resources_100records_10logs-16 18.46k ± 0% 19.46k ± 1% +5.45% (p=0.002 n=6) MetricsConsumer_cwmetrics/10resources_10records_1metrics-16 548.5 ± 8% 653.0 ± 0% +19.05% (p=0.002 n=6) MetricsConsumer_cwmetrics/10resources_10records_10metrics-16 3.795k ± 3% 4.729k ± 1% +24.63% (p=0.002 n=6) MetricsConsumer_cwmetrics/10resources_100records_1metrics-16 3.281k ± 0% 6.161k ± 0% +87.78% (p=0.002 n=6) MetricsConsumer_cwmetrics/10resources_100records_10metrics-16 28.02k ± 0% 46.90k ± 0% +67.39% (p=0.002 n=6) geomean 3.098k 3.722k +20.16% ``` #### Documentation N/A --------- Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>
Component(s)
receiver/awsfirehose
Is your feature request related to a problem? Please describe.
awsfirehosereceiver has a limited set of baked-in unmarshallers: cwlogs (CloudWatch logs), cwmetrics (CloudWatch metric streams JSON format), otlp_v1 (CloudWatch metric streams OTLP 1.0 format).
Firehose itself does not know or care about the format of data records. In theory you could write OTLP logs to a delivery stream using https://docs.aws.amazon.com/firehose/latest/APIReference/API_PutRecord.html, but there is no way for the receiver to unmarshal this. OTLP is just an example: one might produce arbitrary observability data (or other types of data, but out of scope here) to Firehose, which also cannot be decoded without additional unmarshallers.
The unmarshallers within the awsfirehosereceiver module may also be useful for other receivers, e.g. in the S3 receiver for decoding data sent by Firehose to an S3 bucket. Since they are internal to the module, they cannot be reused.
Describe the solution you'd like
I would like to enhance the receiver to implement the encoding extension framework, so that users may inject additional encodings. I have created an untested PoC of this at axw@1904aa5. In that PoC I have left the existing unmarshallers in the receiver, but I would envisage them eventually moving to dedicated extension components.
Describe alternatives you've considered
No response
Additional context
No response
The text was updated successfully, but these errors were encountered: