feat: add gzip compression to SQS replay events generated from CloudWatch #887
Conversation
Apart from the tiny detail of the replay queue, it's looking good.
I see we're only compressing messages from the CloudWatch Logs trigger. Should we consider compressing messages from all triggers? I'm not 100% sure this is the right call, but I want to evaluate the pros and cons with you.
Having all the messages encoded using the same format may be appropriate for consistency.
The change applies to CloudWatch triggers as well as to shipper failures for parsed events. The shipping failures apply to all inputs and are handled in a common code path. Regarding triggers such as SQS, S3 & Kinesis, they all use SQS Move.
Anyway, the pros I see are:
@Kavindu-Dodan, I see you decided to only compress the event_payload field. So, given the following non-compressed event:
{
"output_destination": "https://idonotexist.zmoog.dev:443",
"output_args": {
"es_datastream_name": "logs-generic-release120script"
},
"event_payload": {
"@timestamp": "2025-05-20T16:59:59.268898Z",
"tags": [
"forwarded",
"generic"
],
"data_stream": {
"type": "logs",
"dataset": "generic",
"namespace": "release120script"
},
"event": {
"dataset": "generic"
},
"_op_type": "create",
"_index": "logs-generic-release120script",
"_id": "123",
"message": "Example Event 3",
"log": {
"offset": 0,
"file": {
"path": "mbranca-test/2025-04-30"
}
},
"aws": {
"cloudwatch": {
"log_group": "mbranca-test",
"log_stream": "2025-04-30",
"event_id": "38976358746361840955885811431946476608269635496678719490"
}
},
"cloud": {
"provider": "aws",
"region": "eu-west-1",
"account": {
"id": "123"
}
}
},
"event_input_id": "arn:aws:logs:eu-west-1:123:log-group:mbranca-test:*"
}
If we store the example event in compressed form, it would become:
{
"output_destination": "https://idonotexist.zmoog.dev:443",
"output_args": {
"es_datastream_name": "logs-generic-release120script"
},
"event_payload": "<base64 encoded event payload>",
"event_input_id": "arn:aws:logs:eu-west-1:123:log-group:mbranca-test:*"
}
right? In the spirit of keeping things simple, why not compress the whole message? It would be simpler to replace the send calls in one place.
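For illustration, the payload-only compression being compared here might look like the following minimal sketch, assuming gzip plus base64 encoding; the helper names are hypothetical and not the PR's actual code:

```python
import base64
import gzip
import json


def compress_payload(event: dict) -> dict:
    # Hypothetical helper: gzip + base64 only the event_payload field,
    # leaving the envelope (output_destination, output_args, ...) readable.
    packed = gzip.compress(json.dumps(event["event_payload"]).encode("utf-8"))
    return {**event, "event_payload": base64.b64encode(packed).decode("ascii")}


def decompress_payload(event: dict) -> dict:
    # Reverse operation: decode, decompress, and parse back into a dict.
    raw = gzip.decompress(base64.b64decode(event["event_payload"]))
    return {**event, "event_payload": json.loads(raw)}
```

Compressing the whole message instead would collapse both helpers into a single encode/decode step at the send/receive boundary.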
@zmoog good points, and thank you for raising them :)
Yes, what you see is the event format we store, and your comparison is correct; we only compress the event_payload.
The reason is some pre-parsing that happens to detect the event type, internal to the handler [1].
[1] https://github.com/elastic/elastic-serverless-forwarder/blob/main/handlers/aws/utils.py#L399-L409
Uhm, maybe I'm wrong, but I feel we're only patching the CloudWatch path. The integration tests focus on this use case, and I'm not sure we're testing the case where we already have a non-compressed message in the queue. I'm not 100% sure we are covering all cases without increasing the complexity of handling CloudWatch differently from other triggers. I'll set aside some time for a deeper look.
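For context, the fallback-style detection being discussed (handling both compressed and pre-existing non-compressed messages in the queue) might look like this minimal sketch; the function name is hypothetical, not the PR's actual code:

```python
import base64
import gzip
import json
import zlib


def parse_event_payload(value):
    # Fallback-style detection: try to treat the value as base64-encoded
    # gzip data; if that fails, assume it is a pre-existing, non-compressed
    # payload that was enqueued before compression was introduced.
    if isinstance(value, str):
        try:
            return json.loads(gzip.decompress(base64.b64decode(value)))
        except (ValueError, OSError, zlib.error):
            return json.loads(value)
    return value  # already a parsed, non-compressed payload
```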
What about using SQS message attributes (e.g., Content-Encoding) to flag compressed content, to make the flow cleaner and more explicit?
Here's how it could work (see the sketch at the end of this comment):
Benefits:
- Explicit detection - No need to guess if the content is compressed
- Better error handling - Only decompress when the attribute is present
- Backward compatibility - Uncompressed messages work unchanged
- Performance - Avoid unnecessary decompression attempts
Implementation approach:
- Sending side: add the Content-Encoding message attribute when compressing
- Receiving side: check the Content-Encoding attribute before decompressing
This eliminates the try/catch fallback logic and makes the compression handling explicit and reliable. The code becomes more maintainable, and the intent is clearer.
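A minimal sketch of this attribute-based flow, assuming boto3 on the sending side and a Lambda SQS record shape on the receiving side; the helper names are hypothetical, and for simplicity this version compresses the whole message body:

```python
import base64
import gzip
import json

import boto3

sqs = boto3.client("sqs")


def send_compressed(queue_url: str, event: dict) -> None:
    # Sending side: compress the body and flag it via a message attribute.
    body = base64.b64encode(gzip.compress(json.dumps(event).encode("utf-8")))
    sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=body.decode("ascii"),
        MessageAttributes={
            "Content-Encoding": {"DataType": "String", "StringValue": "gzip"}
        },
    )


def read_record(record: dict) -> dict:
    # Receiving side (Lambda SQS record): decompress only when flagged,
    # so non-compressed messages keep working unchanged.
    encoding = (
        record.get("messageAttributes", {})
        .get("Content-Encoding", {})
        .get("stringValue")
    )
    body = record["body"]
    if encoding == "gzip":
        body = gzip.decompress(base64.b64decode(body)).decode("utf-8")
    return json.loads(body)
```

Because the attribute travels outside the body, the receiver never has to guess at the body's format, which is what removes the try/catch fallback.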
@zmoog I have implemented this suggestion with 7d51769. Prior to this change, only replay messages had attributes. With this change, we add the Content-Encoding attribute when compressing. I have validated the functionality of the implementation with the following workflows:
And I can confirm it works. Can you have another look?
@zmoog thank you for all the discussions and reviews on the PR 🙌 Enjoyed working on this.
What does this PR do?
AWS recently increased the CloudWatch Logs maximum event size to 1 MB [1]. However, the SQS message size is still capped at 256 KB [2].
This means replay messages generated from the CloudWatch input can exceed the SQS message size limit (see screenshot below).
This PR mitigates that edge case by adding gzip compression to the SQS message's event payload. While this does not eliminate the limitation entirely, it allows handling CloudWatch events up to the 1 MB maximum as long as the gzip compression ratio of the message reaches at least 4:1 (1 MB : 250 KB).
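As a rough illustration of the size math (not part of the PR), one can check whether a payload fits under the SQS cap after compression; note that base64 encoding inflates the compressed bytes by roughly one third, so the effective required ratio is somewhat better than 4:1:

```python
import base64
import gzip

SQS_MAX_BYTES = 256 * 1024  # SQS message size cap


def fits_in_sqs_after_gzip(payload: bytes) -> bool:
    # base64 inflates the compressed bytes by ~4/3, so this checks the
    # encoded size, which is what actually counts against the SQS limit.
    return len(base64.b64encode(gzip.compress(payload))) <= SQS_MAX_BYTES


# ~1 MB of repetitive log text compresses far better than 4:1.
sample = b'{"message": "Example Event", "level": "info"}\n' * 23000
print(fits_in_sqs_after_gzip(sample))  # True for repetitive payloads
```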
Checklist
- I have added an entry in CHANGELOG.md and updated share/version.py, if my change requires a new release.
How to test this PR locally
- Build the package (make package)
- I have validated replaying the same message through Lambda to verify correct parsing of gzipped payloads.
Screenshots
Compressed vs. uncompressed message size comparison in the SQS queue:
Footnotes
1. https://aws.amazon.com/about-aws/whats-new/2025/04/amazon-cloudwatch-logs-increases-log-event-size-1-mb
2. https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/quotas-messages.html