-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for AWS security lake sink as a bucket selector mode in S3 sink #4846
Conversation
3959e7b
to
b45bd07
Compare
Signed-off-by: Kondaka <krishkdk@amazon.com>
Signed-off-by: Kondaka <krishkdk@amazon.com>
…3 sink Signed-off-by: Kondaka <krishkdk@amazon.com>
Signed-off-by: Kondaka <krishkdk@amazon.com>
Signed-off-by: Kondaka <krishkdk@amazon.com>
b45bd07
to
4f7ac58
Compare
Signed-off-by: Kondaka <krishkdk@amazon.com>
Signed-off-by: Kondaka <krishkdk@amazon.com>
} | ||
|
||
@Override | ||
public Map<String, String> getMetadata(int eventCount) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is related to the bucket_selector
. I'd suggest that the aws_security_lake
transformation just add this as additional metadata. I also think "additional metadata" is clearer since it doesn't overwrite the existing metadata.
s3:
bucket_selector:
aws_security_lake
additional_metadata:
asl_rows: ${numberOfEvents}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dlvenable how do we plan do support ${numberOfEvents}
? It is not a field in the key. It is not a function. I think it would be confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we had done something similar to this in either the s3
sink or opensearch
sink already where you could add some custom sink-only data (either to the index name or the S3 path). If we have that, we can follow that pattern.
Another idea is to make it somewhat less flexible by specifying the name exactly.
s3:
additional_batch_metadata:
aws_rows: numberOfEvents
We could require that the value be numberOfEvents
exactly. In the future, maybe we add more.
Yet another approach would be less flexible, but could solve it:
s3:
number_of_events_key: asl_rows
int locIndex = sourceLocation.indexOf("/ext/"); | ||
return sourceLocation.substring(5, locIndex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int locIndex = sourceLocation.indexOf("/ext/"); | |
return sourceLocation.substring(5, locIndex); | |
final String bucketSeparator = "/ext/"; | |
int locIndex = sourceLocation.indexOf(bucketSeparator); | |
return sourceLocation.substring(bucketSeparator.length(), locIndex); |
import software.amazon.awssdk.services.securitylake.model.CustomLogSourceConfiguration; | ||
import software.amazon.awssdk.services.securitylake.model.CreateCustomLogSourceResponse; | ||
import software.amazon.awssdk.services.securitylake.model.CustomLogSourceCrawlerConfiguration; | ||
//import software.amazon.awssdk.services.securitylake.model.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove
final LocalDate now = LocalDate.now(); | ||
final String eventDay = String.format("%d%02d%02d", now.getYear(), now.getMonthValue(), now.getDayOfMonth()); | ||
int locIdx = sourceLocation.indexOf("/ext/"); | ||
pathPrefix = sourceLocation.substring(locIdx+1)+"region="+region+"/accountId="+accountId+"/eventDay="+eventDay+"/"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only create this once. Use String.format()
for clarity.
final String accountId=arn.split(":")[4]; | ||
final LocalDate now = LocalDate.now(); | ||
final String eventDay = String.format("%d%02d%02d", now.getYear(), now.getMonthValue(), now.getDayOfMonth()); | ||
int locIdx = sourceLocation.indexOf("/ext/"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can extract this "/ext/"
into a constant. You also use it below.
@JsonProperty("bucket_selector") | ||
private PluginModel bucketSelector; | ||
|
||
@AssertTrue(message = "Only one of bucket or bucket_selector option should be specified") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AssertTrue(message = "Only one of bucket or bucket_selector option should be specified") | |
@AssertTrue(message = "You may not use both bucket and bucket_selector together in one S3 sink.") |
This message should be more readable.
Signed-off-by: Kondaka <krishkdk@amazon.com>
Signed-off-by: Kondaka <krishkdk@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks good!
@@ -39,5 +43,6 @@ public int hashCode() { | |||
|
|||
public Map<String, Object> getGroupIdentifierHash() { return groupIdentifierHash; } | |||
|
|||
public Map<String, String> getMetadata(int eventCount) { return predefinedObjectMetadata != null ? Map.of(predefinedObjectMetadata.getNumberOfObjects(), Integer.toString(eventCount)) : null; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Structurally, I think this is not the place to do this. The metadata could become more involved and would require additional inputs. But, I think this can work for now. Maybe we can clean it up in a follow-on PR.
@Size(min = 3, max = 500, message = "bucket length should be at least 3 characters") | ||
private String bucketName; | ||
|
||
@JsonProperty("bucket_selector") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add @JsonPropertyDescriptions
to these if possible. Can be a follow up PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do follow up PR
import com.fasterxml.jackson.annotation.JsonProperty; | ||
public class PredefinedObjectMetadata { | ||
@JsonProperty("number_of_objects") | ||
private String numberOfObjects; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a String? Sounds like it would be integer with the name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should call this number_of_objects_key
. Will do it in the next PR
…3 sink (opensearch-project#4846) * dplive1.yaml Signed-off-by: Kondaka <krishkdk@amazon.com> * Delete .github/workflows/static.yml Signed-off-by: Kondaka <krishkdk@amazon.com> * Add support for AWS security lake sink as a bucket selector mode in S3 sink Signed-off-by: Kondaka <krishkdk@amazon.com> * Fixed tests Signed-off-by: Kondaka <krishkdk@amazon.com> * Added javadoc for S3BucketSelector Signed-off-by: Kondaka <krishkdk@amazon.com> * Added new tests for KeyGenerator Signed-off-by: Kondaka <krishkdk@amazon.com> * Added new tests and fixed style errors Signed-off-by: Kondaka <krishkdk@amazon.com> * Addressed review comments Signed-off-by: Kondaka <krishkdk@amazon.com> * Fixed test build failure Signed-off-by: Kondaka <krishkdk@amazon.com> --------- Signed-off-by: Kondaka <krishkdk@amazon.com>
Description
Add support for AWS security lake sink as a bucket selector mode in S3 sink
Issues Resolved
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.