Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitHub-Issue#2778: Added CloudWatchLogs Sink Config Files #2922

Merged
merged 13 commits into from
Jul 5, 2023

Conversation

MaGonzalMayedo
Copy link
Contributor

@MaGonzalMayedo MaGonzalMayedo commented Jun 21, 2023

Description

This change contains the Configuration files for the CloudWatchLogs Sink. It will act as a basis for reading the
user entered configuration yaml file.

Issues Resolved

This PR will work towards resolving CWL-Sink for Data-Prepper [Issue https://github.com//issues/2778]
Link: #2778 (comment)

Check List

  • New functionality includes testing.
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@@ -0,0 +1,42 @@
.gradle

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be at the root of the repo?

Copy link
Collaborator

@kkondaka kkondaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the package path data-prepper-plugins/cwl-sink. We have someone working on CWL as source as well. So, it probably makes sense to use the path data-prepper-plugins/cloudwatch-logs/ and then you can have data-prepper-plugins/cwl-sink/src/main/java/org/opensearch/dataprepper/plugins/sink/ for this and the source will have data-prepper-plugins/cwl-sink/src/main/java/org/opensearch/dataprepper/plugins/source

@@ -0,0 +1,42 @@
.gradle
build/
!gradle/wrapper/gradle-wrapper.jar
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should removed. I think it was not intended.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those files should be removed now.

@@ -0,0 +1,8 @@
package org.opensearch.dataprepper;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should be removed too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

public static final String DEFAULT_BUFFER_TYPE = "in_memory";

//TODO: Change this to custom aws config class as its just a data container.
@JsonProperty("aws_config")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use "aws:" for this. Please follow the same convention.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name has been changed from 'aws_config' to 'aws'.

}

group = 'org.opensearch.dataprepper'
version = '2.3.0-SNAPSHOT'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need line 6 and 7.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed these lines.

* where the configuration allows the sink to fetch Aws credentials
* and resources.
*/
public class AwsConfig {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use this and not AwsAuthenticationOptions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the change to AwsConfig.

@JsonProperty("aws")
@NotNull
@Valid
private AwsAuthenticationOptions awsConfig;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do private AwsConfig awsConfig;?
Duplicate AwsAuthenticationOptions code if needed.
We are working on moving this to a common location. For now, it's ok if you duplicate it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed it to AwsConfig.

dependencies {
api project(':data-prepper-api')
implementation project(':data-prepper-plugins:aws-plugin-api')
implementation project(path: ':data-prepper-plugins:s3-sink')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are adding this to import AwsAuthenticationOptions. But we currently do not do like this. Ideally the AwsAuthenticationOptions should be moved to common directory. But for now, it is OK if you duplicate the code and put it in cloud watch logs directory

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I realized that by adding that dependency we are not isolating the separate plugins. I have removed the dependency and proceeded to use my own version of AwsConfig.


private int backOffTime = DEFAULT_BACKOFF_TIME;

public int getBatchSize() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be too many config options for the customer. We also want to avoid a retry storm so maybe we should decide on a retry_count ourselves.
Up to DP team how they want to proceed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep this open for configuration and provide default values. Customer might want to tweak based on the use case. We can use default values and document in Readme. I'm not aware of all the cloudwatch configuration, but we should allow CX to configure value to increase throughput and remove the ones that are not required.

* where the configuration allows the sink to fetch Aws credentials
* and resources.
*/
public class AwsConfig {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should reuse AwsCredentialsOptions. Will this class be used to construct AwsCredentialsOptions ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, AwsConfig is re-using code from AwsCredentialsOptions as AwsCredentialsOptions is not a general utils class. This would make the CloudWatchLogs plugin rely on a dependency from another plugin if we used AwsCredentialsOptions. I changed it to AwsConfig after talking to Krishna about this, we can discuss this further as well.

public class CwlSinkConfig {
public static final String DEFAULT_BUFFER_TYPE = "in_memory";

//Class was utilized from the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

incomplete documentation.


private int backOffTime = DEFAULT_BACKOFF_TIME;

public int getBatchSize() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep this open for configuration and provide default values. Customer might want to tweak based on the use case. We can use default values and document in Readme. I'm not aware of all the cloudwatch configuration, but we should allow CX to configure value to increase throughput and remove the ones that are not required.

Comment on lines 21 to 22
@JsonProperty("path_to_credentials")
private String pathToCredentials;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this ? Look like we dont use this parameter in other Sinks ?

Copy link
Contributor Author

@MaGonzalMayedo MaGonzalMayedo Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I normally presented this parameter as an option to allow customers to specify credential files for SDK configuration, but after discussing this further with Mark and it being brought up here, it would be best to remove it. I will have this for the next revision. It is too much control for the customer and insecure as we require the pathing for the file.

@JsonProperty("buffer_type")
private String bufferType = DEFAULT_BUFFER_TYPE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are other options ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently there is only a default option for the buffer, and that is InMemoryBuffer, but this was made an option as in the future we could implement different buffering methods for customers who want to handle that application aspect differently. Similar to how the S3 handles InMemoryBuffering and InFileBuffering.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like Speculative Generality. Is there a plan to add additional buffers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently within the scope of the Design of this plugin, there are no goals for this. But I have added the addition of other Buffering systems such as Kafka and File Buffering as Next Step additions that could benefit the plugin for future usage. The generalization of this Buffer will prove useful as our services grow and customers require variable sources for which to buffer information. It would make it easier for downstream additions to be applied and integrated.

Comment on lines 11 to 15
public static final String LOG_GROUP = "testGroup";
public static final String LOG_STREAM = "testStream";
public static final String BUFFER_TYPE = "in_memory";
public static final int BATCH_SIZE = 10;
public static final int MAX_RETRIES = 10;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are these values used ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it seems that while working on the testing, I had left these values here although they were never used. I will get rid of these for the next revision.

@Valid
private AwsConfig awsConfig;

@JsonProperty("threshold_config")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename this to threshold, having config here is redundant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name has been changed for the next revision.

@Size(min = 2, max = 1224, message = "awsStsExternalId length should be between 2 and 1224 characters")
private String awsStsExternalId;

public int getDEFAULT_CONNECTION_ATTEMPTS() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this used for? If it's not configurable by user you can use the static variable in CwlSink class directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the default_connections to a static variable, this should follow better standard conventions.

@Size(min = 1, max = 1048576, message = "max_batch_request_size amount should be between 1 and 1048576 bytes")
private int maxRequestSize = DEFAULT_SIZE_OF_REQUEST;

@JsonProperty("retry_count")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this retry count different from AWS SDK retry?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is different from AWS SDK retry as the SDK retry is a built in error retransmission method that can still fail and throw exceptions that are not directly the fault of the client. This can be seen in the documentation for the exceptions thrown by a call to putLogEvents() that can return RetryableException or AwsServiceException.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parameter is also based on the S3-Sink retry parameter "max_retries" as it attempts to retry recoverable requests.

public class CwlSinkConfigTest {
@Test
void check_null_auth_config_test() {
assertThat(new CwlSinkConfig().getAwsConfig(), equalTo(null));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use this ReflectivelySetField class to configure the options and verify if they're being set correctly.

Copy link
Contributor Author

@MaGonzalMayedo MaGonzalMayedo Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for letting me in on this. I will re-write the test for the CwlSink to match these. I also used ObjectMapper.

api project(':data-prepper-api')
implementation project(':data-prepper-plugins:aws-plugin-api')
testImplementation platform('org.junit:junit-bom:5.9.1')
implementation platform('software.amazon.awssdk:bom:2.20.56')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't have to define bom dependency here, I think it's configured in root build.gradle. If you have to use latest version update it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the sdk bom dependecy for the next revision.

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.equalTo;

public class ThresholdConfigTest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add test to check all default values.

void check_default_values() {
   ...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the default test to ThresholdConfig.

@dinujoh dinujoh self-requested a review June 28, 2023 21:29
asifsmohammed
asifsmohammed previously approved these changes Jun 28, 2023
Copy link
Collaborator

@asifsmohammed asifsmohammed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. All the files should include copyright header, you can fix that in next revision.

dinujoh
dinujoh previously approved these changes Jun 29, 2023
graytaylor0 and others added 11 commits July 3, 2023 12:56
…pensearch-project#2910)

Create Elasticsearch client, implement search and pit apis for ElasticsearchAccessor

Signed-off-by: Taylor Gray <tylgry@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>
Added Config Files for CloudWatchLogs Sink.

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>
… syntax)

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>
Added default params for back_off and log_send_interval alongside test cases for ThresholdConfig.

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>
…ith AwsConfig.

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>
…mer and params to AwsConfig

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>
…tive mapping to tests CwlSink

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>
…xRequestSize

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <95880281+MaGonzalMayedo@users.noreply.github.com>
Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>
@kkondaka kkondaka merged commit 4aa35a3 into opensearch-project:main Jul 5, 2023
45 of 46 checks passed
MaGonzalMayedo added a commit to MaGonzalMayedo/data-prepper that referenced this pull request Jul 25, 2023
…-project#2922)

* Elasticsearch client implementation with pit and no context search (opensearch-project#2910)

Create Elasticsearch client, implement search and pit apis for ElasticsearchAccessor

Signed-off-by: Taylor Gray <tylgry@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* GitHub-Issue#2778: Refactoring config files for CloudWatchLogs Sink (#4)

Added Config Files for CloudWatchLogs Sink.

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added fixes from comments to code (including pathing and nomenclature syntax)

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Refactoring config (#5)

Added default params for back_off and log_send_interval alongside test cases for ThresholdConfig.

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Fixed deleted AwsConfig file

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Removed the s3 dependency from build.gradle, replaced the AwsAuth.. with AwsConfig.

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added modifiable back_off_timer, added threshold test for back_off_timer and params to AwsConfig

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added fixes to gradle file, added tests to AwsConfig, and used Reflective mapping to tests CwlSink

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added default value test to ThresholdConfig and renamed getter for maxRequestSize

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Removed unnecessary imports

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added cloudwatch-logs to settings.gradle

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

* Added a quick fix to the back_off_time range

Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>

---------

Signed-off-by: Taylor Gray <tylgry@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <95880281+MaGonzalMayedo@users.noreply.github.com>
Co-authored-by: Taylor Gray <tylgry@amazon.com>
Co-authored-by: Marcos <alemayed@amazon.com>
Signed-off-by: Marcos Gonzalez Mayedo <alemayed@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants