Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PersonalizeSink: add client and configuration classes #4803

Merged
merged 5 commits into from
Aug 14, 2024

Conversation

ivan-tse
Copy link
Contributor

@ivan-tse ivan-tse commented Aug 1, 2024

Description

Add client and configuration classes for Personalize Sink

Issues Resolved

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • [ x] New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • [ x] New functionality has javadoc added
  • [ x] Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Ivan Tse <tseiva@amazon.com>
Copy link
Member

@dlvenable dlvenable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this is looking good. I have a few suggestions and comments that should be fairly straight-forward.

testImplementation testLibs.slf4j.simple
}

test {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need these three lines. The root project provides this configuration.

* Initialize {@link PersonalizeSinkService}
*/
private void doInitializeInternal() {
sinkInitialized = Boolean.TRUE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sinkInitialized = Boolean.TRUE;
sinkInitialized = true;

* Implementation class of personalize-sink plugin. It is responsible for receiving the collection of
* {@link Event} and uploading to amazon personalize.
*/
@DataPrepperPlugin(name = "personalize", pluginType = Sink.class, pluginConfigurationType = PersonalizeSinkConfiguration.class)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm a user looking at Data Prepper documentation, I'm not sure I'd know what "personalize" means. Maybe aws_personalize would be clearer? I'm ok keeping it as you have it as well, but want to point out the possible confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree with that. I'll change it to aws_personalize

this.personalizeSinkConfig = personalizeSinkConfig;
this.sinkContext = sinkContext;

sinkInitialized = Boolean.FALSE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sinkInitialized = Boolean.FALSE;
sinkInitialized = false;

* @param records received records and add into buffer.
*/
void output(Collection<Record<Event>> records) {
LOG.info("{} records received", records.size());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LOG.info("{} records received", records.size());
LOG.trace("{} records received", records.size());

This is something that you would mostly use for testing the sink itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes this is placeholder until the sink gets implemented. Will change to trace

Comment on lines 20 to 29
implementation 'org.jetbrains.kotlin:kotlin-stdlib:1.9.22'
implementation libs.avro.core
implementation(libs.hadoop.common) {
exclude group: 'org.eclipse.jetty'
exclude group: 'org.apache.hadoop', module: 'hadoop-auth'
exclude group: 'org.apache.zookeeper', module: 'zookeeper'
}
implementation libs.parquet.avro
implementation 'software.amazon.awssdk:apache-client'
implementation 'org.jetbrains.kotlin:kotlin-stdlib-common:1.9.22'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
implementation 'org.jetbrains.kotlin:kotlin-stdlib:1.9.22'
implementation libs.avro.core
implementation(libs.hadoop.common) {
exclude group: 'org.eclipse.jetty'
exclude group: 'org.apache.hadoop', module: 'hadoop-auth'
exclude group: 'org.apache.zookeeper', module: 'zookeeper'
}
implementation libs.parquet.avro
implementation 'software.amazon.awssdk:apache-client'
implementation 'org.jetbrains.kotlin:kotlin-stdlib-common:1.9.22'

I do not believe you need any of these items. Certainly you shouldn't need Avro, Parquet, and Hadoop. And unless you are using a library that requires Kotlin, you can remove that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes let me clean up the build.gradle file

* Class responsible for creating PersonalizeEventsClient object, check thresholds,
* get new buffer and write records into buffer.
*/
public class PersonalizeSinkService {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public class PersonalizeSinkService {
class PersonalizeSinkService {

Make this package protected since you will probably only use it in PersonalizeSink.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

import software.amazon.awssdk.core.retry.RetryPolicy;
import software.amazon.awssdk.services.personalizeevents.PersonalizeEventsClient;

public final class ClientFactory {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public final class ClientFactory {
final class ClientFactory {

I think this class can also be package protected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

personalizeSink = createObjectUnderTest();
Assertions.assertNotNull(personalizeSink);
personalizeSink.doInitialize();
assertTrue(personalizeSink.isReady(), "personalize sink is not initialized and not ready to work");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assertTrue(personalizeSink.isReady(), "personalize sink is not initialized and not ready to work");
assertTrue(personalizeSink.isReady(), "Expected the personalize sink to be ready, but it is reporting it is not ready.");

I think the current message may be confusing if the test fails in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, will change the messaging

void test_personalize_Sink_plugin_isReady_negative() {
personalizeSink = createObjectUnderTest();
Assertions.assertNotNull(personalizeSink);
assertFalse(personalizeSink.isReady(), "personalize sink is initialized and ready to work");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assertFalse(personalizeSink.isReady(), "personalize sink is initialized and ready to work");
assertFalse(personalizeSink.isReady(), "Expected the personalize sink to report that it is not ready, but it is reporting it is ready.");

@dlvenable
Copy link
Member

Thank you @ivan-tse for starting this work and making this initial contribution!

Signed-off-by: Ivan Tse <tseiva@amazon.com>
@Override
public void doInitialize() {
try {
doInitializeInternal();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This doesn't have to be a function if the function is just setting a boolean. Do you expect this function to do something different in future? If yes, keep it like it is now. Other wise I think it is unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change this and add back in the future if necessary

@kkondaka
Copy link
Collaborator

kkondaka commented Aug 6, 2024

Looks good to me. I can approve this once all tests pass. David is away this week, but I see that you have addressed all his comments.

Signed-off-by: Ivan Tse <tseiva@amazon.com>
Signed-off-by: Ivan Tse <tseiva@amazon.com>
Copy link
Member

@graytaylor0 graytaylor0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making this PR mainly just boilerplate and configuration! Made it much easier to review


@AssertTrue(message = "sts_role_arn must be an IAM Role", groups = PersonalizeAdvancedValidation.class)
boolean isValidStsRoleArn() {
final Arn arn = getArn();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you will want to return false here if this method throws an IllegalArgumentException. Have you tested with invalid arn format and observed the exception?

Also this role should be optional in this configuration, so adding a null check to return true in the method should be enough. The reason it is optional here is because users can configure a default role in the data-prepper-config.yaml (#4559)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I have tested with invalid arn format.

I didn't know about the default in data-prepper-config.yaml. I'll make this optional

Signed-off-by: Ivan Tse <tseiva@amazon.com>
Copy link
Member

@dlvenable dlvenable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Thanks for this contribution and making the requested changes!

@dlvenable dlvenable merged commit 38fe2af into opensearch-project:main Aug 14, 2024
71 of 74 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants