NIFI-14337 - Enhance JoltTransformJSON to Support JOLT Transformation… #9785

Open
wants to merge 4 commits into
base: main
from

Conversation

Srilatha-ramreddy

@Srilatha-ramreddy Srilatha-ramreddy commented Mar 7, 2025

…s on Attributes

Summary

NIFI-14337

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 21

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

Contributor

@exceptionfactory exceptionfactory left a comment

Thanks for proposing this improvement @Srilatha-ramreddy.

Although having an optional property works, it is not immediately clear that this would alter the behavior. To make the implementation clearer, it would be helpful to introduce an additional strategy property. The property could be named JSON Source and could have values of Attribute or FlowFile using an enum that implements DescribedValue to bound the supported options. The property would be required, and would default to FlowFile, maintaining the current behavior. The new FlowFile Attribute property would depend on this strategy property.

Although that approach means introducing an additional property, it would make the configured behavior of the Processor much clearer. Feel free to raise any questions about this strategy.
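
A minimal sketch of the strategy enum described above, for illustration only. In NiFi the enum would implement `org.apache.nifi.components.DescribedValue`; here a local stand-in interface with the same three accessors is declared so the snippet compiles without dependencies. The enum name `SourceStrategy` matches the one used later in this review, while the display names and description strings are assumptions.

```java
/** Runnable demo: prints the allowable values such an enum would expose. */
class SourceStrategyDemo {
    public static void main(String[] args) {
        for (SourceStrategy strategy : SourceStrategy.values()) {
            System.out.println(strategy.getValue() + " -> " + strategy.getDescription());
        }
    }
}

/** Local stand-in for NiFi's DescribedValue so this sketch has no dependencies. */
interface DescribedValue {
    String getValue();

    String getDisplayName();

    String getDescription();
}

/** Hypothetical strategy enum bounding the supported JSON sources. */
enum SourceStrategy implements DescribedValue {
    FLOW_FILE("FlowFile", "Transform the JSON content of the FlowFile"),
    ATTRIBUTE("Attribute", "Transform the JSON value of a named FlowFile attribute");

    private final String displayName;
    private final String description;

    SourceStrategy(final String displayName, final String description) {
        this.displayName = displayName;
        this.description = description;
    }

    @Override
    public String getValue() {
        // The enum constant name doubles as the stored property value.
        return name();
    }

    @Override
    public String getDisplayName() {
        return displayName;
    }

    @Override
    public String getDescription() {
        return description;
    }
}
```

With an enum like this, the JSON Source property would be required and default to FLOW_FILE, and the attribute property would depend on the ATTRIBUTE value, which keeps the existing behavior unchanged for current flows.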

Comment on lines 68 to 69
.name("Json Attribute")
.displayName("Json Attribute")
Contributor

The displayName() method is not needed when the name() is the same. I also recommend renaming the property to FlowFile Attribute.

Suggested change
.name("Json Attribute")
.displayName("Json Attribute")
.name("FlowFile Attribute")

Comment on lines 484 to 485
final Map<String, String> attributes = Collections.singletonMap("jsonAttr",
"{\"rating\":{\"primary\":{\"value\":3},\"series\":{\"value\":[5,4]},\"quality\":{\"value\":}}}");
Contributor

You could simply use Map.of like you do on line 497. Similar comment for line 511.
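
For comparison, a small sketch of the two factory methods. Both return an immutable single-entry map with equal contents; one difference worth knowing is that `Map.of` rejects null keys and values, whereas `Collections.singletonMap` accepts them. The class and method names are illustrative, and the attribute value is a placeholder rather than the JSON from the test.

```java
import java.util.Collections;
import java.util.Map;

class MapFactoryDemo {
    // Placeholder attribute name and value, not taken from the actual test.
    static final String ATTRIBUTE_NAME = "jsonAttr";
    static final String ATTRIBUTE_VALUE = "{\"key\":\"value\"}";

    static Map<String, String> viaSingletonMap() {
        return Collections.singletonMap(ATTRIBUTE_NAME, ATTRIBUTE_VALUE);
    }

    static Map<String, String> viaMapOf() {
        return Map.of(ATTRIBUTE_NAME, ATTRIBUTE_VALUE);
    }

    public static void main(String[] args) {
        // Map equality is defined on entries, so the two maps compare equal.
        System.out.println(viaSingletonMap().equals(viaMapOf()));
    }
}
```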

Suggested change
final Map<String, String> attributes = Collections.singletonMap("jsonAttr",
"{\"rating\":{\"primary\":{\"value\":3},\"series\":{\"value\":[5,4]},\"quality\":{\"value\":}}}");
final Map<String, String> attributes = Map.of("jsonAttr",
"{\"rating\":{\"primary\":{\"value\":3},\"series\":{\"value\":[5,4]},\"quality\":{\"value\":}}}");

Comment on lines 61 to 62
+ "When 'Json Source' is set to FLOW_FILE, the FlowFile content is transformed and the modified FlowFile is routed to 'success' relationship. "
+ "When 'Json Source' is set to ATTRIBUTE, the specified attribute's value is transformed and updated in place, with the FlowFile routed to 'success' relationship. "
Contributor

Suggested change
+ "When 'Json Source' is set to FLOW_FILE, the FlowFile content is transformed and the modified FlowFile is routed to 'success' relationship. "
+ "When 'Json Source' is set to ATTRIBUTE, the specified attribute's value is transformed and updated in place, with the FlowFile routed to 'success' relationship. "
+ "When 'Json Source' is set to FLOW_FILE, the FlowFile content is transformed and the modified FlowFile is routed to the 'success' relationship. "
+ "When 'Json Source' is set to ATTRIBUTE, the specified attribute's value is transformed and updated in place, with the FlowFile routed to the 'success' relationship. "

jsonSourceAttributeName = context.getProperty(JSON_SOURCE_ATTRIBUTE).evaluateAttributeExpressions(original).getValue();
final String jsonSourceAttributeValue = original.getAttribute(jsonSourceAttributeName);
if (StringUtils.isBlank(jsonSourceAttributeValue)) {
logger.error("FlowFile attribute value evaluated to null");
Contributor

StringUtils.isBlank returns true not only when the string is null, but also when it is empty or contains only whitespace.

Suggested change
logger.error("FlowFile attribute value evaluated to null");
logger.error("FlowFile attribute value was blank");
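
To illustrate why "null" is too narrow a description, here is a dependency-free sketch. The `isBlank` helper below is a hypothetical stand-in that mirrors the Commons Lang behavior (null, empty, and whitespace-only input all count as blank); it is not the library method itself.

```java
class BlankCheckDemo {
    // Hypothetical stand-in for StringUtils.isBlank: true for null, "", or whitespace only.
    static boolean isBlank(final String value) {
        return value == null || value.isBlank();
    }

    public static void main(String[] args) {
        System.out.println(isBlank(null));          // missing attribute
        System.out.println(isBlank(""));            // empty attribute value
        System.out.println(isBlank("   "));         // whitespace-only attribute value
        System.out.println(isBlank("{\"a\":1}"));   // attribute that carries JSON
    }
}
```

The first three calls report blank and only the last does not, so a message that says "blank" covers every case that reaches the error branch.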

final boolean isSourceFlowFileContent = SourceStrategy.FLOW_FILE == context.getProperty(JSON_SOURCE).asAllowableValue(SourceStrategy.class);
String jsonSourceAttributeName = null;

if (isSourceFlowFileContent ) {
Contributor

Suggested change
if (isSourceFlowFileContent ) {
if (isSourceFlowFileContent) {

@Test
void testJsonAttributeNotInitialised() throws IOException {
runner.setProperty(JoltTransformJSON.JSON_SOURCE, SourceStrategy.ATTRIBUTE);
runner.setProperty(JoltTransformJSON.JOLT_SPEC, "./src/test/resources/specs/shiftrSpec.json");
Contributor

The value "./src/test/resources/specs/shiftrSpec.json" appears three times in your changes: on this line, line 481, and line 494. Another test uses the same value on line 225. The same file is also referenced without the leading ./ as "src/test/resources/specs/shiftrSpec.json" on lines 214 and 255. Please define a private static final String constant with one of these values and use it in all six places.

runner.setProperty(JoltTransformJSON.JSON_SOURCE, SourceStrategy.ATTRIBUTE);
runner.setProperty(JoltTransformJSON.JOLT_SPEC, "./src/test/resources/specs/shiftrSpec.json");
runner.setProperty(JoltTransformJSON.JOLT_TRANSFORM, JoltTransformStrategy.SHIFTR);
runner.setProperty(JoltTransformJSON.JSON_SOURCE_ATTRIBUTE, "jsonAttr");
Contributor

@dan-s1 dan-s1 Mar 13, 2025

I see this value "jsonAttr" being used 8 times for defining the name of the attribute (this line and on lines 483, 484, 498, 510, 511, 523 and 525). Please make a private static final String variable and use it in each of those places.

@@ -464,6 +464,71 @@ void testJoltSpecInvalidEL() throws IOException {
runner.assertNotValid();
}

@Test
Contributor

@dan-s1 dan-s1 Mar 13, 2025

@exceptionfactory I see 4 new unit tests which are configured very similarly. Do you think these four tests should be folded into a JUnit 5 ParameterizedTest that uses the MethodSource annotation to name a method returning the necessary arguments for each of these tests?

@Srilatha-ramreddy
Author

@exceptionfactory @dan-s1 Thanks for the feedback. The review comments are now addressed and the PR is ready for review. Thanks

@exceptionfactory
Contributor

Thanks for the updates @Srilatha-ramreddy, please review the Checkstyle warnings. I will take a closer look at the latest version soon.

Warning:  src/test/java/org/apache/nifi/processors/jolt/TestJoltTransformJSON.java:[478] (sizes) LineLength: Line is longer than 200 characters (found 213).

Comment on lines 475 to 484
return Stream.of(
Arguments.of(JSON_SOURCE_ATTR_NAME, null, SHIFTR_SPEC_PATH,
JoltTransformStrategy.SHIFTR, false, null),
Arguments.of(JSON_SOURCE_ATTR_NAME, Map.of(JSON_SOURCE_ATTR_NAME, INVALID_INPUT_JSON), SHIFTR_SPEC_PATH,
JoltTransformStrategy.SHIFTR, false, null),
Arguments.of("${dynamicJsonAttr}", Map.of("dynamicJsonAttr", JSON_SOURCE_ATTR_NAME, JSON_SOURCE_ATTR_NAME, EXPECTED_JSON), SHIFTR_SPEC_PATH,
JoltTransformStrategy.SHIFTR, true, SHIFTR_JSON_OUTPUT),
Arguments.of(JSON_SOURCE_ATTR_NAME, Map.of(JSON_SOURCE_ATTR_NAME, EXPECTED_JSON), CHAINR_SPEC_PATH,
JoltTransformStrategy.CHAINR, true, CHAINR_JSON_OUTPUT)
);
Contributor

@dan-s1 dan-s1 Mar 17, 2025

@Srilatha-ramreddy Thanks for adding the ParameterizedTest. That is what I had in mind. I am requesting one minor change: instead of using Arguments.of, please use Arguments.argumentSet so you can give a meaningful name to each of the tests. I was going to start from the original names of the unit tests you began with, but I no longer see that commit, as it seems you squashed it. Please note that, in general, commits should not be squashed once the PR has been submitted. Thanks!

Author

@dan-s1 Thanks for the feedback. Please review the latest commit. I will not squash commits going forward.

Contributor

@exceptionfactory exceptionfactory left a comment

Thanks for the updates @Srilatha-ramreddy. I noted a few minor adjustments, and one larger question regarding Expression Language evaluation. As mentioned, support for Expression Language opens up some additional possibilities that should be considered.

@RequiresInstanceClassLoading
public class JoltTransformJSON extends AbstractJoltTransform {

public static final PropertyDescriptor JSON_SOURCE = new PropertyDescriptor.Builder()
.name("Json Source")
Contributor

JSON should be all uppercase in property names:

Suggested change
.name("Json Source")
.name("JSON Source")


public static final PropertyDescriptor JSON_SOURCE = new PropertyDescriptor.Builder()
.name("Json Source")
.description("Specifies whether the JOLT transformation is applied to FlowFile JSON content or to specified FlowFile JSON attribute.")
Contributor

Suggested change
.description("Specifies whether the JOLT transformation is applied to FlowFile JSON content or to specified FlowFile JSON attribute.")
.description("Specifies whether the Jolt transformation is applied to FlowFile JSON content or to specified FlowFile JSON attribute.")

.build();

public static final PropertyDescriptor JSON_SOURCE_ATTRIBUTE = new PropertyDescriptor.Builder()
.name("Json Source Attribute")
Contributor

Suggested change
.name("Json Source Attribute")
.name("JSON Source Attribute")

Comment on lines 61 to 62
+ "When 'Json Source' is set to FLOW_FILE, the FlowFile content is transformed and the modified FlowFile is routed to the 'success' relationship. "
+ "When 'Json Source' is set to ATTRIBUTE, the specified attribute's value is transformed and updated in place, with the FlowFile routed to the 'success' relationship. "
Contributor

This description is duplicative of the property description, so I recommend removing it.

Comment on lines 77 to 78
.description("The FlowFile attribute containing JSON to be transformed. "
+ "This property is required only when 'Json Source' is set to ATTRIBUTE.")
Contributor

This can be specified as a multiline string, however, it is not necessary to include the second sentence, since the documentation rendering automatically describes dependent properties.

logger.error("JSON parsing failed for {}", original, e);
session.transfer(original, REL_FAILURE);
return;
final boolean isSourceFlowFileContent = SourceStrategy.FLOW_FILE == context.getProperty(JSON_SOURCE).asAllowableValue(SourceStrategy.class);
Contributor

Minor renaming recommendation:

Suggested change
final boolean isSourceFlowFileContent = SourceStrategy.FLOW_FILE == context.getProperty(JSON_SOURCE).asAllowableValue(SourceStrategy.class);
final boolean sourceStrategyFlowFile = SourceStrategy.FLOW_FILE == context.getProperty(JSON_SOURCE).asAllowableValue(SourceStrategy.class);

jsonSourceAttributeName = context.getProperty(JSON_SOURCE_ATTRIBUTE).evaluateAttributeExpressions(original).getValue();
final String jsonSourceAttributeValue = original.getAttribute(jsonSourceAttributeName);
if (StringUtils.isBlank(jsonSourceAttributeValue)) {
logger.error("FlowFile attribute value was blank");
Contributor

The attribute name should be included in the message:

Suggested change
logger.error("FlowFile attribute value was blank");
logger.error("FlowFile attribute [{}] value is blank", jsonSourceAttributeName);

return;
}
} else {
jsonSourceAttributeName = context.getProperty(JSON_SOURCE_ATTRIBUTE).evaluateAttributeExpressions(original).getValue();
Contributor

Support for Expression Language raises an important question. Evaluating the expression to return an attribute name could be confusing, since ${attributeName} would return the JSON value, not the attribute name itself. One option is to remove support for Expression Language. The other option is to change the property name to describe the JSON Source itself. This also impacts the JSON Source property options. The options could be FLOW_FILE and SOURCE_REFERENCE or similar, with JSON Source Reference supporting Expression Language. The property naming needs some further consideration, as I'm not sure Source Reference is as clear as it should be. Perhaps JSON Source Content.
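
A rough, dependency-free sketch of the confusion described here. The `evaluate` method is a toy stand-in that only substitutes `${name}` references from a map; it is not NiFi Expression Language, and the class, method names, and sample attribute values are assumptions made for illustration. The `resolveJson` method mirrors the two-step lookup in the diff: evaluate the property, then treat the result as an attribute name.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttributeReferenceDemo {
    private static final Pattern REFERENCE = Pattern.compile("\\$\\{([^}]+)\\}");

    // Toy stand-in for expression evaluation: replaces each ${name} with that attribute's value.
    static String evaluate(final String propertyValue, final Map<String, String> attributes) {
        final Matcher matcher = REFERENCE.matcher(propertyValue);
        final StringBuilder result = new StringBuilder();
        while (matcher.find()) {
            final String value = attributes.getOrDefault(matcher.group(1), "");
            matcher.appendReplacement(result, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    // Evaluate the property first, then use the result as the name of the attribute to read.
    static String resolveJson(final String propertyValue, final Map<String, String> attributes) {
        final String attributeName = evaluate(propertyValue, attributes);
        return attributes.get(attributeName);
    }

    public static void main(String[] args) {
        final Map<String, String> attributes = Map.of(
                "jsonAttr", "{\"a\":1}",
                "dynamicJsonAttr", "jsonAttr");

        // A literal name finds the JSON.
        System.out.println(resolveJson("jsonAttr", attributes));
        // An indirect reference works because dynamicJsonAttr holds the attribute name.
        System.out.println(resolveJson("${dynamicJsonAttr}", attributes));
        // The intuitive ${jsonAttr} evaluates to the JSON text, which is not an attribute name.
        System.out.println(resolveJson("${jsonAttr}", attributes));
    }
}
```

In this sketch the first two lookups return the JSON and the third returns null, which is the case a user is most likely to configure by mistake.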

Author

@exceptionfactory I agree with you. EL support was only meant to cover edge cases, and there are workarounds to achieve the same result. Happy to remove the Expression Language support.

Author

@exceptionfactory Can you please review?

try {
inputJson = jsonUtil.jsonToObject(jsonSourceAttributeValue);
} catch (final Exception e) {
logger.error("JSON parsing failed on FlowFile attribute for {}", original, e);
Contributor

It might be a good idea to include the name of the attribute like you did above.

Suggested change
logger.error("JSON parsing failed on FlowFile attribute for {}", original, e);
logger.error("JSON parsing failed on attribute '{}' of FlowFile {}", jsonSourceAttributeName, original, e);

Comment on lines +206 to +207
logger.info("Transform completed on FlowFile attribute for {}", original);
}
Contributor

@dan-s1 dan-s1 Mar 21, 2025

Again, it may be beneficial to name the attribute that was transformed so it is clear in the logs.

Suggested change
logger.info("Transform completed on FlowFile attribute for {}", original);
}
logger.info("Transform completed on attribute {} of FlowFile {}", jsonSourceAttributeName, original);
}
