-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for dynamic rule detection for pipeline config transformation #4601
Conversation
|
||
Path rulesFolderPath; | ||
|
||
if ("jar".equals(uri.getScheme())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When do we expect the rules in a jar file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in runtime, we would expect everything inside a jar.
List<Path> mockRuleFiles = Arrays.asList( | ||
Paths.get("src/test/resources/transformation/rules/documentdb-rule1.yaml"), | ||
Paths.get("src/test/resources/transformation/rules/documentdb-rule.yaml") | ||
); | ||
doReturn(mockRuleFiles).when(transformersFactory).getRuleFiles(); | ||
assertTrue(mockRuleFiles.size() > 0, "There should be at least one rule file."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You defined mockRuleFiles
and assert it's not empty. Does it actually test the method?
// Get the URI of the rules folder | ||
URI uri = null; | ||
try { | ||
uri = getClass().getClassLoader().getResource("rules").toURI(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use a predefined directory structure to ensure that we are only getting the transform rules:
org/opensearch/dataprepper/transforms/
Thus, we'd have:
org/opensearch/dataprepper/transforms/rules/
org/opensearch/dataprepper/transforms/templates/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way i understand this comment and this comment, we want to move the template and rules to be plugin specific and move this into data-prepper-plugin directory, ie, each plugin gets a rule and template but there are some limitations to this approach. If for example, there is an intent by an user for which plugin does not exist and we want to tranform that to other processors, where do we place the rules and templates in this case. Example for this is with respect to security lake.
} | ||
} else { | ||
// File is not inside a JAR | ||
rulesFolderPath = Paths.get(uri); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would this ever occur?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to address 2 cases:
- "jar" scheme: This case occurs when the code is running from within a JAR file, ie during runtime.
- Non-"jar" scheme: This case occurs when the code is running in a development environment where the rules directory is available in the file system and not within a jar.
public InputStream getPluginRuleFileStream(String pluginName) { | ||
if(pluginName == null || pluginName.isEmpty()){ | ||
throw new RuntimeException("Transformation plugin not found"); | ||
public List<Path> getRuleFiles() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API contract lists out Path
s for rules. Then the caller comes back and get the input stream by calling readRuleFile
. You should be able to consolidate these.
Just make one method: Collection<InputStream> loadRules()
.
I do see that you use the file name later, I think for debugging purposes. You could make a class that holds both a name and an InputStream.
class RuleInputStream {
private String name;
private InputStream inputStream
}
@AllArgsConstructor | ||
@Data | ||
public class RuleFileEvaluation { | ||
Boolean result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should all be private. And I think they can also be private.
@@ -1,3 +1,4 @@ | |||
plugin_name: "documentdb" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is for testing, let's add test in the name. Maybe test-documentdb
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srikanthjg do you want to make this small change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be inconsistent even within the same rule yaml. The apply_when condition for the rule yaml, we still define it as documentdb and it would looks as follows:
plugin_name: "documentdbtest"
apply_when:
- "$..source.documentdb"
We still want use documentdb and not documentdbtest in the apply_when condition as that is what is used in the pipeline yaml; and that cannot be mocked that easily.
The plugin name in the rule yaml is only used to get the corresponding template, but it seems inconsistent if we use documentdb vs documentdbtest even within the same file for the same plugin we want to test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wouldn't this work?
plugin_name: "documentdbtest"
apply_when:
- "$..source.documentdbtest"
These files are only used for unit testing, right? So, there is no need to keep the names matching the others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will need to change like 20 files(test yaml files) in the test as all of them use documentdb. i cant just change the rule to documentdbtest and make it work. The userconfig, expected output and template depends on this. can i add this as another PR? it should it mostly replace all but i see some other issues with it.
@@ -1,2 +1,3 @@ | |||
plugin_name: "documentdb" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment as above, please add test
into the name.
break; | ||
} | ||
} catch (PathNotFoundException e) { | ||
LOG.debug("Json Path not found for {}", ruleFile.getFileName().toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to call .toString()
here.
Map sourceOptions = new HashMap<String, Object>(); | ||
Map s3_bucket = new HashMap<>(); | ||
Map<String, Object> sourceOptions = new HashMap<>(); | ||
Map<String, Object> s3_bucket = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use correct naming conventions: s3Bucket
.
sourceOptions.put("option2", null); | ||
final PluginModel source = new PluginModel("http", sourceOptions); | ||
Map<String, Object> sourceOptions = new HashMap<>(); | ||
Map<String, Object> s3_bucket = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment about variable naming conventions.
@@ -1,3 +1,4 @@ | |||
plugin_name: "documentdb" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this into the mongodb
plugin project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srikanthjg Please address David's other comments. I will approve again and we can ask Hai to review too. |
@srikanthjg I am not sure if we are going to have a "real" plugin for each of the plugins in the pre-transformation config file. So, while moving documentdb related rules and templates to document db directory is a good suggestion by David, it might not be possible in all cases. For now, I suggest we just commit what you implemented in this PR and if needed, we can take the approach suggested by David in a future PR |
…tion Signed-off-by: Srikanth Govindarajan <srikanthjg123@gmail.com>
Signed-off-by: Srikanth Govindarajan <srigovs@amazon.com>
Signed-off-by: Srikanth Govindarajan <srigovs@amazon.com>
Signed-off-by: Srikanth Govindarajan <srigovs@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove this empty file. You don't need a build.gradle
file at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
build.gradle is needed to this plugin as a module. When i tried without build.gradle, classloader search was not picking up the rule file from the folder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a README.md
to this project to explain how to use it and test with it.
} catch (FileNotFoundException e){ | ||
LOG.error("Template File Not Found for {}", PLUGIN_NAME); | ||
} catch (FileNotFoundException e) { | ||
LOG.error("Template File Not Found"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you remove the additional information for tracking the problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will add it back.
} | ||
} | ||
|
||
} catch (FileNotFoundException e) { | ||
LOG.debug("Rule File Not Found"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be an error?
Also, can you include the file name in the log?
rulesPath = Paths.get(rulesURL.toURI()); | ||
} catch (FileSystemNotFoundException e) { | ||
// Handle the case where the file system is not accessible (e.g., in a JAR) | ||
FileSystem fileSystem = FileSystems.newFileSystem(rulesURL.toURI(), Collections.emptyMap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to work? These will always be in jar files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this works. It will always hit this code when dataprepper is running because it is always in jar during execution but will differ when unit testing.
@@ -64,7 +65,7 @@ public RuleEvaluatorResult isTransformationNeeded(PipelinesDataFlowModel pipelin | |||
.build(); | |||
} | |||
} catch (FileNotFoundException e) { | |||
LOG.error("Template File Not Found"); | |||
LOG.error("Template File Not Found for {}", PLUGIN_NAME); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, you don't even need this catch block. You can remove it. And then go back to using a local pluginName
variable.
For Example: | ||
|
||
User Config: | ||
```aidl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
```aidl | |
```yaml |
on the template. | ||
|
||
Rule: | ||
```aidl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
```aidl | |
```yaml |
``` | ||
|
||
Template: | ||
```aidl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
```aidl | |
```yaml |
``` | ||
|
||
Output: | ||
```aidl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
```aidl | |
```yaml |
Signed-off-by: Srikanth Govindarajan <srigovs@amazon.com>
Description
Dynamically detects rules present in rules folder and applies a corresponding transformation based on plugin name.
Issues Resolved
Resolves #4600
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.