
Create a decoder processor to decode Event keys #3841

Open
1 of 3 tasks
graytaylor0 opened this issue Dec 11, 2023 · 6 comments
Assignees
Labels
enhancement New feature or request plugin - processor A plugin to manipulate data in the data prepper pipeline.

Comments

graytaylor0 (Member) commented Dec 11, 2023

Is your feature request related to a problem? Please describe.
As a user of Data Prepper, my Events contain keys whose values are encoded in different formats, such as gzip, base64, and protobuf (https://protobuf.dev/programming-guides/encoding/).

Sample Event

{
  "my_protobuf_key": "",
  "my_gzip_key": "H4sIAAAAAAAAA/NIzcnJVyjPL8pJAQBSntaLCwAAAA==",
  "my_base64_key": "SGVsbG8gd29ybGQ="
}
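As a concrete illustration of what such a processor would do, the base64 and gzip values in this sample can be decoded with Python's standard library alone. This is a sketch, not Data Prepper code; it assumes the gzip bytes are carried base64-encoded inside the JSON string, which the sample value's format suggests:

```python
import base64
import gzip

# Values taken from the sample Event above
event = {
    "my_gzip_key": "H4sIAAAAAAAAA/NIzcnJVyjPL8pJAQBSntaLCwAAAA==",
    "my_base64_key": "SGVsbG8gd29ybGQ=",
}

# base64: a single decode step
decoded_b64 = base64.b64decode(event["my_base64_key"]).decode("utf-8")
# -> "Hello world"

# gzip: the compressed bytes arrive base64-encoded in the JSON string,
# so decoding is base64 first, then gzip decompression
raw = base64.b64decode(event["my_gzip_key"])
decoded_gzip = gzip.decompress(raw).decode("utf-8")

print(decoded_b64, decoded_gzip)
```

The protobuf case is more involved, since decoding requires a message definition; that question is discussed further down in this thread.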

Describe the solution you'd like
A new processor, called a decoder processor, that can decode various encodings. The following configuration would decode the three values in the sample Event above:

processor:
  - decoder: 
       key: "my_base64_key"
       # Can be one of gzip, base64, or protobuf
       base64:
  - decoder:
       key: "my_gzip_key"
       gzip:
  - decoder:
       key: "my_protobuf_key"
       protobuf:
          message_definition_file: "/path/to/proto_definition.proto"

Tasks

@graytaylor0 graytaylor0 added enhancement New feature or request untriaged plugin - processor A plugin to manipulate data in the data prepper pipeline. labels Dec 11, 2023
@dlvenable dlvenable changed the title Create a decoder processor to decode encrypted Event keys Create a decoder processor to decode Event keys Dec 11, 2023
dlvenable (Member) commented:

It may be advantageous to have different processors for different encodings, for a few reasons.

  1. Each of these brings in different dependencies and long-term we may want to make some of these plugins optional to keep the overall size of the project down.
  2. This could produce simpler YAML, as we won't need the nested group for custom configurations.
  3. It remains consistent with other processors. We already can decode/parse JSON, CSV, and now ION. They have their own processors.
processor:
  - decode_protobuf:
       key: my_protobuf_key
       message_definition_file: "/path/to/proto_definition.proto"
  - decode_base64:
       key: my_base64_key

Compression might be a special case. Maybe we'd have a single processor for that. Though it wouldn't help with overall dependency reduction.

processor:
  - decompress:
       key: my_gzip_key
       type: gzip
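The shape of that decompress processor can be sketched as a type-to-codec dispatch. This is hypothetical Python, not the Data Prepper implementation; it assumes compressed values are carried base64-encoded inside the JSON event, as in the sample above:

```python
import base64
import gzip
import zlib

# Map of supported "type" values to decompression functions; adding a
# new codec is just one more registry entry.
DECOMPRESSORS = {
    "gzip": gzip.decompress,
    "zlib": zlib.decompress,
}

def decompress_key(event: dict, key: str, type_: str) -> None:
    """Replace event[key] with its decompressed value, in place."""
    compressed = base64.b64decode(event[key])
    event[key] = DECOMPRESSORS[type_](compressed).decode("utf-8")

# Round-trip example with a synthetic event
event = {"my_gzip_key": base64.b64encode(gzip.compress(b"hello")).decode()}
decompress_key(event, "my_gzip_key", "gzip")
print(event["my_gzip_key"])  # -> "hello"
```

A single registry like this keeps one processor for all compression types, which matches the point above: it simplifies configuration but does not reduce dependencies, since all codecs still ship together.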

kkondaka (Collaborator) commented:

I think we need to support "/path/to/proto_definition.proto" to be S3 path as well, right?

dlvenable (Member) commented:

@kkondaka ,

> I think we need to support "/path/to/proto_definition.proto" to be S3 path as well, right?

That is probably ideal, though it could also come as a follow-on based on feedback. Also, I think we need a more general structure for getting data from S3, file paths, etc. The current approach is rather cumbersome both for users and developers. We could do something similar to what we did with AWS Secrets, and hopefully will do with environment variables.

kkondaka (Collaborator) commented:

For protobuf decoding, what's the expected format of the file /path/to/proto_definition.proto? Is it supposed to contain the message definitions, something like:

syntax = "proto2";

message ProtoBufMessage {
  // Define your message fields here
  // Example:
  required int32 intField = 1;
  required string strField = 2;
}

I think it would be difficult to support such files directly, because .proto files need to be compiled.

However, if the above file is compiled into a descriptor file using the following command

protoc --descriptor_set_out=MyMessage.desc MyMessage.proto

then pointing the Data Prepper message_definition_file configuration at MyMessage.desc could work.
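To illustrate the descriptor-set approach, here is a sketch using the Python google.protobuf runtime (not Data Prepper's Java code). The file and message names are taken from the examples above; note that the message-class API differs slightly between protobuf library versions, which the code handles:

```python
from google.protobuf import descriptor_pb2, descriptor_pool, message_factory

def decode_with_descriptor(desc_path: str, message_name: str, payload: bytes):
    """Decode `payload` as `message_name`, using a compiled descriptor set
    produced by `protoc --descriptor_set_out=...`."""
    # A .desc file is itself a serialized FileDescriptorSet message
    fds = descriptor_pb2.FileDescriptorSet()
    with open(desc_path, "rb") as f:
        fds.ParseFromString(f.read())

    # Register every file in the set with a descriptor pool
    pool = descriptor_pool.DescriptorPool()
    for file_proto in fds.file:
        pool.Add(file_proto)

    descriptor = pool.FindMessageTypeByName(message_name)
    # GetMessageClass is the newer module-level API; older protobuf
    # releases use MessageFactory(pool).GetPrototype(descriptor) instead.
    if hasattr(message_factory, "GetMessageClass"):
        msg_class = message_factory.GetMessageClass(descriptor)
    else:
        msg_class = message_factory.MessageFactory(pool).GetPrototype(descriptor)

    message = msg_class()
    message.ParseFromString(payload)
    return message

# Usage (names assumed from the examples above):
# msg = decode_with_descriptor("MyMessage.desc", "ProtoBufMessage", raw_bytes)
```

This also hints at the import question raised later in the thread: because the descriptor set can contain multiple files, imports resolve only if protoc was run with --include_imports (or the imported files are otherwise present in the set).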

dlvenable (Member) commented:

@kkondaka , What exactly is the descriptor in this proposal? Is it the "File descriptor" JSON in the following documentation?

https://protobuf.com/docs/descriptors#message-descriptors

dlvenable (Member) commented:

We should also consider how to handle Protobuf imports.

No branches or pull requests

3 participants