ENH: support plugin loading in config #4974

chenqi0805 · 2024-09-24T05:29:59Z

Description

This PR:

- introduces the @UsesDataPrepperPlugin annotation for PluginModel attributes in plugin configs
- validates the pluginModel name, by reflection, against the plugin names allowed by UsesDataPrepperPlugin::pluginType
- generates anyOf schemas for embedded pluginModel attributes to represent enum schemas
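The annotation-plus-validation idea can be sketched in plain Java. This is a minimal illustration, not the code in this PR: `UsesPlugin`, `PluginRef`, `AggregateAction`, `AggregateConfig`, and the hard-coded `REGISTRY` are all hypothetical stand-ins (for @UsesDataPrepperPlugin, PluginModel, the plugin type, the plugin config, and whatever plugin discovery the PR actually uses).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.Set;

public class PluginNameValidationSketch {

    // Hypothetical stand-in for @UsesDataPrepperPlugin: marks a field as
    // holding a reference to a plugin of the given type.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface UsesPlugin {
        Class<?> pluginType();
    }

    // Hypothetical plugin type that the annotated field refers to.
    interface AggregateAction { }

    // Simplified stand-in for PluginModel: only the plugin name is kept.
    static class PluginRef {
        final String name;

        PluginRef(String name) {
            this.name = name;
        }
    }

    // Hypothetical plugin config with an embedded plugin reference.
    static class AggregateConfig {
        @UsesPlugin(pluginType = AggregateAction.class)
        PluginRef action;
    }

    // Assumption: allowed names per plugin type are hard-coded here.
    // A real implementation would discover them at runtime.
    static final Map<Class<?>, Set<String>> REGISTRY =
            Map.of(AggregateAction.class, Set.of("count", "histogram", "put_all"));

    // Returns true if every annotated, non-null PluginRef field names a
    // plugin that is registered for the annotation's pluginType.
    static boolean isValid(Object config) {
        for (Field field : config.getClass().getDeclaredFields()) {
            UsesPlugin usesPlugin = field.getAnnotation(UsesPlugin.class);
            if (usesPlugin == null) {
                continue; // not a plugin reference
            }
            Object value;
            try {
                field.setAccessible(true);
                value = field.get(config);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (!(value instanceof PluginRef)) {
                continue; // null or unrelated type; presence is checked elsewhere
            }
            Set<String> allowed = REGISTRY.getOrDefault(usesPlugin.pluginType(), Set.of());
            if (!allowed.contains(((PluginRef) value).name)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        AggregateConfig config = new AggregateConfig();
        config.action = new PluginRef("count");
        System.out.println("count allowed: " + isValid(config));

        config.action = new PluginRef("no_such_action");
        System.out.println("no_such_action allowed: " + isValid(config));
    }
}
```

Keeping the plugin type on the annotation means the validator needs no per-config knowledge: it only reads the field's annotation and looks up the names registered for that type.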

For example, with the change in this PR, the aggregate processor schema looks like this:

{
  "$schema" : "https://json-schema.org/draft/2020-12/schema",
  "type" : "object",
  "properties" : {
    "identification_keys" : {
      "description" : "An unordered list by which to group events. Events with the same values as these keys are put into the same group. If an event does not contain one of the identification_keys, then the value of that key is considered to be equal to null. At least one identification_key is required (for example, [\"sourceIp\", \"destinationIp\", \"port\"].",
      "minItems" : 1,
      "type" : "array",
      "items" : {
        "type" : "string"
      }
    },
    "group_duration" : {
      "type" : "string",
      "format" : "duration",
      "description" : "The amount of time that a group should exist before it is concluded automatically. Supports ISO_8601 notation strings (\"PT20.345S\", \"PT15M\", etc.) as well as simple notation for seconds (\"60s\") and milliseconds (\"1500ms\"). Default value is 180s."
    },
    "action" : {
      "anyOf" : [ {
        "type" : "object",
        "properties" : {
          "tail_sampler" : {
            "$schema" : "https://json-schema.org/draft/2020-12/schema",
            "type" : "object",
            "properties" : {
              "condition" : {
                "type" : "string",
                "description" : "A Data Prepper [conditional expression](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/), such as '/some-key == \"test\"', that will be evaluated to determine whether the event is an error event or not"
              },
              "percent" : {
                "type" : "integer",
                "description" : "Percent value to use for sampling non error events. 0.0 < percent < 100.0"
              },
              "wait_period" : {
                "type" : "string",
                "format" : "duration",
                "description" : "Period to wait before considering that a trace event is complete"
              }
            },
            "required" : [ "percent", "wait_period" ]
          }
        },
        "description" : "The action to be performed on each group. One of the available aggregate actions must be provided."
      }, {
        "type" : "object",
        "properties" : {
          "rate_limiter" : {
            "$schema" : "https://json-schema.org/draft/2020-12/schema",
            "type" : "object",
            "properties" : {
              "events_per_second" : {
                "type" : "integer",
                "description" : "The number of events allowed per second."
              },
              "when_exceeds" : {
                "type" : "string",
                "description" : "Indicates what action the rate_limiter takes when the number of events received is greater than the number of events allowed per second. Default value is block, which blocks the processor from running after the maximum number of events allowed per second is reached until the next second. Alternatively, the drop option drops the excess events received in that second. Default is block"
              }
            },
            "required" : [ "events_per_second" ]
          }
        },
        "description" : "The action to be performed on each group. One of the available aggregate actions must be provided."
      }, {
        "type" : "object",
        "properties" : {
          "put_all" : {
            "$schema" : "https://json-schema.org/draft/2020-12/schema",
            "type" : "object",
            "properties" : {
              "name" : {
                "type" : "string"
              },
              "pipeline_name" : {
                "type" : "string"
              },
              "process_workers" : {
                "type" : "integer"
              },
              "settings" : {
                "type" : "object"
              }
            }
          }
        },
        "description" : "The action to be performed on each group. One of the available aggregate actions must be provided."
      }, {
        "type" : "object",
        "properties" : {
          "histogram" : {
            "$schema" : "https://json-schema.org/draft/2020-12/schema",
            "type" : "object",
            "properties" : {
              "buckets" : {
                "description" : "A list of buckets (values of type double) indicating the buckets in the histogram.",
                "type" : "array",
                "items" : {
                  "type" : "number"
                }
              },
              "generated_key_prefix" : {
                "type" : "string",
                "description" : "Key prefix used by all the fields created in the aggregated event. Having a prefix ensures that the names of the histogram event do not conflict with the field names in the event."
              },
              "key" : {
                "type" : "string",
                "description" : "Name of the field in the events the histogram generates."
              },
              "metric_name" : {
                "type" : "string",
                "description" : "Metric name to be used when otel format is used."
              },
              "output_format" : {
                "type" : "string",
                "description" : "Format of the aggregated event. otel_metrics is the default output format which outputs in OTel metrics SUM type with count as value. Other options is - raw - which generates a JSON object with the count_key field as a count value and the start_time_key field with aggregation start time as value."
              },
              "record_minmax" : {
                "type" : "boolean",
                "description" : "A Boolean value indicating whether the histogram should include the min and max of the values in the aggregation."
              },
              "units" : {
                "type" : "string",
                "description" : "The name of units for the values in the key. For example, bytes, traces etc"
              }
            },
            "required" : [ "buckets", "key", "units" ]
          }
        },
        "description" : "The action to be performed on each group. One of the available aggregate actions must be provided."
      }, {
        "type" : "object",
        "properties" : {
          "count" : {
            "$schema" : "https://json-schema.org/draft/2020-12/schema",
            "type" : "object",
            "properties" : {
              "count_key" : {
                "type" : "string",
                "description" : "Key used for storing the count. Default name is aggr._count."
              },
              "end_time_key" : {
                "type" : "string",
                "description" : "Key used for storing the end time. Default name is aggr._end_time."
              },
              "metric_name" : {
                "type" : "string",
                "description" : "Metric name to be used when otel format is used."
              },
              "output_format" : {
                "type" : "string",
                "description" : "Format of the aggregated event. otel_metrics is the default output format which outputs in OTel metrics SUM type with count as value. Other options is - raw - which generates a JSON object with the count_key field as a count value and the start_time_key field with aggregation start time as value."
              },
              "start_time_key" : {
                "type" : "string",
                "description" : "Key used for storing the start time. Default name is aggr._start_time."
              },
              "unique_keys" : {
                "description" : "List of unique keys to count.",
                "type" : "array",
                "items" : {
                  "type" : "string"
                }
              }
            }
          }
        },
        "description" : "The action to be performed on each group. One of the available aggregate actions must be provided."
      }, {
        "type" : "object",
        "properties" : {
          "percent_sampler" : {
            "$schema" : "https://json-schema.org/draft/2020-12/schema",
            "type" : "object",
            "properties" : {
              "percent" : {
                "type" : "number",
                "description" : "The percentage of events to be processed during a one second interval. Must be greater than 0.0 and less than 100.0"
              }
            },
            "required" : [ "percent" ]
          }
        },
        "description" : "The action to be performed on each group. One of the available aggregate actions must be provided."
      }, {
        "type" : "object",
        "properties" : {
          "remove_duplicates" : {
            "$schema" : "https://json-schema.org/draft/2020-12/schema",
            "type" : "object",
            "properties" : {
              "name" : {
                "type" : "string"
              },
              "pipeline_name" : {
                "type" : "string"
              },
              "process_workers" : {
                "type" : "integer"
              },
              "settings" : {
                "type" : "object"
              }
            }
          }
        },
        "description" : "The action to be performed on each group. One of the available aggregate actions must be provided."
      }, {
        "type" : "object",
        "properties" : {
          "append" : {
            "$schema" : "https://json-schema.org/draft/2020-12/schema",
            "type" : "object",
            "properties" : {
              "keys_to_append" : {
                "description" : "A list of keys to append to for the aggregated result.",
                "type" : "array",
                "items" : {
                  "type" : "string"
                }
              }
            }
          }
        },
        "description" : "The action to be performed on each group. One of the available aggregate actions must be provided."
      } ]
    },
    "local_mode" : {
      "type" : "boolean",
      "description" : "When local_mode is set to true, the aggregation is performed locally on each Data Prepper node instead of forwarding events to a specific node based on the identification_keys using a hash function. Default is false."
    },
    "output_unaggregated_events" : {
      "type" : "boolean",
      "description" : "A boolean indicating if the unaggregated events should be forwarded to the next processor/sink in the chain."
    },
    "aggregated_events_tag" : {
      "type" : "string",
      "description" : "Tag to be used for aggregated events to distinguish aggregated events from unaggregated events."
    },
    "aggregate_when" : {
      "type" : "string",
      "description" : "A Data Prepper [conditional expression](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/), such as '/some-key == \"test\"', that will be evaluated to determine whether the processor will be run on the event."
    }
  },
  "required" : [ "identification_keys", "action", "local_mode" ],
  "description" : "The `aggregate` processor groups events based on the values of identification_keys. Then, the processor performs an action on each group, helping reduce unnecessary log volume and creating aggregated logs over time.",
  "name" : "aggregate",
  "documentation" : "https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/aggregate/"
}
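The `action` property in the schema above has a regular shape: one `anyOf` alternative per plugin name, each an object with a single property keyed by that name and sharing the parent field's description. Below is a rough sketch of assembling that shape with plain Java collections. The class and method names are hypothetical, and the PR's actual generator may build the schema quite differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AnyOfSchemaSketch {

    // Wraps each plugin's own schema in {"type": "object", "properties": {name: schema}}
    // and collects the wrappers under a single "anyOf" key.
    static Map<String, Object> anyOfForPlugins(
            Map<String, Map<String, Object>> pluginSchemas, String description) {
        List<Map<String, Object>> alternatives = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> entry : pluginSchemas.entrySet()) {
            Map<String, Object> properties = new LinkedHashMap<>();
            properties.put(entry.getKey(), entry.getValue());

            Map<String, Object> alternative = new LinkedHashMap<>();
            alternative.put("type", "object");
            alternative.put("properties", properties);
            alternative.put("description", description);
            alternatives.add(alternative);
        }
        Map<String, Object> schema = new LinkedHashMap<>();
        schema.put("anyOf", alternatives);
        return schema;
    }

    public static void main(String[] args) {
        // Placeholder per-plugin schemas; real ones would carry full property definitions.
        Map<String, Map<String, Object>> pluginSchemas = new LinkedHashMap<>();
        pluginSchemas.put("count", Map.of("type", "object"));
        pluginSchemas.put("histogram", Map.of("type", "object"));

        Map<String, Object> schema = anyOfForPlugins(
                pluginSchemas, "The action to be performed on each group.");
        System.out.println(schema);
    }
}
```

A `LinkedHashMap` is used so the alternatives keep the order in which the plugins were supplied.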

Issues Resolved

Resolves #4838

Check List

- New functionality includes testing.
- New functionality has a documentation issue. Please link to it in this PR.
- New functionality has javadoc added
- Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: George Chen <qchea@amazon.com>

chenqi0805 added 7 commits September 21, 2024 20:23

ADD: initial impl on resolving target type

9620bc9

Signed-off-by: George Chen <qchea@amazon.com>

ENH: validate embedded pluginModel and generated embedded schemas

35e0c9c

Signed-off-by: George Chen <qchea@amazon.com>

MAINT: backfill annotations

4030ee1

Signed-off-by: George Chen <qchea@amazon.com>

STY: fix errors

80ef670

Signed-off-by: George Chen <qchea@amazon.com>

Merge branch 'main' into enh/4838-support-plugin-loading-in-conifg

a4d4a08

Signed-off-by: George Chen <qchea@amazon.com>

MAINT: fix dependency

1eaa56d

Signed-off-by: George Chen <qchea@amazon.com>

FIX: bump reflection version

254b737

Signed-off-by: George Chen <qchea@amazon.com>

chenqi0805 marked this pull request as ready for review September 24, 2024 19:47

chenqi0805 requested review from engechas, graytaylor0, dinujoh, kkondaka, KarstenSchnitter, dlvenable and oeyh as code owners September 24, 2024 19:47

shenkw1 approved these changes Sep 25, 2024
