The OpenSearch Piped Processing Language (PPL) currently lacks some advanced statistical aggregation capabilities similar to those provided by the eventstats command in Splunk Search Processing Language (SPL).
This feature request proposes adding new functions and syntax to PPL to enable statistical calculations and aggregations on event data.
Proposed Functionality:
Aggregate statistical calculations:
Calculate common statistical measures like sum, count, min, max, avg, etc., on specific fields or expressions.
Support grouping events by one or more fields and performing statistical calculations within each group.
Allow renaming the calculated fields with custom names.
Conditional aggregations:
Perform statistical calculations based on conditional expressions or filters.
Evaluate conditional expressions for each event and aggregate the results (e.g., sum of a conditional expression).
Chaining and nesting:
Enable chaining and nesting of statistical calculations, similar to how eventstats commands can be chained in SPL.
Allow performing multiple levels of aggregations and calculations in a single query.
Integration with existing PPL syntax:
Seamlessly integrate the new statistical aggregation capabilities with the existing PPL syntax and functions.
Ensure compatibility with other PPL features and maintain the overall usability and readability of the language.
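To make the requested behaviour concrete, here is a minimal Python sketch contrasting a stats-style aggregation (one row per group) with eventstats-style behaviour as in SPL (every event is kept and annotated with its group's aggregate). The function names `stats_sum` and `eventstats_sum` and the sample events are hypothetical and only illustrate the semantics; they are not part of PPL or SPL.

```python
from collections import defaultdict


def _group_totals(events, field, by):
    """Sum `field` per distinct value of `by` (helper for both styles)."""
    totals = defaultdict(int)
    for event in events:
        totals[event[by]] += event[field]
    return totals


def stats_sum(events, field, by):
    """stats-style: collapse the events into one output row per group."""
    totals = _group_totals(events, field, by)
    return [{by: key, f"sum_{field}": total} for key, total in totals.items()]


def eventstats_sum(events, field, by):
    """eventstats-style: keep every event and append its group's total."""
    totals = _group_totals(events, field, by)
    return [{**event, f"sum_{field}": totals[event[by]]} for event in events]


# Illustrative sample data (assumed, not taken from the issue).
events = [
    {"client_id": "a", "count": 2},
    {"client_id": "a", "count": 3},
    {"client_id": "b", "count": 5},
]

collapsed = stats_sum(events, "count", "client_id")
annotated = eventstats_sum(events, "count", "client_id")
```

With this sample, `collapsed` has two rows (one per `client_id`), while `annotated` still has three rows, each carrying the original fields plus the group total.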
Examples:
Calculate the sum of a conditional expression grouped by a field:
stats sum(if(field1 = "value" and field2 like "%pattern%", 1, 0)) as conditional_sum by group_field
Calculate minimum and maximum values of a field grouped by another field:
stats min(latency_field) as min_latency, max(latency_field) as max_latency by operation_id
Chain multiple statistical calculations:
stats sum(count) as total_count by client_id | stats sum(total_count) as overall_total
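The intended semantics of the first and third examples can be sketched in plain Python. This is only an illustration under assumptions: the helper `conditional_sum_by` and the sample events are hypothetical, and `like "%pattern%"` is approximated here as a case-sensitive substring test.

```python
from collections import defaultdict


def conditional_sum_by(events, predicate, by):
    """Per group, add 1 for each event matching `predicate`, else 0.

    Mirrors the shape of: stats sum(if(<condition>, 1, 0)) by <by>
    """
    totals = defaultdict(int)
    for event in events:
        totals[event[by]] += 1 if predicate(event) else 0
    return dict(totals)


def matches(event):
    # Assumed reading of: field1 = "value" and field2 like "%pattern%"
    return event["field1"] == "value" and "pattern" in event["field2"]


# Illustrative sample data (assumed, not taken from the issue).
events = [
    {"group_field": "g1", "field1": "value", "field2": "xx pattern yy"},
    {"group_field": "g1", "field1": "value", "field2": "no match"},
    {"group_field": "g2", "field1": "other", "field2": "pattern"},
]

# First aggregation level: conditional sum per group.
conditional_sum = conditional_sum_by(events, matches, "group_field")

# Second level, as in the chained example: aggregate the per-group results.
overall_total = sum(conditional_sum.values())
```

Groups with no matching events still appear with a sum of 0, which is what summing an `if(..., 1, 0)` expression would yield.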
Are the “Conditional aggregations” related to the general availability of the if statement, or is it a different “if” than the regular one -> https://github.com/opensearch-project/opensearch-spark/issues/398 (which is now CASE)?
Chaining (as in Example 3) seems to be referring to the regular chain of PPL commands in general, right?
It seems that stats avg/sum/etc. is already supported/implemented as of now according to the docs, please confirm.
It seems that “by” grouping is already supported/implemented as of now according to the docs, please confirm.
Same for "Allow renaming the calculated fields with custom names."
So I am not exactly sure what the scope of this issue is, because examples 2 and 3 can already be executed successfully.
Example 3 can be executed when rewritten as CASE, like `stats sum(case(device-id = 'value1', 1, device-name = 'value2', 2 else 1))`
High level Review
The OpenSearch Piped Processing Language (PPL) currently lacks some advanced statistical aggregation capabilities similar to those provided by the eventstats command in Splunk Search Processing Language (SPL). This feature request proposes adding new functions and syntax to PPL to enable statistical calculations and aggregations on event data.
Proposed Functionality:
Aggregate statistical calculations: sum, count, min, max, avg, etc., on specific fields or expressions.
Conditional aggregations:
Chaining and nesting: similar to how eventstats commands can be chained in SPL.
Integration with existing PPL syntax:
Examples: