Back to index

Statistical Processing

Index

01.-What is a data series? (01:29)

02.-Chart Command (06:57)

03.-Timechart Command (07:21)

04.-Top Command (04:00)

05.-Rare Command (00:27)

06.-Stats Command (02:52)

07.-Functions of the stats Command (06:56)

08.-Transforming commands Summary (01:18)

09.-Eval Command (06:43)

10.-Functions of the Eval Command (04:24)

11.-Eval as a Function (01:06)

12.-Rename Command (02:38)

13.-Sort Command (03:11)

What is a data series

A data series is a sequence of related data points that are plotted in a visualization:

  • Single Series: Compares values of a single data category.
  • Multi-Series: Compares values of two or more data categories.
  • Time-Series: Compares values over time; it can be either single or multi-series.

Transforming commands (chart, timechart, top, rare, stats) can be used in searches to organize your results into a statistical table containing a data series that can be visualized. We will learn how to use transforming commands to structure searches so that they generate the results you need for the visualization you want.

Back to top

Chart Command

Takes results and returns them formatted as a table that can be displayed as a visualization.

image

  • The Y-axis is defined with stats-func(field), where stats-func is a supported statistical function:

image

  • field is a field with numeric values.

  • over row-split specifies the X-axis and defines the first column of the resulting table. It creates a single series.

  • by column-split further splits the data, resulting in a multi-series data series.

  • Arguments for further control of the results:

    • With categorical fields: by default, Splunk displays individual columns for the top 10 values found in the field used for the multi-series split (the field after by).
    • span (with numerical fields): groups the events into buckets. Splunk shifts values that fall on a boundary into the higher grouping, so 400 falls into 400-500 instead of 300-400.
    • limit: overrides the default top 10 with whatever whole number you find appropriate. Values less frequent than those covered by the limit argument are aggregated into the OTHER column. The limit argument sets a limit across the entire dataset. limit=0 means no limit at all. When there are too many values to display in the legend, the list includes a down arrow to scroll through the values.
    • useother=true/false: useother=false visually removes the OTHER column. Nothing is recalculated and the search is not run again.
    • usenull=true/false: usenull=false removes the NULL column, if one exists. The NULL column holds events that do not contain the field used to create the multi-series split.
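The boundary rule for numeric spans described above can be pictured with a short Python sketch. This is only an illustration of the grouping arithmetic, not Splunk's implementation, and the function name bucket is made up for the example.

```python
def bucket(value, span):
    """Return the (low, high) bucket of width `span` that contains `value`."""
    low = (value // span) * span  # a value on a boundary lands in the higher bucket
    return low, low + span


print(bucket(400, 100))  # 400 belongs to 400-500
print(bucket(399, 100))  # 399 still belongs to 300-400
```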

Each time you invoke the stats command, you can use one or more functions, but only one by clause. Unlike the stats command, the chart command can only be split over two fields or dimensions. A chart command with two fields after the by clause is equivalent to using an over clause and a by clause. The values of the second split are represented by individually colored columns.
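To make the over/by split and the limit and useother arguments more concrete, here is a minimal Python sketch that emulates a chart count over one field by another. The events, the field names host and status, and the rule of ranking series by total count are assumptions for illustration; this is not Splunk's code.

```python
from collections import Counter

# Hypothetical events; "host" and "status" are made-up field names.
events = [
    {"host": "web1", "status": "200"},
    {"host": "web1", "status": "200"},
    {"host": "web1", "status": "404"},
    {"host": "web2", "status": "200"},
    {"host": "web2", "status": "404"},
    {"host": "web2", "status": "500"},
    {"host": "web2", "status": "503"},
]


def chart_count(events, over, by, limit=10, useother=True):
    """Emulate `chart count over <over> by <by>` as {row: {column: count}}."""
    totals = Counter(e[by] for e in events)
    # Assumption: series are ranked by total count; limit=0 keeps every series.
    keep = {v for v, _ in totals.most_common(limit)} if limit else set(totals)
    table = {}
    for e in events:
        column = e[by] if e[by] in keep else "OTHER"
        if column == "OTHER" and not useother:
            continue  # useother=false only hides the OTHER column
        table.setdefault(e[over], Counter())[column] += 1
    return {row: dict(cols) for row, cols in table.items()}


print(chart_count(events, "host", "status", limit=2))
print(chart_count(events, "host", "status", limit=2, useother=False))
```

Each key of the outer dictionary corresponds to one row (the over field), and each inner key to one column (a value of the by field).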

Some examples:

image image image image

Back to top

Timechart Command

Performs stats aggregations against time and returns a time-series chart or table in which the _time field is always the X-axis.

image

stats-func(field) populates the Y-axis. count is the only function that does not require a field. The functions and arguments used with stats and chart can also be used with timechart.

by <split-by-field> splits our result table. A key difference between chart and timechart is that timechart only supports a single additional split. This is because the X-axis is automatically segmented, or bucketed, based on time. Each distinct value of the split-by field becomes a series.

The functional equivalent of this search using chart would be a chart count over _time by usage, but timechart automatically applies a bucket command to set the time span to a preset sampling interval that depends on the time range of the search. We can see this reflected in the stats table output: each row represents a chunk, or bucket, of aggregated data.

image

Default bin span according to the time range picker:

| Time range | Default time bucket |
| --- | --- |
| Last 30 days | 1 day |
| Last 7 days | 1 day |
| Last 24 hours | 30 minutes |
| Last hour | 1 minute |
| Last 15 minutes | 10 seconds |

When the period Splunk uses is not appropriate, you can override it with the span argument, which forces Splunk to bucket the events using the time span you specify.
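The time bucketing that timechart performs can be sketched in Python as follows. This is a simplified model under stated assumptions: _time is taken as epoch seconds, the usage field and its values are hypothetical, and the span is passed explicitly rather than derived from the time range.

```python
from collections import Counter


def timechart_count(events, span_seconds, by):
    """Emulate `timechart span=<span> count by <by>`: one row per time bucket."""
    table = {}
    for e in events:
        bucket_start = e["_time"] - e["_time"] % span_seconds  # floor to the span
        table.setdefault(bucket_start, Counter())[e[by]] += 1
    return {start: dict(cols) for start, cols in sorted(table.items())}


# Hypothetical events: epoch seconds plus a made-up "usage" field.
events = [
    {"_time": 0, "usage": "Business"},
    {"_time": 59, "usage": "Personal"},
    {"_time": 60, "usage": "Business"},
    {"_time": 125, "usage": "Business"},
]

print(timechart_count(events, 60, "usage"))
```

With a 60-second span, the first two events share a bucket, while the events at 60 and 125 seconds each fall into a later bucket.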

image

The limit argument controls the number of values returned for our multi-series split. Without it, we get the top 10 values plus an additional OTHER series, which gives 11 lines in our chart.

image

Compared with:

image

chart versus timechart

image There is no time aggregation

image Automatically aggregates count in one-day buckets because the time range is 7 days.

When running a multi-series timechart, we can choose how the data is displayed. Because a timechart can sometimes appear cluttered, we can toggle a feature called multi-series mode, found under Format > General > Multi-series mode. It separates the series into individual trendlines that share the same X-axis, each with its own Y-axis; the Y-axes share the same floor and ceiling values.

image

Back to top

Top Command

Finds the most common values from a given list of fields in a result set. We can group results together based on a shared field with the by clause.

image

By default, top outputs the top 10 results in table format. Its arguments:

  • limit: overrides the default of 10 results.
  • countfield: renames the count column to the string you specify.
  • showperc: defaults to true. showperc=f or showperc=0 suppresses the percentage column.
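As a rough mental model of what top computes, the following Python sketch counts values, keeps the most common ones, and adds a percentage column. The sample IP addresses and the attacks column name are hypothetical, and the function only approximates the command's defaults.

```python
from collections import Counter


def top(values, limit=10, countfield="count", showperc=True):
    """Emulate `top limit=<n> countfield=<name> showperc=<bool> <field>`."""
    total = len(values)
    rows = []
    for value, n in Counter(values).most_common(limit):
        row = {"value": value, countfield: n}
        if showperc:
            row["percent"] = round(100 * n / total, 2)  # share of all events
        rows.append(row)
    return rows


# Hypothetical source IPs taken from attack events.
src_ips = ["10.0.0.1"] * 3 + ["10.0.0.2"]

print(top(src_ips, limit=1, countfield="attacks"))
print(top(src_ips, showperc=False))
```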

Examples

Which IP addresses generated the most attacks in the last 60 minutes? Without any argument, we get the count (the number of events) and the percentage. image

With only one field: the 10 most common values in the group field. Column values are unique. image

With two fields: all combinations of group and name are unique. image

Top two names by group image

Top two groups by name image

Renaming the count column image

Run the top command from the UI:

image Note that when you use this method, the limit argument is set to 20 by default. image

Back to top

Rare Command

Essentially the opposite of the top command: it returns the least common values of a result set. It has the same options. By default, results are sorted in ascending order based on count. image

Back to top

Stats Command

We produce statistics from our search results with the stats command. The output is a table. The by <field list> clause groups the results by each distinct value of each field in the field list. Unlike chart and timechart, stats allows continuous splitting of your data (timechart splits by only one field, and chart by two).

image

Use the as clause to rename a resulting column, overriding the default column name. It is a very convenient way to avoid confusion when statistics for several fields are calculated.

image

In stats, statistical functions can be applied to multiple fields. image

count differs from count(field) in that the former counts all events and the latter counts only events with a value in the field. image

The order of fields in by <field list> has a big impact on the search results as the data will first be grouped by the first field given, then grouped by the second field given, and so on.
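The difference between count and count(field), and the grouping done by a by clause with several fields, can be sketched in Python. The events and the field names host, user, and bytes are hypothetical; the function is a simplified model, not the command itself.

```python
from collections import defaultdict

# Hypothetical events; the second one has no "bytes" field.
events = [
    {"host": "web1", "user": "ana", "bytes": 100},
    {"host": "web1", "user": "ana"},
    {"host": "web1", "user": "bob", "bytes": 50},
    {"host": "web2", "user": "ana", "bytes": 10},
]


def stats_counts(events, by, field):
    """Emulate `stats count, count(<field>) by <by...>`."""
    groups = defaultdict(lambda: {"count": 0, f"count({field})": 0})
    for e in events:
        key = tuple(e[b] for b in by)  # one group per combination of by-values
        groups[key]["count"] += 1      # count: every event in the group
        if field in e:                 # count(field): only events with a value
            groups[key][f"count({field})"] += 1
    return dict(groups)


for key, row in stats_counts(events, ["host", "user"], "bytes").items():
    print(key, row)
```

For the group (web1, ana), count is 2 while count(bytes) is 1, because one of the two events has no bytes value.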

Back to top

Functions of the Stats Command

There are four categories of statistical functions:

  • Aggregate: Summarizes event values to create a single value

    • count, count(x), dc(x) or distinct_count(x), estdc(x) (estimated count of the distinct values in the specified field), and estdc_error(x).
    • min(x), max(x), range(x).
    • sum(x), sumsq(x): sum and sum of squares.
    • avg(x), median(x), mode(x). The average ignores events that have no value, or a non-numeric value, in the field.
    • stdev(x), stdevp(x), var(x), varp(x).
    • percentile<percentile>(x) or p<percentile>(x), upperperc<percentile>(x), exactperc<percentile>(x)
  • Event Order: Returns values from fields based on processing order.

    • first(x): the first seen value of the field x.
    • last(x): the last seen value of the field x.
  • Multivalue: Returns a list of values for a field.

    • list(x): Returns a list of up to 100 values in a field as a multivalue entry. The order of the values reflects the order of input events.
    • values(x): Returns the list of all distinct values in a field as a multivalue entry. The order of the values is lexicographical.
    • image
  • Time: Returns values based on time.

    • earliest(x), latest(x): return the chronologically earliest (oldest) or latest (most recent) seen occurrence of a value in a field.
    • earliest_time(x), latest_time(x): return the UNIX time of the earliest (oldest) or latest (most recent) occurrence of a value in a field. They can be used to calculate the rate of increase of an accumulating counter.
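A few of the functions listed above can be compared side by side with plain Python. The sample values are hypothetical and stand for one field's values in input order; this is only an analogy for how the results differ.

```python
field = ["b", "a", "b", "c"]  # hypothetical values of field x, in input order

as_list = field[:100]                        # list(x): input order, duplicates kept, at most 100
as_values = sorted(set(field))               # values(x): distinct values, lexicographical order
distinct_count = len(set(field))             # dc(x)
first_seen, last_seen = field[0], field[-1]  # first(x), last(x)

print(as_list, as_values, distinct_count, first_seen, last_seen)
```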

image

Back to top

Transforming Commands Summary

stats, chart, and timechart share similar features. Use the command that fits the result you want: stats for a table, and chart or timechart for visualizations.

image

Back to top

Eval Command

Performs calculations with the values in our data. An eval expression is a combination of literals, fields, operators, and functions that represents the value of the destination field. eval calculates the expression and puts the result into a new field or overwrites an existing one. A new field is created on the fly, populated with the expression's result, and can be used like any regular field in the remainder of the search.

Nothing written by the eval command outlives the search it was used in. The new field is not saved in the index, nor will it be available after the search completes. Even though eval can temporarily overwrite the values of an existing field, no change to our data is permanent, because new values are never written to disk.
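The non-persistent nature of eval can be illustrated with a small Python analogy: the calculated field exists only on copies of the events that flow through the search, while the stored events stay as they were. The field names bytes and kilobytes and the helper eval_field are made up for this sketch.

```python
events = [{"bytes": 2048}, {"bytes": 512}]  # hypothetical stored events


def eval_field(events, name, expression):
    """Return copies of the events with `name` set to expression(event)."""
    return [{**e, name: expression(e)} for e in events]


results = eval_field(events, "kilobytes", lambda e: e["bytes"] / 1024)
overwritten = eval_field(events, "bytes", lambda e: 0)

print(results)      # each result carries the new "kilobytes" field
print(overwritten)  # "bytes" is overwritten in the results only
print(events)       # the stored events are unchanged
```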

image

An eval expression can involve:

  • Mathematical operations.
  • String concatenation. Use + for strings or characters and . for any data type.
  • Comparison expressions.
  • Boolean expressions.
  • A call to an eval function.

These are the operators:

image

Eval Syntax

  • Field values are case-sensitive.
  • Strings must be double-quoted.
  • Field names must be single-quoted when they contain special characters.

image

Usage examples

imageimageimage image image image

Back to top

Functions of the Eval Command

There are 11 categories of evaluation functions:

  • Comparison & Conditional
  • Conversion
  • Cryptographic (md5, sha1, sha256, sha512)
  • Date & time
  • Informational
  • JSON
  • Mathematical (round(X,Y), pow). A round without Y returns X as an integer.
  • Multivalue
  • Statistical (avg, max, min, random). random returns a pseudo-random integer ranging from zero to 2³¹-1.
  • Text
  • Trigonometry and hyperbolic

Generally, evaluation functions evaluate an expression based on the events and return a result, but some do not evaluate any expression and instead return a result based on their own functionality.

Examples

Create five random groups of users.

image
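One plausible way to build five random groups is to take a random integer modulo 5. The Python sketch below assumes that approach (the original search is only available as an image), and the user names are hypothetical.

```python
import random

MAX_RANDOM = 2**31 - 1  # upper bound of eval's random(), as listed above


def random_group(groups=5):
    """Emulate an expression of the form `random() % 5` (an assumed form)."""
    return random.randint(0, MAX_RANDOM) % groups


users = ["ana", "bob", "eve"]  # hypothetical user names
assignment = {user: random_group() for user in users}
print(assignment)  # every user gets a group number from 0 to 4
```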

Eval usage without functions.

image

Back to top

Eval as a Function

The eval command can be used as a function within the stats command. Nest eval inside a stats count to count events with a calculated value.

image

An as clause is required to name the resulting field.
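Counting only the events that satisfy a calculated condition is comparable to a conditional count in Python. The status field, the threshold, and the name server_errors are hypothetical; the comment shows the kind of search the sketch is meant to mirror.

```python
events = [{"status": 200}, {"status": 503}, {"status": 500}, {"status": 404}]

# Rough analogue of: stats count(eval(status >= 500)) as server_errors
server_errors = sum(1 for e in events if e["status"] >= 500)

print(server_errors)
```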

Back to top

Rename Command

Gives a field a more useful or meaningful name for display.

image

Double-quote the new field name if it contains any special character. Several fields can be renamed in a single command; note the commas. image

Once a field is renamed, Splunk only recognizes the new name of that field in the rest of the search.
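The effect of a rename on the rest of the search resembles replacing dictionary keys: after the rename, only the new names exist. The field names and the helper rename below are made up for this sketch.

```python
def rename(event, mapping):
    """Return a copy of `event` with keys replaced according to `mapping`."""
    return {mapping.get(key, key): value for key, value in event.items()}


event = {"clientip": "10.0.0.1", "bytes": 512}  # hypothetical event
renamed = rename(event, {"clientip": "Client IP", "bytes": "Bytes Sent"})

print(renamed)
print("clientip" in renamed)  # the old name is no longer available
```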

image

Wildcard usage in renaming fields

image

Back to top

Sort Command

Sorts in ascending order by default. A - prefix changes it to descending order; + is implicit, so it is not required. Double-quote the field name when it contains special characters or white space. Limit the number of results with the limit argument or by simply giving an integer.

image

Splunk determines the data type of values present in the field and sorts appropriately:

  • Alphabetic strings: lexicographically, with uppercase letters before lowercase.
  • Numbers: numerically.
  • Combination: lexicographically or numerically, depending on the first character.

Examples

Notice the white space after -: with it, the - applies to both fields. In the first example, both fields are sorted in descending order. In the second example, without the white space, - applies only to the first field, and the second field is sorted in ascending order. image image
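The two orderings described above can be reproduced in Python to see how they differ. The rows and the field names group and count are hypothetical; the first sort applies descending order to both fields, the second only to the first field.

```python
rows = [
    {"group": "a", "count": 1},
    {"group": "b", "count": 2},
    {"group": "b", "count": 1},
    {"group": "a", "count": 2},
]

# "-" followed by white space: descending order on both fields.
both_desc = sorted(rows, key=lambda r: (r["group"], r["count"]), reverse=True)

# "-" attached to the first field only: group descending, count ascending.
# Two stable passes: sort by the secondary key first, then by the primary key.
by_count = sorted(rows, key=lambda r: r["count"])
mixed = sorted(by_count, key=lambda r: r["group"], reverse=True)

print([(r["group"], r["count"]) for r in both_desc])
print([(r["group"], r["count"]) for r in mixed])
```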

Back to top

Back to index