Microservices Logs Pre-processing Pipelines


This pack contains a set of pre-processing pipelines for sources dedicated to microservices logs, covering popular engines such as Docker, Kubernetes, and Pivotal Cloud Foundry (PCF). It includes pipelines for both JSON-formatted and CRI-O/containerd-formatted logs, as well as pipelines for multi-line logs.

List of pipelines and what they do

  • JSON_single_line_events: Use as the preferred pipeline for single-line JSON events. It uses a Parser function and minimal regex for optimized performance (see the sketch after this list).
  • JSON_Multiline_via_Regex-Extract: Use when you know there are multi-line events in your logs. Ideally, dedicate an HTTP token or source to this dataset to minimize regular expressions and maximize throughput. Multi-line logs might look like this:
{"log": "2021-02-09 11:19:04.816 ERROR [nio-8080-exec-4] x.x.x.exceptions.RestExceptionHandler: 422 Status Code - EntityNotFoundException - XXXXXXXXXXXXXXXXXXXXXXXXX", "stream": "stderr", "time": "2020-10-05T00:00:30.082640485Z"}
{"log": "xxxx.xxx.microservices.exceptions.EntityNotFoundException: XXXXXXXXXXXXXXXXXXXXXXXXX", "stream": "stderr", "time": "2020-10-05T00:00:30.082640485Z"}
{"log": "   at xxx.xxx.microservices.service.XXXXXXX.lambda$getXXXData$3(XYXYXYXYX.java:139) ~[classes!/:na]", "stream": "stderr", "time": "2020-10-05T00:00:30.082640485Z"}
{"log": "   at java.base/java.util.Optional.orElseThrow(Optional.java:408) ~[na:na]", "stream": "stderr", "time": "2020-10-05T00:00:30.082640485Z"}
  • JSON_Universal: Alternative pipeline to the single-line and multi-line pipelines. It relies heavily on regular expressions, so use it cautiously.
  • PCF_Universal: Universal pipeline for all single-line PCF events.
  • CRI_Universal: Universal pipeline for single-line or multi-line logs in CRI-O/containerd format, similar to this:
2021-02-09T11:19:04.815514933+00:00 stdout F 2021-02-09 11:19:04.814 DEBUG 1 --- [nio-8080-exec-8] x.x.x.service.DocumentService            : retrieved: []
2021-02-09T11:19:04.817387066+00:00 stdout P 2021-02-09 11:19:04.816 ERROR 1 --- [nio-8080-exec-4] x.x.x.exceptions.RestExceptionHandler    : 422 Status Code - EntityNotFoundException - XXXXXXXXXXXXXXXXXXXXXXXXX
2021-02-09T11:19:04.817387066+00:00 stdout P xxx.xxx.microservices.exceptions.EntityNotFoundException: XXXXXXXXXXXXXXXXXXXXXXXXX
2021-02-09T11:19:04.817387066+00:00 stdout F    at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:190) ~[spring-web-5.2.7.RELEASE.jar!/:5.2.7.RELEASE]
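For illustration, here is a minimal sketch in plain JavaScript (not part of the Pack) of the kind of field extraction the JSON_single_line_events pipeline performs with its Parser function on a single-line Docker JSON event. The event content is adapted from the samples above:

// A single-line Docker JSON log event (adapted from the samples above)
const raw = '{"log": "2021-02-09 11:19:04.814 DEBUG [nio-8080-exec-8] x.x.x.service.DocumentService: retrieved: []", "stream": "stdout", "time": "2021-02-09T11:19:04.815514933Z"}';

// Parsing the JSON yields the fields the pipeline works with downstream
const event = JSON.parse(raw);
console.log(event.log);    // the application log line
console.log(event.stream); // "stdout"
console.log(event.time);   // container runtime timestamp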

Requirements

Please be sure to perform the following:

  • Assign this pack as the pre-processing pipeline for any LogStream sources that process microservices logs. These can include Splunk HEC, Elastic, HTTP, and TCP JSON sources.
  • As a best practice, create different tokens for different known datasets
    • (e.g., one token for a containerized application generating web access logs, another for applications using log4j, etc.)
    • This allows simpler downstream filtering to apply the correct pipelines without repeated regular expressions.
  • At the source level, also create the following metadata fields to simplify downstream processing (a sketch follows this list):
  1. containerization_engine. Assign it values such as 'kubernetes', 'k8s_crio', or 'docker_json'.
  2. multiline. Assign it the value: _raw.regmatch('\n.*\n') ? true : false
  • Load event breaker rules for multi-line logs and to ensure the correct timestamp is processed. See the next section.
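For reference, here is a minimal sketch of the two metadata fields as they might be entered under the source's Fields (Metadata) settings, as Name / Value Expression pairs. The value expressions are JavaScript, evaluated per event; the engine value 'k8s_crio' is only an example, so use whichever value matches your source:

Name:  containerization_engine
Value: 'k8s_crio'    // or 'kubernetes', 'docker_json', etc.

Name:  multiline
Value: _raw.regmatch('\n.*\n') ? true : false
// An equivalent regex test, as used by the event breaker rules below:
// /\n.*\n/.test(_raw) ? true : false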

Loading Event Breaker rules

Add the event breaker rules below:

  • Go to Knowledge | Event Breaker Rules | Add New | Advanced Mode
  • Copy the JSON below and paste it into the newly created rule.
  • Apply one of these rules when capturing incoming data at the source so that it is parsed correctly.
  • Modify the event breaker rules to suit your timestamp format or any deviations in your log format.

Ruleset to import

{
  "id": "Multiline-TimestampAtBeginning",
  "lib": "custom",
  "rules": [
    {
      "condition": "_raw.match(`\"(log|message|msg|line)\":`)",
      "type": "regex",
      "timestampAnchorRegex": "/(log|message|msg|line)/",
      "timestamp": {
        "type": "auto",
        "length": 150
      },
      "timestampTimezone": "local",
      "timestampEarliest": "-420weeks",
      "timestampLatest": "+1week",
      "maxEventBytes": 51200,
      "disabled": false,
      "eventBreakerRegex": "/(?=\"(?:log|message|msg|line)\":\\s*\"\\s*\\d{4}-\\d{2}-\\d{2})/m",
      "name": "JSONFileFormat",
      "fields": [
        {
          "name": "multiline",
          "value": "/\\n.*\\n/.test(_raw) ? true : false"
        }
      ]
    },
    {
      "condition": "_raw.match(`[\\d\\-T:.+]+\\s+\\w+\\s+[FP]+`)",
      "type": "regex",
      "timestampAnchorRegex": "/^[\\d\\-T:.+]+\\s+\\w+\\s+[FP]\\s+/",
      "timestamp": {
        "type": "auto",
        "length": 150
      },
      "timestampTimezone": "local",
      "timestampEarliest": "-420weeks",
      "timestampLatest": "+1week",
      "maxEventBytes": 51200,
      "disabled": false,
      "eventBreakerRegex": "/^(?=[\\d\\-T:.+]+\\s+\\w+\\s+[FP]+\\s+\\d{4}-\\d{2}-\\d{2})/m",
      "fields": [
        {
          "value": "/\\n.*\\n/.test(_raw) ? true : false",
          "name": "multiline"
        }
      ],
      "name": "RFC3339Nano "
    },
    {
      "condition": "/^\\d{2}\\s+\\w{3}\\s+\\d{4}\\s+\\d{2}:\\d{2}:\\d{2}\\s+\\w+\\s+\\[[^\\]]+\\]/.test(_raw) || sourcetype=='java' || sourcetype=='tomcat' || sourcetype=='catalina.out'",
      "type": "regex",
      "timestampAnchorRegex": "/^/",
      "timestamp": {
        "type": "auto",
        "length": 150,
        "format": "%d\\s%b\\s%Y %H:%M:%S"
      },
      "timestampTimezone": "local",
      "timestampEarliest": "-1420weeks",
      "timestampLatest": "+1week",
      "maxEventBytes": 51200,
      "disabled": false,
      "eventBreakerRegex": "/(?=^\\d{2}\\s+\\w{3}\\s+\\d{4}\\s+\\d{2}:\\d{2}:\\d{2})/gm",
      "name": "Multline_Timestamp_atbeginning",
      "fields": [
        {
          "name": "multiline",
          "value": "/\\n.*\\n/.test(_raw) ? true : false"
        }
      ]
    }
  ],
  "description": "Multiline log formats"
}

End of ruleset to import
Note: Subsequent releases of this Pack will include the event breaker rules, once bundling them is supported in the framework.
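To see how the ruleset above breaks events, here is a minimal sketch in plain JavaScript (not part of the Pack) that applies the RFC3339Nano rule's eventBreakerRegex to the CRI-O sample lines shown earlier. LogStream's event breaking engine also handles buffering and timestamp extraction; this only illustrates where the breaks fall:

// CRI-O sample lines from the CRI_Universal section above (last line truncated)
const sample = [
  '2021-02-09T11:19:04.815514933+00:00 stdout F 2021-02-09 11:19:04.814 DEBUG 1 --- [nio-8080-exec-8] x.x.x.service.DocumentService : retrieved: []',
  '2021-02-09T11:19:04.817387066+00:00 stdout P 2021-02-09 11:19:04.816 ERROR 1 --- [nio-8080-exec-4] x.x.x.exceptions.RestExceptionHandler : 422 Status Code - EntityNotFoundException - XXXXXXXXXXXXXXXXXXXXXXXXX',
  '2021-02-09T11:19:04.817387066+00:00 stdout P xxx.xxx.microservices.exceptions.EntityNotFoundException: XXXXXXXXXXXXXXXXXXXXXXXXX',
  '2021-02-09T11:19:04.817387066+00:00 stdout F    at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(...)'
].join('\n');

// eventBreakerRegex from the RFC3339Nano rule: break before lines whose
// payload (after the "<timestamp> <stream> <F|P>" prefix) starts with a date
const breaker = /^(?=[\d\-T:.+]+\s+\w+\s+[FP]+\s+\d{4}-\d{2}-\d{2})/m;

// Splitting on the zero-width lookahead yields two events:
// the DEBUG line by itself, and the ERROR line together with its stack trace
const events = sample.split(breaker);
console.log(events.length); // 2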

Release Notes

Version 0.5 - 2021-06-28

First release. Includes pipelines for JSON and CRI-O formats from Docker, Kubernetes, containerd, and CRI-O; pipelines for multi-line events; and pipelines for Pivotal Cloud Foundry.

Contributing to the Pack


Discuss this pack on our Community Slack channel #packs.

Contact


The author of this pack is Ahmed Kira, who can be contacted at akira@cribl.io.

License


This Pack uses the following license: Apache 2.0.
