High-performance filtering of JSON that is about to be logged. Reads, filters and writes JSON in a single pass, increasing throughput considerably (by roughly 3x-9x). Typical use-cases:
- Filter sensitive values from logs (e.g. on request-/response-logging)
  - technical details like passwords and so on
  - sensitive personal information, for GDPR compliance and such
- Improve log readability by filtering out
  - large String elements like base64-encoded binary data, or
  - whole JSON subtrees with low informational value
- Reduce the amount of data sent to log-accumulation tools
  - lower cost
  - potentially reduce search/visualization latency
  - keep within the maximum log-statement size
    - GCP: 256 KB
    - Azure: 32 KB
Features:
- Truncate large text values
- Mask (anonymize) scalar values like String, Number, Boolean and so on.
- Remove (prune) whole subtrees
- Truncate large documents (max total output size)
- Skip or speed up filtering for the remainder of the document after a given number of anonymize and/or prune hits
- Remove whitespace (for pretty-printed documents)
- Metrics for the above operations + total input and output size
The library contains multiple filter implementations to accommodate combinations of the above features with as little overhead as possible. The equivalent filters are also implemented using Jackson.
Bugs, feature suggestions and help requests can be filed with the issue-tracker.
The project is built with Maven and is available on the central Maven repository.
Maven coordinates
Add the property
<json-log-filter.version>x.x.x</json-log-filter.version>
then add
<dependency>
    <groupId>com.github.skjolber.json-log-filter</groupId>
    <artifactId>api</artifactId>
    <version>${json-log-filter.version}</version>
</dependency>
<dependency>
    <groupId>com.github.skjolber.json-log-filter</groupId>
    <artifactId>core</artifactId>
    <version>${json-log-filter.version}</version>
</dependency>
and optionally
<dependency>
    <groupId>com.github.skjolber.json-log-filter</groupId>
    <artifactId>jackson</artifactId>
    <version>${json-log-filter.version}</version>
</dependency>
or
Gradle coordinates
Add the property

ext {
    jsonLogFilterVersion = 'x.x.x'
}

then add
api("com.github.skjolber.json-log-filter:api:${jsonLogFilterVersion}")
api("com.github.skjolber.json-log-filter:core:${jsonLogFilterVersion}")
and optionally
api("com.github.skjolber.json-log-filter:jackson:${jsonLogFilterVersion}")
Use a DefaultJsonLogFilterBuilder or a JacksonJsonLogFilterBuilder to configure a filter instance (all filters are thread-safe):
JsonFilter filter = DefaultJsonLogFilterBuilder.createInstance()
    .withMaxStringLength(127) // cuts long texts
    .withAnonymize("$.customer.email") // inserts ***** for values
    .withPrune("$.customer.account") // removes whole subtree
    .withMaxPathMatches(16) // halt anonymize/prune after a number of hits
    .withMaxSize(128 * 1024) // limit total output size
    .build();
byte[] json = ...; // obtain JSON
String filtered = filter.process(json); // perform filtering
Configure max string length for output like
{
    "icon": "QUJDREVGR0hJSktMTU5PUFFSU1... + 46"
}
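The truncation marker appends the number of removed characters after the retained prefix. A minimal sketch of this behaviour in plain Java (not the library's implementation; the `truncate` helper is hypothetical):

```java
public class TruncateSketch {

    // Hypothetical helper, not part of the library: keeps the first maxLength
    // characters of a value and appends "... + n", where n is the number of
    // characters that were removed.
    static String truncate(String value, int maxLength) {
        if (value.length() <= maxLength) {
            return value; // short enough, keep as-is
        }
        return value.substring(0, maxLength) + "... + " + (value.length() - maxLength);
    }

    public static void main(String[] args) {
        System.out.println(truncate("abcdefghij", 4)); // prints abcd... + 6
    }
}
```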
Configure anonymize for output like
{
    "password": "*****"
}
for scalar values, and/or for objects/arrays, masking all contained scalar values:
{
    "credentials": {
        "username": "*****",
        "password": "*****"
    }
}
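Conceptually, anonymize replaces matched scalar values with a fixed mask. A rough sketch of that effect in plain Java, using a regex on a single named field (the real filters operate on the parsed JSON structure, not on regexes; the `mask` helper is hypothetical):

```java
import java.util.regex.Pattern;

public class MaskSketch {

    // Hypothetical helper, not part of the library: replaces the string value
    // of a named field with "*****". Shown only to illustrate the output.
    static String mask(String json, String fieldName) {
        Pattern pattern = Pattern.compile(
                "(\"" + Pattern.quote(fieldName) + "\"\\s*:\\s*)\"[^\"]*\"");
        return pattern.matcher(json).replaceAll("$1\"*****\"");
    }

    public static void main(String[] args) {
        System.out.println(mask("{\"username\":\"jdoe\",\"password\":\"secret\"}", "password"));
        // prints {"username":"jdoe","password":"*****"}
    }
}
```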
Configure prune to turn input
{
    "context": {
        "boringData": {
            ...
        },
        "staticData": [ ... ]
    }
}
to output like
{
    "context": "PRUNED"
}
A simple syntax is supported, where each path segment corresponds to a field name. Expressions are case-sensitive. Supported syntax:
/my/field/name
with support for wildcards;
/my/field/*
or a simple any-level field name search
//myFieldName
The filters within this library support using multiple expressions at once. Note that path expressions see through arrays.
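The matching semantics of the two expression forms can be sketched as follows (an illustrative matcher, not the library's implementation; "*" matches any single field name, and "//" matches a field name at any depth):

```java
public class PathMatchSketch {

    // Illustrative matcher, not the library's implementation, for the two
    // expression forms described above: an absolute path such as
    // "/my/field/name" and an any-level field name search such as "//myFieldName".
    static boolean matches(String expression, String[] path) {
        if (expression.startsWith("//")) {
            // any-level search: the current field name must equal the searched name
            return path.length > 0 && path[path.length - 1].equals(expression.substring(2));
        }
        String[] segments = expression.substring(1).split("/");
        if (segments.length != path.length) {
            return false;
        }
        for (int i = 0; i < segments.length; i++) {
            if (!segments[i].equals("*") && !segments[i].equals(path[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("/my/field/*", new String[] { "my", "field", "name" })); // true
        System.out.println(matches("//name", new String[] { "a", "b", "name" })); // true
        System.out.println(matches("/my/field", new String[] { "my", "field", "name" })); // false
    }
}
```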
Configure max path matches so that filtering stops after a given number of matches. This can increase filter speed considerably if the number of matches is known in advance, and performance will approach pass-through levels if those matches occur at the beginning of the document. This applies, for example, when the to-be-filtered JSON document follows a schema with a header + body structure and the target value is in the header.
Configure max size to limit the size of the resulting document. This reduces the size of the document by (silently) deleting the JSON content after the limit is reached.
Pass in a JsonFilterMetrics argument to the process method like so:
JsonFilterMetrics myMetrics = new DefaultJsonFilterMetrics();
String filtered = filter.process(json, myMetrics); // perform filtering
The resulting metrics can be logged as metadata alongside the JSON payload, or passed to sensors like Micrometer for further processing, for example to
- measure the impact of the filtering, i.e. the reduction in data size
- verify that filters are actually operating as intended
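For example, the size reduction can be derived from the total input and output sizes that the metrics report (a hypothetical calculation, independent of the JsonFilterMetrics API):

```java
public class ReductionSketch {

    // Hypothetical calculation, not tied to the library's metrics API: the
    // percentage of the payload removed by filtering, given the total input
    // and output sizes in bytes.
    static double reductionPercent(long inputSize, long outputSize) {
        return 100.0 * (inputSize - outputSize) / inputSize;
    }

    public static void main(String[] args) {
        System.out.println(reductionPercent(1000, 250)); // prints 75.0
    }
}
```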
The core processors within this project are faster than the Jackson-based processors. This is expected, as parser/serializer features have been traded for performance:

- core is something like 3x-9x as fast as the Jackson processors, where
  - skipping large parts of JSON documents (prune) decreases the difference, and
  - small documents increase the difference, as Jackson is more expensive to initialize
- working directly on bytes is faster than working on characters for the core processors
For a typical, light-weight web service, the overall system performance improvement from using the core filters over the Jackson-based filters will most likely be a few percent.

Memory use will be at 2-8 times the raw JSON byte size, depending on the invoked JsonFilter method (some accept a String, others raw bytes or chars).
See the benchmark results (JDK 17) and the JMH module for running detailed benchmarks.
There is also a path artifact which facilitates per-path filters for request/response-logging applications; this should further improve performance.
See the xml-log-filter for corresponding high-performance filtering of XML, and JsonPath for more advanced filtering.
Using SIMD for parsing JSON:
Alternative JSON filters:
- json-masker (included in benchmark).