From d0610fa9547361d762b727e01c3d20caf16392b6 Mon Sep 17 00:00:00 2001 From: Choko Date: Tue, 4 Jan 2022 20:29:06 +0900 Subject: [PATCH] Copy README from curiostack (#6) --- README.md | 126 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 126 insertions(+) diff --git a/README.md b/README.md index 1c4756f..143b4a8 100644 --- a/README.md +++ b/README.md @@ -1 +1,127 @@ # protobuf-jackson + +protobuf-jackson is a library for efficient marshalling of Protocol Buffer messages to and from +JSON. It is based on the streaming API of the Jackson JSON handling library and uses Byte Buddy to +generate efficient bytecode at runtime for marshalling specific message types. + +This library requires Java 8 to use. + +## Why another Protocol Buffer <-> JSON marshaller + +The official Protocol Buffer Java library comes with a util library for marshalling to JSON, using +a class called JsonFormat. Unfortunately, JsonFormat is very slow, and its core design of using +manual JSON printers / GSON parsers and introspection during serialization time limit how fast it +can get. + +JSON marshalling is important - it can allow existing REST services to be migrated to Protocol +Buffers and makes it easier to evangelize their use in organizations currently tied heavily to JSON. +However, slow marshalling prevents these use cases from being practical, so this library aims to +allow JSON marshalling to be fast enough for production. It will never be as good as binary +serialization, but should be about as fast as standard REST API libraries. + +If the protoc compiler supports generating source code for JSON marshalling, this library will +likely be deprecated. + +## Usage + +Include protobuf-jackson as a dependency. Gradle users can add + +```kotlin +implementation("org.curioswitch.curiostack:protobuf-jackson:1.3.0") +``` + +to their dependencies. + +The API is exposed in ```MessageMarshaller```. Use its builder to configure options and register +specific Protocol Buffer message types you will marshall. Any other types will not be marshallable +by the built ```MessageMarshaller```. + +```java +class MessageApi { + + private static final MessageMarshaller marshaller = MessageMarshaller.builder() + .register(ApiRequest.getDefaultInstance()) + .register(ApiResponse.getDefaultInstance()) + .build(); + + String coolApi(String jsonRequest) { + ApiRequest.Builder request = ApiRequest.newBuilder(); + marshaller.mergeValue(jsonRequest, request); + + ApiResponse response = processRequest(request); + + return marshaller.writeValueAsString(response); + } +} +``` + +## Design + +protobuf-jackson introspects registered message types during bytecode generation time and generates +code specific to the type. Unlike JsonFormat, this means that during actually marshalling, there is +no introspection of the message type at all - it's already been done during registration. In +addition, messages are accessed through their standard Java generated API, no reflection is needed. +No reflection allows the JIT to optimize generated machine code as much as possible. By generating +bytecode, we approach the same speed that could be achieved by generating the source code itself. + +All of the same tests as ```JsonFormat``` (besides the differences listed below) pass, so +protobuf-jackson should be mostly compatible with upstream and ready for production. + +## Differences with upstream + +Differences with JsonFormat are + +- No support for ```DynamicMessage```. The library is designed to interact with generated code for +optimal performance. If you don't know what ```DynamicMessage``` is, you are not using it. If you +do, stick with ```JsonFormat```. + +- Does not support parsing Any messages where ```@type``` is not the first field. + +- Correctly handles invalid array syntax. ```JsonFormat``` does not due to its dependency on GSON. +See [JsonFormatTest](https://github.com/google/protobuf/blob/master/java/util/src/test/java/com/google/protobuf/util/JsonFormatTest.java#L1124). + +- Correctly rejects invalid JSON which specifies the same field twice. ```JsonFormat``` does not due +to its dependency on GSON. See [JsonFormatTest](https://github.com/google/protobuf/blob/master/java/util/src/test/java/com/google/protobuf/util/JsonFormatTest.java#L458) + +- Fields already present in the passed in ```Message.Builder``` will be overwritten, rather than +causing the parse to fail, as is usual with merge semantics. + +## Benchmarks + +First, a synthetic benchmark that uses the upstream ```TestAllTypes``` proto which contains all +types of fields. + +``` +Benchmark Mode Cnt Score Error Units +JsonParserBenchmark.codegenJson thrpt 50 114260.817 ± 1162.079 ops/s +JsonParserBenchmark.fromBinary thrpt 50 448586.946 ± 7946.499 ops/s +JsonParserBenchmark.upstreamJson thrpt 50 43392.884 ± 715.184 ops/s +JsonSerializeBenchmark.codegenJson thrpt 50 204768.672 ± 2459.787 ops/s +JsonSerializeBenchmark.toBytes thrpt 50 1071878.319 ± 12628.059 ops/s +JsonSerializeBenchmark.upstreamJson thrpt 50 49941.606 ± 707.045 ops/s +``` + +protobuf-jackson generally shows a 4x speed up in serialization and 3x speed up in parsing vs +upstream JSON parsing. It is about 4-5x slower than protobuf binary marshalling. + +For a more real-world example, the JSON response of the [GitHub search API](https://developer.github.com/v3/search/#example) +was used. A proto file corresponding to the format was made to model what an actual data exchange +may look like. This is just one example, and other examples may have different results, but +hopefully the trends are similar + +``` +Benchmark Mode Cnt Score Error Units +GithubApiBenchmark.marshallerParseString thrpt 50 1696.653 ± 170.891 ops/s +GithubApiBenchmark.marshallerWriteString thrpt 50 3042.433 ± 172.438 ops/s +GithubApiBenchmark.protobufParseBytes thrpt 50 4599.390 ± 56.570 ops/s +GithubApiBenchmark.protobufToBytes thrpt 50 21315.212 ± 210.659 ops/s +GithubApiBenchmark.upstreamParseString thrpt 50 472.902 ± 6.455 ops/s +GithubApiBenchmark.upstreamWriteString thrpt 50 717.227 ± 8.834 ops/s +``` + +For both parsing and serializing, protobuf-jackson is about 4x faster. + +One other way to look at these results is that JSON marshalling is now less than an order of +magnitude different than protobuf binary marshalling, which was not the case before. While binary +marshalling will always be preferred, for use cases that demand it, JSON should also be acceptable +in production.