Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #3 from marcozac/perf-unmarshal
Improve `Unmarshal` performance
Showing 19 changed files with 2,615 additions and 143 deletions.
`@@ -16,6 +16,10 @@ go.work*`

```
# GoReleaser
dist/

# Temporary files
tmp/
*.tmp

# IDE
.idea/
.vscode/*
```
`@@ -0,0 +1,157 @@`

# jsonc - JSON with comments for Go

[GoDoc](http://godoc.org/github.com/marcozac/go-jsonc)
[CI](https://github.com/marcozac/go-jsonc/actions/workflows/ci.yml)
[Coverage](https://codecov.io/gh/marcozac/go-jsonc)
[Go Report Card](https://goreportcard.com/report/github.com/marcozac/go-jsonc)

`jsonc` is a lightweight, dependency-free package for working with JSON with comments (JSONC) data, built on top of `encoding/json`.
It can remove comments to produce valid JSON-encoded data, and it can unmarshal JSON with comments directly into Go values.

The dependencies listed in [go.mod](/go.mod) are only used for testing and benchmarking, or to support [alternative libraries](#alternative-libraries).

## Features

- Full support for line comments (`//`) and block comments (`/* */`)
- Preserve the content of strings that contain comment characters
- Sanitize JSON with comments data by removing comments
- Unmarshal JSON with comments into Go values

## Installation

Install the `jsonc` package:

```bash
go get github.com/marcozac/go-jsonc
```

## Usage

### Sanitize - Remove comments from JSON data

`Sanitize` removes all comments from JSON data and returns a valid JSON-encoded byte slice that is compatible with the standard library's `json.Unmarshal`.

It handles line comments and block comments anywhere in the JSONC data, preserving the content of strings that contain comment characters.

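For illustration only, here is a minimal, standard-library-only sketch of how comments can be stripped in a single pass while leaving string literals untouched. It is not the `jsonc.Sanitize` implementation: `stripComments` and its error message are hypothetical, comments are simply dropped (byte offsets are not preserved), and any other syntax error is left for `json.Unmarshal` to report.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// stripComments is a hypothetical, simplified JSONC comment remover (not the
// go-jsonc implementation). It copies data byte by byte, skipping // line
// comments and /* */ block comments that appear outside string literals.
func stripComments(data []byte) ([]byte, error) {
	out := make([]byte, 0, len(data))
	inString := false

	for i := 0; i < len(data); i++ {
		c := data[i]

		if inString {
			out = append(out, c)
			switch c {
			case '\\':
				// Copy the escaped byte too, so that \" does not close the string.
				if i+1 < len(data) {
					i++
					out = append(out, data[i])
				}
			case '"':
				inString = false
			}
			continue
		}

		switch {
		case c == '"':
			inString = true
			out = append(out, c)
		case c == '/' && i+1 < len(data) && data[i+1] == '/':
			// Line comment: advance to the byte before the newline; the loop's
			// i++ then lands on the newline, which is kept.
			for i+1 < len(data) && data[i+1] != '\n' {
				i++
			}
		case c == '/' && i+1 < len(data) && data[i+1] == '*':
			// Block comment: search for the closing */ after the opening /*.
			end := bytes.Index(data[i+2:], []byte("*/"))
			if end < 0 {
				return nil, errors.New("unterminated block comment")
			}
			// Move i to the closing '/'; the loop's i++ then steps past it.
			i += 2 + end + 1
		default:
			out = append(out, c)
		}
	}
	return out, nil
}

func main() {
	input := []byte(`{
	// a line comment
	"url": "https://example.com/*not-a-comment*/", /* a block comment */
	"hello": "world" // another comment
}`)

	clean, err := stripComments(input)
	if err != nil {
		panic(err)
	}

	var v struct {
		URL   string `json:"url"`
		Hello string `json:"hello"`
	}
	if err := json.Unmarshal(clean, &v); err != nil {
		panic(err)
	}
	fmt.Println(v.URL, v.Hello) // prints: https://example.com/*not-a-comment*/ world
}
```

Tracking whether the scanner is inside a string, and copying escaped bytes verbatim, is what keeps values such as `"https://example.com/*not-a-comment*/"` intact.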
#### Example

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"

	"github.com/marcozac/go-jsonc"
)

func main() {
	invalidData := []byte(`{
		// a comment
		"foo": "bar" /* a comment in a weird place */,
		/*
			a block comment
		*/
		"hello": "world" // another comment
	}`)

	// Remove comments from JSONC
	data, err := jsonc.Sanitize(invalidData)
	if err != nil {
		log.Fatal(err)
	}

	var v struct {
		Foo   string
		Hello string
	}

	// Unmarshal using encoding/json (or any other JSON library)
	if err := json.Unmarshal(data, &v); err != nil {
		log.Fatal(err)
	}

	fmt.Println(v.Foo, v.Hello)
}
```

### Unmarshal - Parse JSON with comments into a Go value

`Unmarshal` replicates the behavior of the standard library's `json.Unmarshal` function, adding support for comments.

It is optimized to avoid calling [`Sanitize`](#sanitize---remove-comments-from-json-data) unless it detects comments in the data.
This avoids the overhead of removing comments when none are present, improving performance on small data sets.

It first checks whether the data contains comment characters such as `//` or `/*` using [`HasCommentRunes`](https://pkg.go.dev/github.com/marcozac/go-jsonc#HasCommentRunes).
If no comment characters are found, it unmarshals the data directly.

Only if comment characters are detected does it call [`Sanitize`](#sanitize---remove-comments-from-json-data) to remove them before unmarshaling.
This way, `Unmarshal` skips unnecessary work when possible; however, it currently cannot rule out false positives such as `//` or `/*` inside strings (for example, in a URL), in which case `Sanitize` is still called.

Because comment detection is based on a simple rune check, using `Unmarshal` on large data sets is not recommended unless you are unsure whether they contain comments.
Indeed, `HasCommentRunes` has to check every single byte before it can return `false`, which may drastically slow down the process.

For large data sets, it is therefore more efficient to call [`Sanitize`](#sanitize---remove-comments-from-json-data) before unmarshaling the data.

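The detect-then-sanitize strategy can be sketched with the standard library alone. This is not the library's code: `mayContainComments` is a hypothetical stand-in for `HasCommentRunes` (whose exact behavior is assumed here), and `unmarshalJSONC` takes the comment remover as a parameter so that any function with the same shape as `jsonc.Sanitize` could be plugged in.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mayContainComments is a hypothetical stand-in for a check like
// HasCommentRunes: it reports whether data contains "//" or "/*" anywhere,
// including inside string literals, so "https://..." is a false positive.
// It returns as soon as it finds a match, but must scan every byte to
// return false.
func mayContainComments(data []byte) bool {
	for i := 0; i+1 < len(data); i++ {
		if data[i] == '/' && (data[i+1] == '/' || data[i+1] == '*') {
			return true
		}
	}
	return false
}

// unmarshalJSONC sketches the fast path: sanitize is called only when comment
// characters are found; otherwise data goes straight to json.Unmarshal.
// Passing sanitize as a parameter is an assumption of this sketch.
func unmarshalJSONC(data []byte, v any, sanitize func([]byte) ([]byte, error)) error {
	if mayContainComments(data) {
		clean, err := sanitize(data)
		if err != nil {
			return err
		}
		data = clean
	}
	return json.Unmarshal(data, v)
}

func main() {
	calls := 0
	// Stand-in sanitizer that returns its input unchanged and counts calls.
	countingSanitize := func(b []byte) ([]byte, error) {
		calls++
		return b, nil
	}

	var v struct{ Foo string }

	// No comment characters: the sanitizer is skipped.
	if err := unmarshalJSONC([]byte(`{"foo": "bar"}`), &v, countingSanitize); err != nil {
		panic(err)
	}
	// "//" inside a string is a false positive, so the sanitizer still runs.
	if err := unmarshalJSONC([]byte(`{"foo": "https://example.com"}`), &v, countingSanitize); err != nil {
		panic(err)
	}

	fmt.Println(calls, v.Foo) // prints: 1 https://example.com
}
```

The counting stand-in makes the asymmetry visible: input without a `//` or `/*` pair skips sanitizing entirely, while a URL inside a string triggers it even though there is nothing to remove.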
#### Example

```go
package main

import (
	"fmt"
	"log"

	"github.com/marcozac/go-jsonc"
)

func main() {
	invalidData := []byte(`{
		// a comment
		"foo": "bar"
	}`)

	var v struct{ Foo string }

	if err := jsonc.Unmarshal(invalidData, &v); err != nil {
		log.Fatal(err)
	}

	fmt.Println(v.Foo)
}
```

## Alternative libraries

By default, `jsonc` uses the standard library's `encoding/json` to unmarshal JSON data and has no external dependencies.

You can use build tags to select an alternative library instead of the standard library's `encoding/json`:

| Tag                                    | Library                                                              |
| -------------------------------------- | -------------------------------------------------------------------- |
| none, or both `jsoniter` and `go_json` | standard library (`encoding/json`)                                   |
| `jsoniter`                             | [`github.com/json-iterator/go`](https://github.com/json-iterator/go) |
| `go_json`                              | [`github.com/goccy/go-json`](https://github.com/goccy/go-json)       |

## Benchmarks

This library aims to have performance comparable to the standard library's `encoding/json`.
Unfortunately, removing comments is not free, and its overhead cannot be avoided when comments are present.

On small data sets, `jsonc` is currently about 27% slower than the standard library's `encoding/json` on data with comment characters in strings, and about 16% slower on data without comment characters.
On medium data sets, the gap widens to about 30% on data with comment characters in strings and narrows to about 12% on data without comment characters.

However, when using one of the [alternative libraries](#alternative-libraries), it is possible to achieve better performance than the standard library's `encoding/json`, even including the overhead of removing comments.

See [benchmarks](/benchmarks) for the full results.

The benchmarks were run on a MacBook Pro (16-inch, 2021), Apple M1 Max, 32 GB RAM.

## Contributing

:heart: Contributions are ~~needed~~ welcome!

Please open an issue or submit a pull request if you would like to contribute.

To submit a pull request:

- Fork this repository
- Create a new branch
- Make your changes and commit them
- Push to your fork and submit a pull request

## License

This project is licensed under the Apache 2.0 license. See the [LICENSE](/LICENSE) file for details.
`@@ -0,0 +1,65 @@`

```go
// Copyright 2023 Marco Zaccaro. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

//go:build uncommented_test
// +build uncommented_test

package jsonc

import (
	"testing"

	"github.com/marcozac/go-jsonc/internal/json"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// This file does not contain real benchmarks: it is used as a baseline to
// compare performance against the standard functions on uncommented JSON data.

// BenchmarkUnmarshal checks the performance of the standard json.Unmarshal
// (or jsoniter / go-json / ...) on uncommented JSON data.
func BenchmarkUnmarshal(b *testing.B) {
	b.Run("Small", func(b *testing.B) {
		b.Run("UnCommented", func(b *testing.B) {
			benchmarkUnmarshal(b, _smallUncommented, Small{})
		})
		b.Run("NoCommentRunes", func(b *testing.B) {
			benchmarkUnmarshal(b, _smallNoCommentRunes, SmallNoCommentRunes{})
		})
	})
	b.Run("Medium", func(b *testing.B) {
		b.Run("UnCommented", func(b *testing.B) {
			benchmarkUnmarshal(b, _mediumUncommented, Medium{})
		})
		b.Run("NoCommentRunes", func(b *testing.B) {
			benchmarkUnmarshal(b, _mediumNoCommentRunes, MediumNoCommentRunes{})
		})
	})
}

// benchmarkUnmarshal runs UnmarshalOK on data in parallel goroutines.
func benchmarkUnmarshal[T DataType](b *testing.B, data []byte, dt T) {
	b.Helper()
	b.RunParallel(func(p *testing.PB) {
		for p.Next() {
			UnmarshalOK(b, data, dt)
		}
	})
}

// UnmarshalOK unmarshals data into a copy of dt using the selected JSON
// library and verifies the decoded values with FieldsValue.
func UnmarshalOK[T DataType](t require.TestingT, data []byte, dt T) {
	j := dt
	assert.NoError(t, json.Unmarshal(data, &j), "unmarshal failed")
	FieldsValue(t, j)
}
```
`@@ -0,0 +1,55 @@`

# Benchmark results

The tables below show the performance of [`Unmarshal`](../README.md#unmarshal---parse-json-with-comments-into-a-go-value) compared to the standard library's `encoding/json` and other alternative libraries on small and medium data sets.

They are formatted as follows:

| Data set      | s/op                                        | B/op | allocs/op |
| ------------- | ------------------------------------------- | ---- | --------- |
| Set reference | result (Δ% vs. reference / reference result) | same | same      |

The reference result is the underlying JSON library's own `Unmarshal` on the same data set. The "With comments" rows have no reference, since plain JSON decoders reject comments.

See the files in this directory for the full report.

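Judging from the published numbers, each Δ% appears to be the relative change of the `jsonc` result over the reference result. The sketch below assumes that formula (`deltaPercent` is a hypothetical helper) and applies it to the small data set with the standard library; it yields +27.16% instead of the table's +27.17%, presumably because the table shows rounded values.

```go
package main

import "fmt"

// deltaPercent returns the relative difference between result and reference,
// in percent: (result - reference) / reference * 100.
func deltaPercent(result, reference float64) float64 {
	return (result - reference) / reference * 100
}

func main() {
	// Standard library, small data set, "Without comments" row:
	// jsonc 2.425µs/op against an encoding/json reference of 1.907µs/op.
	fmt.Printf("%+.2f%%\n", deltaPercent(2.425, 1.907)) // prints: +27.16%
}
```

The same formula gives about +16.11% for the "Without comment characters" row (2.306µs/op against 1.986µs/op), matching the table.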
### Standard library

The tables below show the performance of [`Unmarshal`](../README.md#unmarshal---parse-json-with-comments-into-a-go-value) compared to the standard library's `encoding/json` on small and medium data sets.

| **Small data set** | s/op | B/op | allocs/op |
| -------------------------------------------------------------------------------------- | ------------------------- | --------------------------- | ---------------------- |
| [With comments](../testdata/small.json) | 2.536µ | 1.344Ki | 22.00 |
| [Without comments](../testdata/small_uncommented.json) (comment characters in strings) | 2.425µ (+27.17% / 1.907µ) | 1.219Ki (+14.71% / 1.062Ki) | 22.00 (+4.76% / 21.00) |
| [Without comment characters](../testdata/small_no_comment_runes.json) | 2.306µ (+16.11% / 1.986µ) | 1.062Ki (~% / 1.062Ki) | 21.00 (~% / 21.00) |

| **Medium data set** | s/op | B/op | allocs/op |
| -------------------------------------------------------------------------------------- | ------------------------- | --------------------------- | ------------------------ |
| [With comments](../testdata/small.json) | 301.2µ | 324.7Ki | 1.067k |
| [Without comments](../testdata/small_uncommented.json) (comment characters in strings) | 202.3µ (+30.86% / 154.6µ) | 148.7Ki (+60.41% / 92.70Ki) | 1.067k (+0.09% / 1.066k) |
| [Without comment characters](../testdata/small_no_comment_runes.json) | 170.6µ (+11.63% / 152.8µ) | 92.70Ki (~% / 92.70Ki) | 1.066k (~% / 1.066k) |

### With [`github.com/json-iterator/go`](https://github.com/json-iterator/go)

| **Small data set** | s/op | B/op | allocs/op |
| -------------------------------------------------------------------------------------- | ------------------------- | ----------------------- | ---------------------- |
| [With comments](../testdata/small.json) | 1.632µ | 944.0 | 14.00 |
| [Without comments](../testdata/small_uncommented.json) (comment characters in strings) | 1.702µ (+11.94% / 1.521µ) | 816.0 (+24.39% / 656.0) | 14.00 (+7.69% / 13.00) |
| [Without comment characters](../testdata/small_no_comment_runes.json) | 1.603µ (~% / 1.598µ) | 656.0 (~% / 656.0) | 12.00 (~% / 13.00) |

| **Medium data set** | s/op | B/op | allocs/op |
| -------------------------------------------------------------------------------------- | ------------------------- | --------------------------- | ------------------------ |
| [With comments](../testdata/small.json) | 245.0µ | 407.8Ki | 3.484k |
| [Without comments](../testdata/small_uncommented.json) (comment characters in strings) | 142.4µ (+42.25% / 100.1µ) | 231.8Ki (+31.90% / 175.7Ki) | 3.484k (+0.06% / 3.482k) |
| [Without comment characters](../testdata/small_no_comment_runes.json) | 113.1µ (+17.45% / 96.32µ) | 175.7Ki (+0.01% / 175.7Ki) | 3.482k (~% / 3.482k) |

### With [`github.com/goccy/go-json`](https://github.com/goccy/go-json)

| **Small data set** | s/op | B/op | allocs/op |
| -------------------------------------------------------------------------------------- | ------------------------- | ----------------------- | ----------------------- |
| [With comments](../testdata/small.json) | 1.794µ | 1.047Ki | 10.00 |
| [Without comments](../testdata/small_uncommented.json) (comment characters in strings) | 1.797µ (+15.38% / 1.557µ) | 928.0 (+20.83% / 768.0) | 10.00 (+11.11% / 9.000) |
| [Without comment characters](../testdata/small_no_comment_runes.json) | 1.705µ (+3.30% / 1.651µ) | 768.0 (~% / 768.0) | 9.00 (~% / 9.000) |

| **Medium data set** | s/op | B/op | allocs/op |
| -------------------------------------------------------------------------------------- | ------------------------- | --------------------------- | ---------------------- |
| [With comments](../testdata/small.json) | 213.1µ | 434.9Ki | 77.00 |
| [Without comments](../testdata/small_uncommented.json) (comment characters in strings) | 101.4µ (+83.61% / 55.24µ) | 250.4Ki (+28.94% / 194.2Ki) | 73.00 (+2.82% / 71.00) |
| [Without comment characters](../testdata/small_no_comment_runes.json) | 72.60µ (+37.97% / 52.62µ) | 194.2Ki (+0.02% / 194.1Ki) | 71.00 (~% / 71.00) |