Merge pull request #3 from marcozac/perf-unmarshal
Improve `Unmarshal` performance
marcozac authored Aug 5, 2023
2 parents 0936c96 + 9844e1d commit 0f0d9d7
Showing 19 changed files with 2,615 additions and 143 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -16,6 +16,10 @@ go.work*
# GoReleaser
dist/

# Temporary files
tmp/
*.tmp

# IDE
.idea/
.vscode/*
157 changes: 157 additions & 0 deletions README.md
@@ -0,0 +1,157 @@
# jsonc - JSON with comments for Go

[![Go Doc](https://img.shields.io/badge/godoc-reference-blue.svg)](http://godoc.org/github.com/marcozac/go-jsonc)
![License](https://img.shields.io/github/license/marcozac/go-jsonc?color=blue)
[![CI](https://github.com/marcozac/go-jsonc/actions/workflows/ci.yml/badge.svg)](https://github.com/marcozac/go-jsonc/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/marcozac/go-jsonc/branch/main/graph/badge.svg?token=JYj7gCZauN)](https://codecov.io/gh/marcozac/go-jsonc)
[![Go Report Card](https://goreportcard.com/badge/github.com/marcozac/go-jsonc)](https://goreportcard.com/report/github.com/marcozac/go-jsonc)

`jsonc` is a lightweight, dependency-free package for working with JSON with comments, built on top of `encoding/json`.
It removes comments to produce valid JSON-encoded data and unmarshals JSON with comments into Go values.

The dependencies listed in [go.mod](/go.mod) are only used for testing and benchmarking or to support [alternative libraries](#alternative-libraries).

## Features

- Full support for comment lines and block comments
- Preserve the content of strings that contain comment characters
- Sanitize JSON with comments data by removing comments
- Unmarshal JSON with comments into Go values

## Installation

Install the `jsonc` package:

```bash
go get github.com/marcozac/go-jsonc
```

## Usage

### Sanitize - Remove comments from JSON data

`Sanitize` removes all comments from JSON data and returns a valid JSON-encoded byte slice that is compatible with the standard library's `json.Unmarshal`.

It works with comment lines and block comments anywhere in the JSONC data, preserving the content of strings that contain comment characters.

#### Example

```go
package main

import (
	"encoding/json"

	"github.com/marcozac/go-jsonc"
)

func main() {
	invalidData := []byte(`{
		// a comment
		"foo": "bar" /* a comment in a weird place */,
		/*
			a block comment
		*/
		"hello": "world" // another comment
	}`)

	// Remove comments from JSONC
	data, err := jsonc.Sanitize(invalidData)
	if err != nil {
		panic(err) // handle the error as appropriate
	}

	var v struct {
		Foo   string
		Hello string
	}

	// Unmarshal using any other library
	if err := json.Unmarshal(data, &v); err != nil {
		panic(err) // handle the error as appropriate
	}
}
```
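
To illustrate how string contents can be preserved while comments are removed, here is a minimal sketch of a byte-level state machine. It is not the implementation used by `jsonc`: `stripComments` is a hypothetical name, and details such as keeping the newline after a line comment are assumptions made for this example.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// stripComments is a hypothetical, simplified comment stripper; it is NOT the
// jsonc implementation. It copies bytes to the output and skips `// ...` and
// `/* ... */` comments that appear outside JSON strings.
func stripComments(data []byte) ([]byte, error) {
	out := make([]byte, 0, len(data))
	inString := false
	for i := 0; i < len(data); i++ {
		c := data[i]
		if inString {
			out = append(out, c)
			switch c {
			case '\\':
				// Copy the escaped byte too, so that \" does not end the string.
				if i+1 < len(data) {
					i++
					out = append(out, data[i])
				}
			case '"':
				inString = false
			}
			continue
		}
		switch {
		case c == '"':
			inString = true
			out = append(out, c)
		case c == '/' && i+1 < len(data) && data[i+1] == '/':
			// Line comment: skip up to, but not including, the newline.
			for i < len(data) && data[i] != '\n' {
				i++
			}
			i-- // the loop increment moves back onto the newline (or ends the loop)
		case c == '/' && i+1 < len(data) && data[i+1] == '*':
			// Block comment: skip until the closing "*/".
			end := bytes.Index(data[i+2:], []byte("*/"))
			if end < 0 {
				return nil, errors.New("unterminated block comment")
			}
			i += 2 + end + 1 // land on the '/' that closes the comment
		default:
			out = append(out, c)
		}
	}
	return out, nil
}

func main() {
	in := []byte(`{
  // a comment
  "url": "http://example.com" /* the string above is kept intact */
}`)
	out, err := stripComments(in)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", out)
}
```

Tracking whether the scanner is inside a string is what keeps `//` in a URL from being treated as the start of a comment.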

### Unmarshal - Parse JSON with comments into a Go value

`Unmarshal` replicates the behavior of the standard library's json.Unmarshal function, with the addition of support for comments.

It is optimized to avoid calling [`Sanitize`](#sanitize---remove-comments-from-json-data) unless it detects comments in the data.
This avoids the overhead of removing comments when they are not present, improving performance on small data sets.

It first checks whether the data contains comment markers such as `//` or `/*` using [`HasCommentRunes`](https://pkg.go.dev/github.com/marcozac/go-jsonc#HasCommentRunes).
If none are found, it unmarshals the data directly.

It calls [`Sanitize`](#sanitize---remove-comments-from-json-data) to remove comments before unmarshaling only if comment markers are detected.
`Unmarshal` therefore skips unnecessary work when possible, but it currently cannot rule out false positives such as `//` or `/*` inside strings.

Since the comment detection is a simple rune check, using `Unmarshal` on large data sets is recommended only when you are not sure whether they contain comments.
`HasCommentRunes` must check every single byte before it can return `false`, which may drastically slow down the process.

When a large data set is known to contain comments, it is more efficient to call [`Sanitize`](#sanitize---remove-comments-from-json-data) and then unmarshal the result.
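
The flow described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the package's code: `hasCommentMarkers` and `unmarshalFastPath` are hypothetical names, and the sanitizer is passed in as a parameter so that the example does not depend on `jsonc`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// hasCommentMarkers is a hypothetical stand-in for the check described above.
// It reports whether "//" or "/*" occurs anywhere in data, including inside
// strings, so false positives are possible.
func hasCommentMarkers(data []byte) bool {
	return bytes.Contains(data, []byte("//")) || bytes.Contains(data, []byte("/*"))
}

// unmarshalFastPath calls sanitize only when comment markers are detected,
// then unmarshals with encoding/json. The sanitize parameter is an assumption
// of this sketch.
func unmarshalFastPath(data []byte, v any, sanitize func([]byte) ([]byte, error)) error {
	if hasCommentMarkers(data) {
		clean, err := sanitize(data)
		if err != nil {
			return err
		}
		data = clean
	}
	return json.Unmarshal(data, v)
}

func main() {
	calls := 0
	// An identity "sanitizer" that only counts how often it is invoked.
	count := func(b []byte) ([]byte, error) { calls++; return b, nil }

	var v map[string]any
	if err := unmarshalFastPath([]byte(`{"a":1}`), &v, count); err != nil {
		panic(err)
	}
	if err := unmarshalFastPath([]byte(`{"url":"http://example.com"}`), &v, count); err != nil {
		panic(err)
	}
	fmt.Println(calls) // the sanitizer ran once, for the URL false positive
}
```

The second input contains no comments, yet the `//` inside the URL still triggers the sanitizer, which is the false-positive case mentioned above.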

#### Example

```go
package main

import "github.com/marcozac/go-jsonc"

func main() {
	invalidData := []byte(`{
		// a comment
		"foo": "bar"
	}`)

	var v struct{ Foo string }

	err := jsonc.Unmarshal(invalidData, &v)
	if err != nil {
		panic(err) // handle the error as appropriate
	}
}
```

## Alternative libraries

By default, `jsonc` uses the standard library's `encoding/json` to unmarshal JSON data and has no external dependencies.

Build tags select an alternative library in place of the standard library's `encoding/json`:

| Tag | Library |
| ------------ | -------------------------------------------------------------------- |
| none or both | standard library |
| jsoniter | [`github.com/json-iterator/go`](https://github.com/json-iterator/go) |
| go_json | [`github.com/goccy/go-json`](https://github.com/goccy/go-json) |

## Benchmarks

This library aims to have performance comparable to the standard library's `encoding/json`.
Unfortunately, removing comments is not free, and its overhead cannot be avoided when comments are present.

Currently, `jsonc` is slower than the standard library's `encoding/json` on small data sets: by about 27% on data with comment characters in strings and by about 16% on data without comment characters.
On medium data sets, the gap widens to about 30% on data with comment characters in strings and narrows to about 12% on data without comment characters.

However, using one of the [alternative libraries](#alternative-libraries), it is possible to achieve better performance than the standard library's `encoding/json` even considering the overhead of removing comments.

See [benchmarks](/benchmarks) for the full results.

The benchmarks are run on a MacBook Pro (16-inch, 2021), Apple M1 Max, 32 GB RAM.

## Contributing

:heart: Contributions are ~~needed~~ welcome!

Please open an issue or submit a pull request if you would like to contribute.

To submit a pull request:

- Fork this repository
- Create a new branch
- Make changes and commit
- Push to your fork and submit a pull request

## License

This project is licensed under the Apache 2.0 license. See the [LICENSE](/LICENSE) file for details.
65 changes: 65 additions & 0 deletions benchmark_uncommented_test.go
@@ -0,0 +1,65 @@
// Copyright 2023 Marco Zaccaro. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

//go:build uncommented_test
// +build uncommented_test

package jsonc

import (
	"testing"

	"github.com/marcozac/go-jsonc/internal/json"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

// This file does not contain real benchmarks, but it is used to compare the
// performances over the standard functions on uncommented JSON data.

// Check standard json.Unmarshal (or jsoniter / go-json / ...) performances
// with uncommented JSON data.
func BenchmarkUnmarshal(b *testing.B) {
	b.Run("Small", func(b *testing.B) {
		b.Run("UnCommented", func(b *testing.B) {
			benchmarkUnmarshal(b, _smallUncommented, Small{})
		})
		b.Run("NoCommentRunes", func(b *testing.B) {
			benchmarkUnmarshal(b, _smallNoCommentRunes, SmallNoCommentRunes{})
		})
	})
	b.Run("Medium", func(b *testing.B) {
		b.Run("UnCommented", func(b *testing.B) {
			benchmarkUnmarshal(b, _mediumUncommented, Medium{})
		})
		b.Run("NoCommentRunes", func(b *testing.B) {
			benchmarkUnmarshal(b, _mediumNoCommentRunes, MediumNoCommentRunes{})
		})
	})
}

func benchmarkUnmarshal[T DataType](b *testing.B, data []byte, dt T) {
	b.Helper()
	b.RunParallel(func(p *testing.PB) {
		for p.Next() {
			UnmarshalOK(b, data, dt)
		}
	})
}

func UnmarshalOK[T DataType](t require.TestingT, data []byte, dt T) {
	j := dt
	assert.NoError(t, json.Unmarshal(data, &j), "unmarshal failed")
	FieldsValue(t, j)
}
55 changes: 55 additions & 0 deletions benchmarks/README.md
@@ -0,0 +1,55 @@
# Benchmark results

The tables below show the performance of [`Unmarshal`](#unmarshal---parse-json-with-comments-into-a-go-value) compared to the standard library's `encoding/json` and other alternative libraries on small and medium data sets.

They are formatted as follows:

| Data set | s/op | B/op | allocs/op |
| ------------- | ------------------------------------------- | ---- | --------- |
| Set reference | result (Δ% on reference / reference result) | same | same |

See the files in this directory for the full report.
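
As a worked example of the Δ% column, the sketch below computes the relative difference between a result and its reference. `deltaPercent` is a hypothetical helper, not part of the package, and the formula is an assumption about how "Δ% on reference" is derived. Because the tables show rounded values, recomputing from them may differ from the reported percentage in the last digit.

```go
package main

import "fmt"

// deltaPercent returns how much result differs from reference, in percent.
// It is a hypothetical helper used only to illustrate the table format.
func deltaPercent(result, reference float64) float64 {
	return (result - reference) / reference * 100
}

func main() {
	// Small data set, without comment characters: 2.306µ against 1.986µ.
	fmt.Printf("%+.2f%%\n", deltaPercent(2.306, 1.986)) // prints "+16.11%"
}
```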

### Standard library

The tables below show the performance of [`Unmarshal`](#unmarshal---parse-json-with-comments-into-a-go-value) compared to the standard library's `encoding/json` on small and medium data sets.

| **Small data set** | s/op | B/op | allocs/op |
| -------------------------------------------------------------------------------------- | ------------------------- | --------------------------- | ---------------------- |
| [With comments](../testdata/small.json) | 2.536µ | 1.344Ki | 22.00 |
| [Without comments](../testdata/small_uncommented.json) (comment characters in strings) | 2.425µ (+27.17% / 1.907µ) | 1.219Ki (+14.71% / 1.062Ki) | 22.00 (+4.76% / 21.00) |
| [Without comment characters](../testdata/small_no_comment_runes.json) | 2.306µ (+16.11% / 1.986µ) | 1.062Ki (~% / 1.062Ki) | 21.00 (~% / 21.00) |

| **Medium data set** | s/op | B/op | allocs/op |
| -------------------------------------------------------------------------------------- | ------------------------- | --------------------------- | ------------------------ |
| [With comments](../testdata/small.json) | 301.2µ | 324.7Ki | 1.067k |
| [Without comments](../testdata/small_uncommented.json) (comment characters in strings) | 202.3µ (+30.86% / 154.6µ) | 148.7Ki (+60.41% / 92.70Ki) | 1.067k (+0.09% / 1.066k) |
| [Without comment characters](../testdata/small_no_comment_runes.json) | 170.6µ (+11.63% / 152.8µ) | 92.70Ki (~% / 92.70Ki) | 1.066k (~% / 1.066k) |

### With [`github.com/json-iterator/go`](https://github.com/json-iterator/go)

| **Small data set** | s/op | B/op | allocs/op |
| -------------------------------------------------------------------------------------- | ------------------------- | ----------------------- | ---------------------- |
| [With comments](../testdata/small.json) | 1.632µ | 944.0 | 14.00 |
| [Without comments](../testdata/small_uncommented.json) (comment characters in strings) | 1.702µ (+11.94% / 1.521µ) | 816.0 (+24.39% / 656.0) | 14.00 (+7.69% / 13.00) |
| [Without comment characters](../testdata/small_no_comment_runes.json) | 1.603µ (~% / 1.598µ) | 656.0 (~% / 656.0) | 12.00 (~% / 13.00) |

| **Medium data set** | s/op | B/op | allocs/op |
| -------------------------------------------------------------------------------------- | ------------------------- | --------------------------- | ------------------------ |
| [With comments](../testdata/small.json) | 245.0µ | 407.8Ki | 3.484k |
| [Without comments](../testdata/small_uncommented.json) (comment characters in strings) | 142.4µ (+42.25% / 100.1µ) | 231.8Ki (+31.90% / 175.7Ki) | 3.484k (+0.06% / 3.482k) |
| [Without comment characters](../testdata/small_no_comment_runes.json) | 113.1µ (+17.45% / 96.32µ) | 175.7Ki (+0.01% / 175.7Ki) | 3.482k (~% / 3.482k) |

### With [`github.com/goccy/go-json`](https://github.com/goccy/go-json)

| **Small data set** | s/op | B/op | allocs/op |
| -------------------------------------------------------------------------------------- | ------------------------- | ----------------------- | ----------------------- |
| [With comments](../testdata/small.json) | 1.794µ | 1.047Ki | 10.00 |
| [Without comments](../testdata/small_uncommented.json) (comment characters in strings) | 1.797µ (+15.38% / 1.557µ) | 928.0 (+20.83% / 768.0) | 10.00 (+11.11% / 9.000) |
| [Without comment characters](../testdata/small_no_comment_runes.json) | 1.705µ (+3.30% / 1.651µ) | 768.0 (~% / 768.0) | 9.00 (~% / 9.000) |

| **Medium data set** | s/op | B/op | allocs/op |
| -------------------------------------------------------------------------------------- | ------------------------- | --------------------------- | ---------------------- |
| [With comments](../testdata/small.json) | 213.1µ | 434.9Ki | 77.00 |
| [Without comments](../testdata/small_uncommented.json) (comment characters in strings) | 101.4µ (+83.61% / 55.24µ) | 250.4Ki (+28.94% / 194.2Ki) | 73.00 (+2.82% / 71.00) |
| [Without comment characters](../testdata/small_no_comment_runes.json) | 72.60µ (+37.97% / 52.62µ) | 194.2Ki (+0.02% / 194.1Ki) | 71.00 (~% / 71.00) |