Performance: store path in stack to avoid copying context #438

yoelsusanto · 2024-03-24T16:43:33Z

Fixes #325

This PR is created to dramatically improve performance for go-yaml library especially when parsing larger YAML files.

This is the methodology that I used to arrive at the current solution:

I use go test tool to create a CPU profile when doing parser.ParseBytes.
When I analyzed the CPUProfile, I found that withChild and withIndex are the two functions that dominated the CPU utilization.
Reading the source code, I found that both functions call parser.copy. This operation is performed repeatedly when parsing the YAML tokens.
Taking a deeper look, the essence of this copy operation is to pass the correct node path when parsing child node.
We can achieve the same result by managing a stack. We can push when parsing child node and pop after we finish.
This way, we improve performance by avoiding excessive memory allocation caused by copying the context.tokens.

I realized the implementation can be refactored further to adjust the way of using context, but I aim for minimum change in this PR. Please let me know if you have any suggestions 🙏 . Thank you!

Checklist:

Describe the purpose for which you created this PR.
Create test code that corresponds to the modification

codecov-commenter · 2024-03-24T16:45:08Z

Codecov Report

Merging #438 (2018c1a) into master (4653a1b) will decrease coverage by 0.09%.
The diff coverage is 96.96%.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #438      +/-   ##
==========================================
- Coverage   76.02%   75.94%   -0.09%     
==========================================
  Files          13       13              
  Lines        4692     4701       +9     
==========================================
+ Hits         3567     3570       +3     
- Misses        866      870       +4     
- Partials      259      261       +2

yoelsusanto · 2024-03-27T09:42:04Z

Benchmark

package main

import (
	"bytes"
	"fmt"
	"testing"

	"github.com/goccy/go-yaml"
	goyaml2 "gopkg.in/yaml.v2"
	goyaml3 "gopkg.in/yaml.v3"
)

func prepareSampleData(line int) []byte {
	var data bytes.Buffer

	for i := 0; i < line/5; i++ {
		data.WriteString(`- propertyA: valueA
  propertyB: 2
  propertyC: test
  propertyD: example
  propertyE: example2
`)
	}

	return data.Bytes()
}

type sampleData struct {
	PropertyA string `yaml:"propertyA"`
	PropertyB string `yaml:"propertyB"`
	PropertyC string `yaml:"propertyC"`
	PropertyD string `yaml:"propertyD"`
	PropertyE string `yaml:"propertyE"`
}

func BenchmarkGoccyYAML(b *testing.B) {
	lines := []int{100, 1000, 10000}

	for _, line := range lines {
		sample := prepareSampleData(line)

		b.Run(fmt.Sprintf("%d lines", line), func(b *testing.B) {
			b.Run("github.com/goccy/go-yaml", func(b *testing.B) {
				var datas []sampleData

				for i := 0; i < b.N; i++ {
					err := yaml.Unmarshal(sample, &datas)
					if err != nil {
						b.Fatal(err)
					}
				}
			})

			b.Run("gopkg.in/yaml.v2", func(b *testing.B) {
				var datas []sampleData

				for i := 0; i < b.N; i++ {
					err := goyaml2.Unmarshal(sample, &datas)
					if err != nil {
						b.Fatal(err)
					}
				}
			})

			b.Run("gopkg.in/yaml.v3", func(b *testing.B) {
				var datas []sampleData

				for i := 0; i < b.N; i++ {
					err := goyaml3.Unmarshal(sample, &datas)
					if err != nil {
						b.Fatal(err)
					}
				}
			})
		})
	}
}

To compare before and after the fix, I have to use replace github.com/goccy/go-yaml => github.com/yoelsusanto/go-yaml v0.0.0-20240324162521-2018c1ab915b directive.

Result:

goos: darwin
goarch: arm64
pkg: github.com/yoelsusanto/benchmark-go-yaml
                                                  │ original.txt  │             fixed.txt              │
                                                  │    sec/op     │   sec/op     vs base               │
GoccyYAML/100_lines/github.com/goccy/go-yaml-12       411.7µ ± 1%   215.4µ ± 7%  -47.68% (p=0.002 n=6)
GoccyYAML/100_lines/gopkg.in/yaml.v2-12               69.28µ ± 0%   69.21µ ± 1%        ~ (p=0.240 n=6)
GoccyYAML/100_lines/gopkg.in/yaml.v3-12               91.12µ ± 0%   92.08µ ± 1%   +1.05% (p=0.004 n=6)
GoccyYAML/1000_lines/github.com/goccy/go-yaml-12     29.976m ± 3%   2.328m ± 1%  -92.23% (p=0.002 n=6)
GoccyYAML/1000_lines/gopkg.in/yaml.v2-12              697.8µ ± 0%   691.1µ ± 0%   -0.95% (p=0.002 n=6)
GoccyYAML/1000_lines/gopkg.in/yaml.v3-12              910.7µ ± 1%   919.5µ ± 1%        ~ (p=0.065 n=6)
GoccyYAML/10000_lines/github.com/goccy/go-yaml-12   3541.11m ± 1%   24.41m ± 1%  -99.31% (p=0.002 n=6)
GoccyYAML/10000_lines/gopkg.in/yaml.v2-12             7.591m ± 1%   7.596m ± 1%        ~ (p=0.937 n=6)
GoccyYAML/10000_lines/gopkg.in/yaml.v3-12             9.754m ± 1%   9.882m ± 1%   +1.31% (p=0.002 n=6)
geomean                                               2.864m        1.157m       -59.60%

                                                  │  original.txt   │               fixed.txt               │
                                                  │      B/op       │     B/op      vs base                 │
GoccyYAML/100_lines/github.com/goccy/go-yaml-12       1585.8Ki ± 0%   187.2Ki ± 0%  -88.19% (p=0.002 n=6)
GoccyYAML/100_lines/gopkg.in/yaml.v2-12                49.48Ki ± 0%   49.48Ki ± 0%        ~ (p=1.000 n=6) ¹
GoccyYAML/100_lines/gopkg.in/yaml.v3-12                64.36Ki ± 0%   64.36Ki ± 0%        ~ (p=1.000 n=6) ¹
GoccyYAML/1000_lines/github.com/goccy/go-yaml-12     137.455Mi ± 0%   1.828Mi ± 0%  -98.67% (p=0.002 n=6)
GoccyYAML/1000_lines/gopkg.in/yaml.v2-12               449.7Ki ± 0%   449.7Ki ± 0%        ~ (p=0.364 n=6)
GoccyYAML/1000_lines/gopkg.in/yaml.v3-12               585.1Ki ± 0%   585.1Ki ± 0%        ~ (p=0.545 n=6)
GoccyYAML/10000_lines/github.com/goccy/go-yaml-12   13025.59Mi ± 0%   19.00Mi ± 0%  -99.85% (p=0.002 n=6)
GoccyYAML/10000_lines/gopkg.in/yaml.v2-12              4.369Mi ± 0%   4.369Mi ± 0%        ~ (p=0.069 n=6)
GoccyYAML/10000_lines/gopkg.in/yaml.v3-12              5.677Mi ± 0%   5.677Mi ± 0%        ~ (p=0.498 n=6)
geomean                                                3.345Mi        809.1Ki       -76.38%
¹ all samples are equal

                                                  │ original.txt │              fixed.txt               │
                                                  │  allocs/op   │  allocs/op   vs base                 │
GoccyYAML/100_lines/github.com/goccy/go-yaml-12      6.381k ± 0%   5.338k ± 0%  -16.35% (p=0.002 n=6)
GoccyYAML/100_lines/gopkg.in/yaml.v2-12              1.037k ± 0%   1.037k ± 0%        ~ (p=1.000 n=6) ¹
GoccyYAML/100_lines/gopkg.in/yaml.v3-12              1.258k ± 0%   1.258k ± 0%        ~ (p=1.000 n=6) ¹
GoccyYAML/1000_lines/github.com/goccy/go-yaml-12     63.73k ± 0%   52.88k ± 0%  -17.02% (p=0.002 n=6)
GoccyYAML/1000_lines/gopkg.in/yaml.v2-12             10.04k ± 0%   10.04k ± 0%        ~ (p=1.000 n=6) ¹
GoccyYAML/1000_lines/gopkg.in/yaml.v3-12             12.24k ± 0%   12.24k ± 0%        ~ (p=1.000 n=6) ¹
GoccyYAML/10000_lines/github.com/goccy/go-yaml-12    649.2k ± 0%   529.9k ± 0%  -18.38% (p=0.002 n=6)
GoccyYAML/10000_lines/gopkg.in/yaml.v2-12            100.0k ± 0%   100.0k ± 0%        ~ (p=1.000 n=6) ¹
GoccyYAML/10000_lines/gopkg.in/yaml.v3-12            122.0k ± 0%   122.0k ± 0%        ~ (p=1.000 n=6) ¹
geomean                                              20.02k        18.80k        -6.12%
¹ all samples are equal

From the results, we can see:

Lower CPU usage
Lower number of allocations
Lower number of bytes per operation

Let me know if you need anything else 🙏

mcdee · 2024-03-27T14:00:09Z

This looks like a good change to me. Comparing with 1.11.3 using a complex ~6MB sample:

goos: linux
goarch: amd64
pkg: github.com/wealdtech/yamltest
cpu: 12th Gen Intel(R) Core(TM) i7-1265U
                 │     1.11.3     │              proposed              │
                 │     sec/op     │   sec/op     vs base               │
Decode/Decode-12   174099.6m ± 1%   798.8m ± 2%  -99.54% (p=0.002 n=6)

                 │     1.11.3      │              proposed               │
                 │      B/op       │     B/op      vs base               │
Decode/Decode-12   257761.9Mi ± 0%   494.6Mi ± 0%  -99.81% (p=0.002 n=6)

                 │   1.11.3    │             proposed              │
                 │  allocs/op  │  allocs/op   vs base              │
Decode/Decode-12   4.989M ± 0%   4.590M ± 0%  -8.00% (p=0.002 n=6)

And it looks okay compared to the last good version, 1.9.2:

goos: linux
goarch: amd64
pkg: github.com/wealdtech/yamltest
cpu: 12th Gen Intel(R) Core(TM) i7-1265U
                 │    1.9.2    │             proposed              │
                 │   sec/op    │   sec/op     vs base              │
Decode/Decode-12   750.5m ± 1%   798.8m ± 2%  +6.43% (p=0.002 n=6)

                 │    1.9.2     │              proposed              │
                 │     B/op     │     B/op      vs base              │
Decode/Decode-12   486.8Mi ± 0%   494.6Mi ± 0%  +1.60% (p=0.002 n=6)

                 │    1.9.2    │             proposed              │
                 │  allocs/op  │  allocs/op   vs base              │
Decode/Decode-12   4.181M ± 0%   4.590M ± 0%  +9.77% (p=0.002 n=6)

fzipi · 2024-04-07T20:07:39Z

Looks good to us also. See benchmarks in the related issue.

yoelsusanto · 2024-04-11T02:12:51Z

@goccy would you mind to have a look on the PR? 🙏 Thank you!

Update our fork with the performance patch from goccy/go-yaml#438. ``` goos: linux goarch: amd64 pkg: github.com/cerbos/cerbos/internal/parser cpu: 13th Gen Intel(R) Core(TM) i7-1360P │ parser_before.txt │ parser_after.txt │ │ sec/op │ sec/op vs base │ Unmarshal/10-16 3399.0µ ± 3% 847.2µ ± 4% -75.08% (p=0.000 n=10) Unmarshal/50-16 50.093m ± 7% 3.847m ± 2% -92.32% (p=0.000 n=10) Unmarshal/100-16 191.559m ± 6% 7.504m ± 3% -96.08% (p=0.000 n=10) Unmarshal/500-16 4674.56m ± 3% 35.09m ± 4% -99.25% (p=0.000 n=10) geomean 111.1m 5.412m -95.13% │ parser_before.txt │ parser_after.txt │ │ B/s │ B/s vs base │ Unmarshal/10-16 1.030Mi ± 3% 4.134Mi ± 4% +301.39% (p=0.000 n=10) Unmarshal/50-16 341.8Ki ± 6% 4423.8Ki ± 2% +1194.29% (p=0.000 n=10) Unmarshal/100-16 175.8Ki ± 6% 4506.8Ki ± 3% +2463.89% (p=0.000 n=10) Unmarshal/500-16 39.06Ki ± 0% 4794.92Ki ± 4% +12175.00% (p=0.000 n=10) geomean 223.1Ki 4.380Mi +1910.85% │ parser_before.txt │ parser_after.txt │ │ B/op │ B/op vs base │ Unmarshal/10-16 4258.8Ki ± 0% 384.6Ki ± 0% -90.97% (p=0.000 n=10) Unmarshal/50-16 91.248Mi ± 0% 1.813Mi ± 0% -98.01% (p=0.000 n=10) Unmarshal/100-16 358.297Mi ± 0% 3.596Mi ± 0% -99.00% (p=0.000 n=10) Unmarshal/500-16 7944.51Mi ± 0% 18.98Mi ± 0% -99.76% (p=0.000 n=10) geomean 181.3Mi 2.611Mi -98.56% │ parser_before.txt │ parser_after.txt │ │ allocs/op │ allocs/op vs base │ Unmarshal/10-16 10.595k ± 0% 8.978k ± 0% -15.26% (p=0.000 n=10) Unmarshal/50-16 50.64k ± 0% 42.93k ± 0% -15.23% (p=0.000 n=10) Unmarshal/100-16 100.83k ± 0% 85.38k ± 0% -15.32% (p=0.000 n=10) Unmarshal/500-16 505.9k ± 0% 425.6k ± 0% -15.87% (p=0.000 n=10) geomean 72.33k 61.17k -15.42% ``` Signed-off-by: Charith Ellawala <charith@cerbos.dev>

Update our fork with the performance patch from goccy/go-yaml#438. ``` goos: linux goarch: amd64 pkg: github.com/cerbos/cerbos/internal/parser cpu: 13th Gen Intel(R) Core(TM) i7-1360P │ parser_before.txt │ parser_after.txt │ │ sec/op │ sec/op vs base │ Unmarshal/10-16 3399.0µ ± 3% 847.2µ ± 4% -75.08% (p=0.000 n=10) Unmarshal/50-16 50.093m ± 7% 3.847m ± 2% -92.32% (p=0.000 n=10) Unmarshal/100-16 191.559m ± 6% 7.504m ± 3% -96.08% (p=0.000 n=10) Unmarshal/500-16 4674.56m ± 3% 35.09m ± 4% -99.25% (p=0.000 n=10) geomean 111.1m 5.412m -95.13% │ parser_before.txt │ parser_after.txt │ │ B/s │ B/s vs base │ Unmarshal/10-16 1.030Mi ± 3% 4.134Mi ± 4% +301.39% (p=0.000 n=10) Unmarshal/50-16 341.8Ki ± 6% 4423.8Ki ± 2% +1194.29% (p=0.000 n=10) Unmarshal/100-16 175.8Ki ± 6% 4506.8Ki ± 3% +2463.89% (p=0.000 n=10) Unmarshal/500-16 39.06Ki ± 0% 4794.92Ki ± 4% +12175.00% (p=0.000 n=10) geomean 223.1Ki 4.380Mi +1910.85% │ parser_before.txt │ parser_after.txt │ │ B/op │ B/op vs base │ Unmarshal/10-16 4258.8Ki ± 0% 384.6Ki ± 0% -90.97% (p=0.000 n=10) Unmarshal/50-16 91.248Mi ± 0% 1.813Mi ± 0% -98.01% (p=0.000 n=10) Unmarshal/100-16 358.297Mi ± 0% 3.596Mi ± 0% -99.00% (p=0.000 n=10) Unmarshal/500-16 7944.51Mi ± 0% 18.98Mi ± 0% -99.76% (p=0.000 n=10) geomean 181.3Mi 2.611Mi -98.56% │ parser_before.txt │ parser_after.txt │ │ allocs/op │ allocs/op vs base │ Unmarshal/10-16 10.595k ± 0% 8.978k ± 0% -15.26% (p=0.000 n=10) Unmarshal/50-16 50.64k ± 0% 42.93k ± 0% -15.23% (p=0.000 n=10) Unmarshal/100-16 100.83k ± 0% 85.38k ± 0% -15.32% (p=0.000 n=10) Unmarshal/500-16 505.9k ± 0% 425.6k ± 0% -15.87% (p=0.000 n=10) geomean 72.33k 61.17k -15.42% ``` Signed-off-by: Charith Ellawala <charith@cerbos.dev> --------- Signed-off-by: Charith Ellawala <charith@cerbos.dev>

fzipi · 2024-09-22T16:36:04Z

@goccy can you please take a look at this PR?

goccy · 2024-10-31T09:19:29Z

@yoelsusanto Thank you for your contribution. Also, Sorry for the late reply.
I'd like to improve the process of pop from the stack, so I created a PR on my side #494

performance: store path in stack to avoid copying context

2018c1a

yoelsusanto mentioned this pull request Mar 27, 2024

There is a bad performance hit when upgrading to >= 1.9.2 #325

Closed

charithe mentioned this pull request May 2, 2024

enhancement: Apply perf patch to YAML parser cerbos/cerbos#2132

Merged

theseion mentioned this pull request Sep 22, 2024

\r processed as literal in double quoted strings coreruleset/go-ftw#359

Open

goccy mentioned this pull request Oct 31, 2024

Improve parser performance #494

Merged

goccy closed this Oct 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance: store path in stack to avoid copying context #438

Performance: store path in stack to avoid copying context #438

yoelsusanto commented Mar 24, 2024

codecov-commenter commented Mar 24, 2024

yoelsusanto commented Mar 27, 2024

mcdee commented Mar 27, 2024

fzipi commented Apr 7, 2024

yoelsusanto commented Apr 11, 2024

fzipi commented Sep 22, 2024

goccy commented Oct 31, 2024 •

edited

Loading

Performance: store path in stack to avoid copying context #438

Performance: store path in stack to avoid copying context #438

Conversation

yoelsusanto commented Mar 24, 2024

codecov-commenter commented Mar 24, 2024

Codecov Report

yoelsusanto commented Mar 27, 2024

Benchmark

mcdee commented Mar 27, 2024

fzipi commented Apr 7, 2024

yoelsusanto commented Apr 11, 2024

fzipi commented Sep 22, 2024

goccy commented Oct 31, 2024 • edited Loading

goccy commented Oct 31, 2024 •

edited

Loading