Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: lazy load regexes to save memory. #900

Closed
wants to merge 3 commits into from
Closed

Conversation

jcchavezs
Copy link
Member

This PR aims to lazyload regexes to make sure we don't load in memory regexes that are not used (e.g. beyond our paranoia level) and only load them if we need them.

Some numbers I got

main

total alloc: 2532805824
mallocs: 31321308
frees: 30807310
heap alloc: 47062520
heap_objects: 513998
stack_inuse: 753664
stack_sys: 753664
other_sys: 599720
num_gc: 120
pause_total_ns: 7347291

this branch

total alloc: 2573655400
mallocs: 31423240
frees: 30985895
heap alloc: 39720744
heap_objects: 437345
stack_inuse: 950272
stack_sys: 950272
other_sys: 945888
num_gc: 127
pause_total_ns: 7457540

Code was:

func TestFTWAllocations(t *testing.T) {
	var m1, m2 runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&m1)
	TestFTW(t)
	runtime.ReadMemStats(&m2)

	f, err := os.Create("ftw.bench")
	if err != nil {
		t.Fatal(err)
	}

	f.WriteString(fmt.Sprintf("total alloc: %d\n", m2.TotalAlloc-m1.TotalAlloc))
	f.WriteString(fmt.Sprintf("mallocs: %d\n", m2.Mallocs-m1.Mallocs))
	f.WriteString(fmt.Sprintf("frees: %d\n", m2.Frees-m1.Frees))
	f.WriteString(fmt.Sprintf("heap alloc: %d\n", m2.HeapAlloc-m1.HeapAlloc))
	f.WriteString(fmt.Sprintf("heap_objects: %d\n", m2.HeapObjects-m1.HeapObjects))
	f.WriteString(fmt.Sprintf("stack_inuse: %d\n", m2.StackInuse-m1.StackInuse))
	f.WriteString(fmt.Sprintf("stack_sys: %d\n", m2.StackSys-m1.StackSys))
	f.WriteString(fmt.Sprintf("other_sys: %d\n", m2.OtherSys-m1.OtherSys))
	f.WriteString(fmt.Sprintf("num_gc: %d\n", m2.NumGC-m1.NumGC))
	f.WriteString(fmt.Sprintf("pause_total_ns: %d\n", m2.PauseTotalNs-m1.PauseTotalNs))
func TestFTWAllocations(t *testing.T) {
	var m1, m2 runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&m1)
	TestFTW(t)
	runtime.ReadMemStats(&m2)

	f, err := os.Create("ftw.bench")
	if err != nil {
		t.Fatal(err)
	}

	f.WriteString(fmt.Sprintf("total alloc: %d\n", m2.TotalAlloc-m1.TotalAlloc))
	f.WriteString(fmt.Sprintf("mallocs: %d\n", m2.Mallocs-m1.Mallocs))
	f.WriteString(fmt.Sprintf("frees: %d\n", m2.Frees-m1.Frees))
	f.WriteString(fmt.Sprintf("heap alloc: %d\n", m2.HeapAlloc-m1.HeapAlloc))
	f.WriteString(fmt.Sprintf("heap_objects: %d\n", m2.HeapObjects-m1.HeapObjects))
	f.WriteString(fmt.Sprintf("stack_inuse: %d\n", m2.StackInuse-m1.StackInuse))
	f.WriteString(fmt.Sprintf("stack_sys: %d\n", m2.StackSys-m1.StackSys))
	f.WriteString(fmt.Sprintf("other_sys: %d\n", m2.OtherSys-m1.OtherSys))
	f.WriteString(fmt.Sprintf("num_gc: %d\n", m2.NumGC-m1.NumGC))
	f.WriteString(fmt.Sprintf("pause_total_ns: %d\n", m2.PauseTotalNs-m1.PauseTotalNs))
}

@jcchavezs jcchavezs marked this pull request as ready for review November 5, 2023 17:50
@jcchavezs jcchavezs requested a review from a team as a code owner November 5, 2023 17:50
@jcchavezs
Copy link
Member Author

I believe @anuraaga can help to get much more interesting numbers.

Copy link

codecov bot commented Nov 5, 2023

Codecov Report

Attention: 13 lines in your changes are missing coverage. Please review.

Comparison is base (9104d7c) 81.56% compared to head (22ba28d) 81.74%.
Report is 23 commits behind head on main.

Files Patch % Lines
experimental/regexp/regexp.go 0.00% 10 Missing ⚠️
internal/collections/named.go 0.00% 1 Missing ⚠️
internal/collections/sized.go 0.00% 1 Missing ⚠️
internal/operators/validate_nid.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #900      +/-   ##
==========================================
+ Coverage   81.56%   81.74%   +0.18%     
==========================================
  Files         160      162       +2     
  Lines        9051     9082      +31     
==========================================
+ Hits         7382     7424      +42     
+ Misses       1417     1409       -8     
+ Partials      252      249       -3     
Flag Coverage Δ
default 76.85% <56.66%> (+0.20%) ⬆️
examples 26.38% <20.00%> (+0.87%) ⬆️
ftw 46.69% <75.00%> (+0.02%) ⬆️
ftw-multiphase 48.87% <75.00%> (+0.09%) ⬆️
tinygo 75.07% <56.66%> (+0.24%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should look at the FTW benchmark numbers we have to see the effect on runtime. I guess it will be in the noise though, but there is overhead in sync.Once compared to not.

More importantly, this is causing regex compilation on the hot path of first request processing - given how expensive regex compilation is, I think this could basically "guarantee a failed first request" by missing a deadline or something. I suspect it's not worth it

@jcchavezs
Copy link
Member Author

jcchavezs commented Nov 14, 2023

@anuraaga did some changes to specifically skip compilation details in here. Instead allowed connectors to plug in their own regex compiler which I think is still beneficial for single thread environment (e.g. wasm).

If this sounds reasonable to you I wonder if instead of doing API changes here we should be applying this logic in https://github.com/corazawaf/coraza-wasilibs/blob/main/rx.go instead where most of the regexes are being managed. Something like

package wasmplugin

import (
	"regexp"

	experimental "github.com/corazawaf/coraza/v3/experimental/regexp"
	"github.com/corazawaf/coraza/v3/experimental/regexp/regexptypes"
)

func init() {
	experimental.SetRegexpCompiler(Compile)
}

type lazyRegexp struct {
	matchStringFn           func(string) bool
	findStringSubmatchFn    func(string) []string
	findAllStringSubmatchFn func(string, int) [][]string
	subexpNamesFn           func() []string
	matchFn                 func([]byte) bool
	stringFn                func() string
}

var _ regexptypes.Regexp = (*lazyRegexp)(nil)

func (r *lazyRegexp) MatchString(s string) bool {
	return r.matchStringFn(s)
}

func (r *lazyRegexp) FindStringSubmatch(s string) []string {
	return r.findStringSubmatchFn(s)
}

func (r *lazyRegexp) FindAllStringSubmatch(s string, n int) [][]string {
	return r.findAllStringSubmatchFn(s, n)
}

func (r *lazyRegexp) SubexpNames() []string {
	return r.subexpNamesFn()
}

func (r *lazyRegexp) Match(b []byte) bool {
	return r.matchFn(b)
}

func (r *lazyRegexp) String() string {
	return r.stringFn()
}

func Compile(expr string) (regexptypes.Regexp, error) {
	// Check if the expression is valid and can be compiled
	if _, err := regexp.Compile(expr); err != nil {
		return nil, err
	}

	lre := &lazyRegexp{}

	lre.matchStringFn = func(s string) bool {
		compileAndReassign(expr, lre)
		return lre.matchStringFn(s)
	}

	lre.findStringSubmatchFn = func(s string) []string {
		compileAndReassign(expr, lre)
		return lre.findStringSubmatchFn(s)
	}

	lre.findAllStringSubmatchFn = func(s string, n int) [][]string {
		compileAndReassign(expr, lre)
		return lre.findAllStringSubmatchFn(s, n)
	}

	lre.subexpNamesFn = func() []string {
		compileAndReassign(expr, lre)
		return lre.subexpNamesFn()
	}

	lre.matchFn = func(b []byte) bool {
		compileAndReassign(expr, lre)
		return lre.matchFn(b)
	}

	lre.stringFn = func() string {
		compileAndReassign(expr, lre)
		return lre.stringFn()
	}

	return lre, nil
}

func compileAndReassign(expr string, lre *lazyRegexp) {
	re := regexp.MustCompile(expr)

	lre.matchStringFn = re.MatchString
	lre.findStringSubmatchFn = re.FindStringSubmatch
	lre.findAllStringSubmatchFn = re.FindAllStringSubmatch
	lre.subexpNamesFn = re.SubexpNames
	lre.matchFn = re.Match
	lre.stringFn = re.String
}

would do the trick and avoid changes in here.

@jcchavezs jcchavezs closed this Jan 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants