Optimization: Hardcoded regular expressions are being compiled many times #4884

Open
Tracked by #4899
benjaminwinger opened this issue Feb 11, 2025 · 0 comments

@benjaminwinger
Collaborator

Description

This query from ClickBench is heavily bottlenecked by our regex performance: MATCH (h:hits) WHERE h.Title =~ '.*Google.*' AND NOT h.URL =~ '.*\\.google\\..*' AND h.SearchPhrase <> '' RETURN h.SearchPhrase, MIN(h.URL), MIN(h.Title), COUNT(*) AS c, COUNT(DISTINCT h.UserID) ORDER BY c DESC LIMIT 10; (note that the original query does not use a regex match, but we don't have a LIKE operator; see the original SQL query for reference).

See #4881 (comment): it takes 17s with the regex match compared to ~2.5s using contains instead.

Performance profiling shows that about 67% of the runtime is spent repeatedly compiling the regex (and a further 4% destroying the regex afterwards), despite it being the same each time.
Being able to compile the regex once and re-use it would speed things up significantly.
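
A minimal sketch of the idea, written in C++ with std::regex purely for illustration. The regex library and evaluation code actually used by the engine are not shown in this issue, and the function names below are hypothetical. The first function recompiles the pattern for every row, which is the behaviour the profile points at; the second compiles it once before the scan and reuses it.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical slow path: the same constant pattern is compiled (and
// destroyed) once per row, so compilation cost scales with the row count.
std::size_t countMatchesRecompiling(const std::vector<std::string>& rows,
                                    const std::string& pattern) {
    std::size_t count = 0;
    for (const auto& row : rows) {
        std::regex re(pattern);  // recompiled on every iteration
        if (std::regex_match(row, re)) {
            ++count;
        }
    }
    return count;
}

// Hypothetical fast path: the pattern is a constant in the query, so it
// can be compiled once up front and reused for every row.
std::size_t countMatchesCompiledOnce(const std::vector<std::string>& rows,
                                     const std::string& pattern) {
    const std::regex re(pattern);  // compiled exactly once
    std::size_t count = 0;
    for (const auto& row : rows) {
        if (std::regex_match(row, re)) {
            ++count;
        }
    }
    return count;
}
```

Both functions return the same count; only the amount of compilation work differs. As a rough estimate from the numbers above, removing the 67% compile and 4% destroy overhead would bring the 17s query down to about 17s × 0.29 ≈ 5s, assuming the rest of the work is unchanged.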
