Tiny Weights #14402
Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

- General
- Tests
- Documentation
- New flags
- If a workflow is added or modified:
- Backward compatibility
Hello! 👋 This Pull Request is now handled by arewefastyet. The current HEAD and future commits will be benchmarked. You can find the performance comparison on the arewefastyet website.
Signed-off-by: Vicent Marti <vmg@strn.cat>
```go
cmp, err := evalengine.NullsafeCompare(currentKey[gb.KeyCol], nextRow[gb.KeyCol], gb.Type.Coll)
v1 := currentKey[gb.KeyCol]
v2 := nextRow[gb.KeyCol]
if v1.TinyWeightCmp(v2) != 0 {
```
Don't we want to do this in `evalengine.NullsafeCompare` instead?
I opted to wire up the comparison at the relevant call sites because there are a lot of places that don't use tiny weights right now. I think it'll be sensible to move it into `NullsafeCompare` once we wire up tiny weight generation in more paths.
Description
This is a new optimization pattern for the execution engine. The idea is to speed up comparison operations by embedding "tiny weights" inside `sqltypes.Value` during execution. A tiny weight is a 4-byte compressed form of the full weight string for the value (see `evalengine.TinyWeighter` for a detailed description and implementation). Since we actually have 4 spare bytes inside `sqltypes.Value`, we can inject the weight there without increasing the allocation cost of our in-memory rows, and any subsequent comparison operators will automatically make use of it. This makes sorting, for example, wildly more efficient, because most comparisons during the sort can be performed by comparing two `uint32` integers instead of doing a full collation-aware comparison.

Of course, two tiny weights can collide (they're essentially a lossy form of the weight string), but this is perfectly safe because we always fall back to a full comparison of the two values whenever their tiny weights are identical.
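A minimal sketch of the fast path with fallback, under loudly stated assumptions: it uses a binary collation (so the weight string is just the raw bytes) and a hypothetical `value` type standing in for `sqltypes.Value`. The real `evalengine.TinyWeighter` computes collation-aware weights, and its exact encoding is not reproduced here. In this sketch the tiny weight is the first 4 bytes of the weight string, zero-padded and read as a big-endian `uint32`. A difference in tiny weights therefore always agrees with the full comparison, and identical tiny weights fall back to it:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// value is a hypothetical stand-in for sqltypes.Value: the raw bytes plus a
// 4-byte tiny weight stored alongside them.
type value struct {
	raw       []byte
	weight    uint32
	hasWeight bool
}

// tinyWeight assumes a binary collation, where the weight string is the raw
// bytes: it keeps the first 4 bytes, zero-padded, as a big-endian uint32.
// If two such prefixes differ, their order matches the full byte comparison.
func tinyWeight(raw []byte) uint32 {
	var buf [4]byte
	copy(buf[:], raw)
	return binary.BigEndian.Uint32(buf[:])
}

func newValue(s string) value {
	raw := []byte(s)
	return value{raw: raw, weight: tinyWeight(raw), hasWeight: true}
}

// compare decides with a single integer comparison when both tiny weights are
// present and differ; otherwise (missing weights or a possible collision) it
// falls back to the full comparison of the values.
func compare(a, b value) int {
	if a.hasWeight && b.hasWeight && a.weight != b.weight {
		if a.weight < b.weight {
			return -1
		}
		return 1
	}
	return bytes.Compare(a.raw, b.raw)
}

func main() {
	rows := []value{newValue("pear"), newValue("apple"), newValue("applesauce"), newValue("fig")}
	sort.Slice(rows, func(i, j int) bool { return compare(rows[i], rows[j]) < 0 })
	for _, r := range rows {
		fmt.Println(string(r.raw))
	}
}
```

In this example only `apple` and `applesauce` share a tiny weight (`appl`), so that pair is resolved by the fallback; any other pair is decided by comparing two integers.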
The `arewefastyet` benchmark results are not wildly impressive because OLTP is actually a pathological case for this optimization. The strings that OLTP uses in its sort queries are all numeric, so their alphabet is very small (10 possible characters), which makes the 4-byte tiny weights collide quite often. The improvement on these `OLTPDistinct`-sorted queries is just 15%.

If we were to craft a different benchmark with string columns that contain arbitrary UTF-8 data, the improvement would reach ~40%, because all the comparisons during sorting are performed with the tiny weights.
The global improvement in `arewefast` is pretty good, particularly for latency: https://benchmark.vitess.io/compare?ltag=369b6a1e55aecd98c3cf6d4366cfbcee0477c474&rtag=946eb31e74187866a4a7414ca5df1954435681da
Again, I wouldn't pay much attention to `OLTP` here because it's not representative of real-world data (:cry:), but the speedup for real queries that include `ORDER BY` or `DISTINCT` will be significant.

cc @dbussink @systay
Related Issue(s)
Checklist
Deployment Notes