Account for FILTERs when considering greedy query planning #1705
Signed-off-by: Johannes Kalmbach <johannes.kalmbach@gmail.com>
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #1705   +/-   ##
=======================================
  Coverage   89.86%   89.87%
=======================================
  Files         389      389
  Lines       37308    37329   +21
  Branches     4204     4209    +5
=======================================
+ Hits        33527    33549   +22
+ Misses       2485     2484    -1
  Partials     1296     1296
Looks great + nice trick with the filters. A further optimization would be to not have the dummy VALUES "plans" (one for each FILTER) have edges between them.
After discussion with Johannes: only add one dummy VALUES clause for each distinct set of variables in a FILTER. Then we can still overestimate, but the cases where that happens will be rare, and the only bad outcome then is that we compute a greedy query plan in a case where non-greedy planning would have worked as well.
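A minimal sketch of the deduplication described in this comment, in Python rather than QLever's C++. The function name and the representation of a FILTER as a plain collection of variable names are assumptions made for illustration; they are not QLever's actual data structures.

```python
def distinct_filter_variable_sets(filters):
    """Return one variable set per distinct set of FILTER variables.

    `filters` is an iterable of collections of variable names
    (hypothetical representation). The order of first occurrence is kept.
    """
    seen = set()
    result = []
    for variables in filters:
        key = frozenset(variables)  # the order of variables does not matter
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


# Two FILTERs over {?x, ?y} and one over {?z} yield two dummy VALUES clauses.
dummies = distinct_filter_variable_sets([["?x", "?y"], ["?y", "?x"], ["?z"]])
```

Each returned set would then stand for exactly one dummy VALUES clause, no matter how many FILTERs share those variables.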
…planning-budget. Signed-off-by: Johannes Kalmbach <johannes.kalmbach@gmail.com>
Conformance check passed ✅ No test result changes.
Thanks a lot + I wrote a proper description and will merge this now!
Since #1442, QLever switches to greedy query planning for large connected components. A connected component is considered large when its number of connected subgraphs is above the threshold determined by the runtime parameter query-planning-budget.

So far, FILTERs were simply ignored when counting the number of subgraphs. However, FILTERs can add significant complexity to the standard query planning because, for each subplan, our query planner considers either adding all applicable FILTERs to it or none of them. As a result, for certain queries with a medium-sized component but a significant number of FILTERs, the query planning complexity was underestimated: the query was not planned greedily, and the standard query planning took very long.

This is now fixed by replacing, for the purpose of query planning, each FILTER by a dummy VALUES clause which uses the set of distinct variables from the FILTER. A FILTER that has many variables in common with other triples will then increase the subgraph count substantially. If multiple FILTERs have the same set of distinct variables, the dummy VALUES clause is added only once (because our query planner either adds all applicable FILTERs at a certain point or none of them). Note that this trick overestimates the true query planning complexity. That is, the worst that can happen now is that with many FILTERs, we switch to greedy planning even though standard query planning would have still been feasible.
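The effect on the subgraph count can be sketched in a few lines of Python. This is an illustration under stated assumptions, not QLever's implementation: each triple and each dummy VALUES clause is modeled as a set of variable names, two nodes are adjacent when they share a variable, and `budget` is used directly as the threshold. The names `count_connected_subgraphs` and `use_greedy_planning` are hypothetical.

```python
def count_connected_subgraphs(nodes, budget):
    """Count connected subsets of `nodes`, stopping once the count exceeds `budget`."""
    sets = [frozenset(n) for n in nodes]
    n = len(sets)
    # Two nodes are adjacent when they share at least one variable.
    neighbors = [
        {j for j in range(n) if j != i and sets[i] & sets[j]}
        for i in range(n)
    ]
    seen = {frozenset([i]) for i in range(n)}
    if len(seen) > budget:
        return len(seen)
    frontier = list(seen)
    # Grow every connected subset by one adjacent node at a time.
    while frontier:
        next_frontier = []
        for subset in frontier:
            adjacent = set().union(*(neighbors[i] for i in subset)) - subset
            for j in adjacent:
                bigger = subset | {j}
                if bigger not in seen:
                    seen.add(bigger)
                    if len(seen) > budget:
                        return len(seen)  # early exit: already over budget
                    next_frontier.append(bigger)
        frontier = next_frontier
    return len(seen)


def use_greedy_planning(triples, filters, budget):
    """Decide on greedy planning, counting one dummy node per distinct FILTER variable set."""
    nodes = [frozenset(t) for t in triples]
    nodes += list(dict.fromkeys(frozenset(f) for f in filters))
    return count_connected_subgraphs(nodes, budget) > budget


# A chain of three triples: ?a-?b, ?b-?c, ?c-?d.
triples = [{"?a", "?b"}, {"?b", "?c"}, {"?c", "?d"}]
without_filter = use_greedy_planning(triples, [], budget=10)
with_filter = use_greedy_planning(triples, [{"?a", "?d"}], budget=10)
```

In this toy example the chain alone has 6 connected subgraphs. A FILTER on ?a and ?d closes the chain into a cycle with 13 connected subgraphs, which pushes the count over a budget of 10. A second FILTER on the same two variables adds no further node. Dummy nodes that share variables are adjacent to each other in this sketch, which is one source of the overestimate mentioned above.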