
Account for FILTERs when considering greedy query planning #1705


Merged: 4 commits merged into ad-freiburg:master on Feb 4, 2025


@joka921 (Member) commented Jan 9, 2025

Since #1442, QLever switches to greedy query planning for large connected components. A connected component is considered large when the number of connected subgraphs is above the threshold determined by the runtime parameter query-planning-budget.
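To make the switching criterion concrete, here is a minimal Python sketch under simplifying assumptions: each triple is represented only by its set of variables, two triples are adjacent when they share a variable, and a "connected subgraph" is a non-empty set of triples that is connected in this graph. The names `build_query_graph`, `count_connected_subgraphs`, and `use_greedy_planning` are hypothetical; QLever's actual planner is written in C++ and may count differently.

```python
from itertools import combinations


def build_query_graph(triples):
    """Each triple (given as its set of variables) is a node; two nodes
    are adjacent iff the corresponding triples share a variable."""
    adj = {i: set() for i in range(len(triples))}
    for i, j in combinations(range(len(triples)), 2):
        if set(triples[i]) & set(triples[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def count_connected_subgraphs(adj, budget):
    """Enumerate connected node sets by growing them one neighbour at a time.

    Returns the exact count if it is <= budget; otherwise returns some
    value > budget because enumeration stops early."""
    seen = {frozenset([v]) for v in adj}
    frontier = list(seen)
    while frontier and len(seen) <= budget:
        next_frontier = []
        for s in frontier:
            # All nodes adjacent to the current set but not yet in it.
            for n in set().union(*(adj[v] for v in s)) - s:
                t = s | {n}
                if t not in seen:
                    seen.add(t)
                    next_frontier.append(t)
                    if len(seen) > budget:
                        return len(seen)
        frontier = next_frontier
    return len(seen)


def use_greedy_planning(triples, budget):
    """Hypothetical decision rule: plan greedily if the number of
    connected subgraphs exceeds the planning budget."""
    adj = build_query_graph(triples)
    return count_connected_subgraphs(adj, budget) > budget
```

Enumeration stops as soon as the count exceeds the budget, so the check stays cheap even when the exact number of subgraphs would be huge. For a chain of three triples (`?a ?b`, `?b ?c`, `?c ?d`) the count is 6, so a budget of 5 would trigger greedy planning while a budget of 6 would not.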

So far, FILTERs were simply ignored when counting the number of subgraphs. However, FILTERs can add significant complexity to the standard query planning because, for each subplan, our query planner considers either adding all applicable FILTERs to it or adding none of them. As a result, for certain queries with a medium-sized component but a significant number of FILTERs, the query planning complexity was underestimated: the query was not planned greedily, and the standard query planning took very long.

This is now fixed by replacing, for the purpose of query planning, each FILTER with a dummy VALUES clause over the set of distinct variables in that FILTER. A FILTER that shares many variables with other triples thus increases the subgraph count substantially. If multiple FILTERs have the same set of distinct variables, the dummy VALUES clause is added only once (because our query planner either adds all applicable FILTERs at a certain point or none of them). Note that this trick overestimates the true query planning complexity. That is, the worst that can happen now is that, with many FILTERs, we switch to greedy planning even though standard query planning would still have been feasible.
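The FILTER trick can be sketched in the same simplified model (hypothetical helper names, not QLever's code): each FILTER contributes one dummy node whose variables are the FILTER's distinct variables, and FILTERs with identical variable sets share a single node. Counting is done by brute force here, which is only practical for tiny inputs.

```python
from itertools import combinations


def planning_nodes(triples, filters):
    """Hypothetical helper. `triples` and `filters` are lists of variable sets.
    Each FILTER becomes a dummy VALUES node over its distinct variables;
    FILTERs with the same variable set contribute only one node."""
    nodes = [frozenset(t) for t in triples]
    distinct_filter_vars = []
    for f in filters:
        filter_vars = frozenset(f)
        if filter_vars not in distinct_filter_vars:
            distinct_filter_vars.append(filter_vars)
    return nodes + distinct_filter_vars


def count_connected_subgraphs(nodes):
    """Brute-force count of non-empty node subsets that are connected, where
    two nodes are adjacent iff they share a variable."""
    def connected(subset):
        start = subset[0]
        reached, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for u in subset:
                if u not in reached and nodes[u] & nodes[v]:
                    reached.add(u)
                    stack.append(u)
        return len(reached) == len(subset)

    indices = range(len(nodes))
    return sum(
        connected(c)
        for k in range(1, len(nodes) + 1)
        for c in combinations(indices, k)
    )
```

For a chain of three triples (`?a ?b`, `?b ?c`, `?c ?d`), adding two FILTERs over `?a` and `?b` (e.g. `FILTER(?a < ?b)` and `FILTER(?a != ?b)`) adds a single dummy node and raises the count of connected subgraphs from 6 to 12.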

Signed-off-by: Johannes Kalmbach <johannes.kalmbach@gmail.com>
Signed-off-by: Johannes Kalmbach <johannes.kalmbach@gmail.com>

codecov bot commented Jan 9, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.87%. Comparing base (acb6633) to head (982cff7).
Report is 19 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1705   +/-   ##
=======================================
  Coverage   89.86%   89.87%           
=======================================
  Files         389      389           
  Lines       37308    37329   +21     
  Branches     4204     4209    +5     
=======================================
+ Hits        33527    33549   +22     
+ Misses       2485     2484    -1     
  Partials     1296     1296           


Signed-off-by: Johannes Kalmbach <johannes.kalmbach@gmail.com>
@hannahbast (Member) left a comment

Looks great + nice trick with the filters. A further optimization would be to not have edges between the dummy VALUES "plans" (one for each FILTER).
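In the same simplified model, the suggested optimization could look as follows; whether and how it was adopted is not stated in this thread, and `build_edges` and `connect_dummies` are hypothetical names. Dummy nodes still connect to the triples they share variables with, but never directly to each other.

```python
from itertools import combinations


def build_edges(nodes, is_dummy, connect_dummies=True):
    """Hypothetical helper: return the edges (i, j) between planning nodes
    that share a variable. With connect_dummies=False, two dummy VALUES
    nodes are never connected directly."""
    edges = []
    for i, j in combinations(range(len(nodes)), 2):
        if not connect_dummies and is_dummy[i] and is_dummy[j]:
            continue  # skip dummy-dummy edges in the suggested variant
        if set(nodes[i]) & set(nodes[j]):
            edges.append((i, j))
    return edges
```

Dropping edges can never make a disconnected set connected, so the number of connected subgraphs can only stay the same or shrink. For one triple over `?a ?b` plus dummy nodes for `{?a, ?b}` and `{?a}`, the three nodes form a triangle with 7 connected subgraphs; without the dummy-dummy edge they form a star with 6.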

@hannahbast (Member) left a comment

After discussion with Johannes: only add one dummy VALUES clause for each distinct set of variables in a FILTER. We can then still overestimate, but the cases where that happens will be rare, and the only bad outcome is that we compute a greedy query plan in a case where non-greedy planning would have worked as well.

…planning-budget.

Signed-off-by: Johannes Kalmbach <johannes.kalmbach@gmail.com>
@sparql-conformance

@hannahbast hannahbast changed the title Also account for the filters when counting the subgraphs. Account for FILTERs when considering switching to greedy query planning Feb 4, 2025
@hannahbast hannahbast changed the title Account for FILTERs when considering switching to greedy query planning Account for FILTERs when considering greedy query planning Feb 4, 2025
@hannahbast (Member) left a comment

Thanks a lot + I wrote a proper description and will merge this now!

@hannahbast hannahbast merged commit aa55057 into ad-freiburg:master Feb 4, 2025
24 checks passed
2 participants