-
Notifications
You must be signed in to change notification settings - Fork 3
Foldermessagecounts optimizations #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: tadzik/new-folder-count-api
Are you sure you want to change the base?
Foldermessagecounts optimizations #5
Conversation
|
I tried hacking examples/quest to make a benchmark, which seems to show both my suggested changes are clear wins: Source code at: https://github.com/ojwb/xapian/tree/benchmark-runbox (see https://github.com/ojwb/xapian/blob/benchmark-runbox/xapian-core/examples/quest.cc#L44 for the benchmarked code) The last two benchmarks use a cached database built by running xapian's test suite, but could easily be adapted to use a real runbox DB if you have one. (Also, note that |
| @@ -533,39 +533,27 @@ extern "C" { | |||
| queryparser.add_boolean_prefix("folder", "XFOLDER:"); | |||
| queryparser.add_boolean_prefix("flag", "XF"); | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This object is no longer actually used!
8f308a1 to
f6b3361
Compare
Add a dedicated function for calculating folder counts
5e658e6 to
56d3c75
Compare
|
@ojwb try as I might, I could not replicate your results. I forcepushed to clean things up a bit, adding a benchmark script trying out all the different variants. The results look as follows: Running 100 iterations of sortedXapianQuery (baseline)
Running 100 iterations of sortedXapianQuery (baseline) It's possible that I just do something really stupid on the C++ side of things – so if you could take a look, I'd be very glad :) |
|
Nothing jumps out as wrong from a quick look, but your results really don't make sense. In particular the query parser does quite a lot of work and then builds the query by composing objects, so I don't see how it can really be much quicker than just composing the objects by hand. Passing I think runbox uses Xapian git master (because that's where the emscripten patches went) but what exact commit are you currently using? |
|
Oh, I see in RELEASE/1.4 and master departed ways back in 2016; some things get backported but the small patches for better emscripten support didn't, and that seems quite a significant step back in time. I think it'd make more sense to pick a known-good commit from the git master history to use. I rewrote the matcher between the two versions and the new version optimises better in many cases so that might explain the performance differences there, but your query parsing vs building timings still don't make any sense to me. |
|
I rebased my benchmark onto the HEAD of RELEASE/1.4 (which isn't very different to 1.4.16) and that also shows what I'd expect, though using check_at_least isn't as big a win as on master: (The numbers are different enough that these results are clearly repeatable despite the "WARNING" given.) I've added I've pushed this branch to https://github.com/ojwb/xapian/tree/benchmark-runbox-1.4 in case you want to look. |
|
It occurred to me that making the queries boolean would be faster, since otherwise the matcher has to calculate a weight for each document to find the highest achieved weight - it can't know that you aren't going to ask for it. This is for git master and shows that helps further: There's more than one way to do this - you can do it by scaling the query by a factor of zero (what I used in the benchmark): query *= 0.0;Or specify enquire.set_weighting_scheme(Xapian::BoolWeight()); |
|
BTW BM_check_at_least_1_hit being faster than BM_check_at_least in the last results is just random fluctuations - repeated runs show the opposite trend, which is what I'd expect. I probably should actually heed the warning and disable CPU frequency scaling while benchmarking... |
Initial measurements showed these to be slower, somewhat surprisingly. Further research is needed.