
Conversation

@tadzik commented Aug 7, 2020

Initial measurements showed these to be slower, somewhat surprisingly. Further research is needed.

@ojwb commented Aug 9, 2020

I tried hacking examples/quest to make a benchmark, which seems to show both my suggested changes are clear wins:

$ examples/quest 
2020-08-09 18:02:38
Running /home/olly/git/xapian/xapian-core/examples/.libs/quest
Run on (8 X 3900 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x1)
Load Average: 6.41, 3.91, 2.90
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
------------------------------------------------------------
Benchmark                  Time             CPU   Iterations
------------------------------------------------------------
BM_query_parse          2617 ns         2616 ns       263335
BM_query_build           153 ns          151 ns      4377885
BM_full_mset           68338 ns        67761 ns         9750
BM_check_at_least      36032 ns        35822 ns        20589

Source code at: https://github.com/ojwb/xapian/tree/benchmark-runbox (see https://github.com/ojwb/xapian/blob/benchmark-runbox/xapian-core/examples/quest.cc#L44 for the benchmarked code)

The last two benchmarks use a cached database built by running xapian's test suite, but could easily be adapted to use a real runbox DB if you have one.

(Also, note that make will fail with a "help2man" error, but only after it has built examples/quest successfully - sorry, this was just a quick hack...)
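
For reference, the last two cases boil down to roughly the following (a simplified sketch, not the exact benchmark code - the database path and the XFOLDER: term are placeholders):

#include <xapian.h>
#include <iostream>

int main() {
    Xapian::Database db("/path/to/db");
    Xapian::Enquire enquire(db);
    enquire.set_query(Xapian::Query("XFOLDER:INBOX"));

    // BM_full_mset style: ask for every match and count what comes back.
    Xapian::MSet full = enquire.get_mset(0, db.get_doccount());
    std::cout << "full mset size: " << full.size() << '\n';

    // BM_check_at_least style: ask for no items, but tell the matcher to
    // check at least db.get_doccount() documents so the estimate is exact.
    Xapian::MSet counted = enquire.get_mset(0, 0, db.get_doccount());
    std::cout << "estimated matches: " << counted.get_matches_estimated() << '\n';
}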

@@ -533,39 +533,27 @@ extern "C" {
queryparser.add_boolean_prefix("folder", "XFOLDER:");
queryparser.add_boolean_prefix("flag", "XF");

This object is no longer actually used!

@tadzik force-pushed the tadzik/new-folder-count-api branch from 8f308a1 to f6b3361 on August 12, 2020 at 16:18
@tadzik force-pushed the tadzik/foldermessagecounts-optimizations branch from 5e658e6 to 56d3c75 on August 13, 2020 at 14:38

@tadzik commented Aug 13, 2020

@ojwb try as I might, I could not replicate your results. I force-pushed to clean things up a bit, adding a benchmark script that tries out all the different variants. The results look as follows:

Running 100 iterations of sortedXapianQuery (baseline)
Done in 2520ms (25.2 per iteration)
Running 100 iterations of getFolderMessageCounts
Done in 654ms (6.54 per iteration)
Running 100 iterations of getFolderMessageCounts_noFullSet
Done in 945ms (9.45 per iteration)
Running 100 iterations of getFolderMessageCounts_noQueryParser
Done in 3577ms (35.77 per iteration)
Running 100 iterations of getFolderMessageCounts_noQueryParser_noFullSet
Done in 2234ms (22.34 per iteration)

get_matches_estimated() seems to win a bit only when skipping the query parser, interestingly – but even then it just barely beats the performance of running a sorted query and counting the results returned (while allocating them all).
The numbers above are for the -Oz optimization level – -O3 (compiled to WASM) doesn't look that much different though:

Running 100 iterations of sortedXapianQuery (baseline)
Done in 1533ms (15.33 per iteration)
Running 100 iterations of getFolderMessageCounts
Done in 534ms (5.34 per iteration)
Running 100 iterations of getFolderMessageCounts_noFullSet
Done in 789ms (7.89 per iteration)
Running 100 iterations of getFolderMessageCounts_noQueryParser
Done in 2747ms (27.47 per iteration)
Running 100 iterations of getFolderMessageCounts_noQueryParser_noFullSet
Done in 1430ms (14.3 per iteration)

It's possible that I'm just doing something really stupid on the C++ side of things – so if you could take a look, I'd be very glad :)

@ojwb commented Aug 13, 2020

Nothing jumps out as wrong from a quick look, but your results really don't make sense. In particular the query parser does quite a lot of work and then builds the query by composing objects, so I don't see how it can really be much quicker than just composing the objects by hand.

Passing FLAG_PARTIAL is probably unwise as you definitely don't want partial term expansion (but the query parser shouldn't attempt that for boolean filters). But if anything that would make the queryparser case slower, not faster.
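
For illustration, parsing without partial expansion just means dropping that flag - a sketch (the prefixes, query string and function name here are made up for the example):

#include <string>
#include <xapian.h>

// Sketch: parse a filter query without FLAG_PARTIAL, so no partial term
// expansion is attempted (prefixes and query string are illustrative).
Xapian::Query parse_filter(const std::string& querystring) {
    Xapian::QueryParser qp;
    qp.add_boolean_prefix("folder", "XFOLDER:");
    qp.add_boolean_prefix("flag", "XF");
    // FLAG_DEFAULT, without FLAG_PARTIAL.
    return qp.parse_query(querystring, Xapian::QueryParser::FLAG_DEFAULT);
}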

I think runbox uses Xapian git master (because that's where the emscripten patches went) but what exact commit are you currently using?

@ojwb commented Aug 20, 2020

Oh, I see in build-xapian.sh you seem to be using the v1.4.16 tag as of df2313e, but prior to that you were using git master.

RELEASE/1.4 and master parted ways back in 2016; some things get backported, but the small patches for better emscripten support weren't, and that seems quite a significant step back in time. I think it'd make more sense to pick a known-good commit from the git master history to use.

I rewrote the matcher between the two versions and the new version optimises better in many cases, so that might explain the performance differences there, but your query parsing vs building timings still don't make any sense to me.

@ojwb commented Aug 20, 2020

I rebased my benchmark onto the HEAD of RELEASE/1.4 (which isn't very different to 1.4.16) and that also shows what I'd expect, though using check_at_least isn't as big a win as on master:

***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
BM_query_parse                2627 ns         2627 ns       265806
BM_query_build                 134 ns          134 ns      5136154
BM_full_mset                 67363 ns        67361 ns        10329
BM_check_at_least            49127 ns        49126 ns        14312
BM_check_at_least_1_hit      65276 ns        65272 ns        10665

(The numbers are different enough that these results are clearly repeatable despite the "WARNING" given.)

I've added BM_check_at_least_1_hit which requests a single result rather than none, mostly because I was curious how that would compare.
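
Concretely, the only difference between the two is the maxitems argument to get_mset - roughly this (assuming enquire and db are set up as in the benchmark):

Xapian::doccount n = db.get_doccount();

// BM_check_at_least: return no items, just make the count reliable.
Xapian::MSet no_hits = enquire.get_mset(0, 0, n);

// BM_check_at_least_1_hit: additionally return the single best match.
Xapian::MSet one_hit = enquire.get_mset(0, 1, n);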

I've pushed this branch to https://github.com/ojwb/xapian/tree/benchmark-runbox-1.4 in case you want to look.

@ojwb commented Aug 20, 2020

It occurred to me that making the queries boolean would be faster, since otherwise the matcher has to calculate a weight for each document to find the highest achieved weight - it can't know that you aren't going to ask for it.

This is for git master and shows that it helps further:

-----------------------------------------------------------------------
Benchmark                             Time             CPU   Iterations
-----------------------------------------------------------------------
BM_query_parse                    34498 ns        33746 ns        21805
BM_query_build                     1808 ns         1760 ns       366834
BM_full_mset                     690123 ns       686099 ns         1079
BM_check_at_least                507740 ns       507670 ns         1068
BM_check_at_least_1_hit          468527 ns       468408 ns         1310
BM_full_mset_bool                372477 ns       372222 ns         1831
BM_check_at_least_bool           222924 ns       222876 ns         2781
BM_check_at_least_1_hit_bool     260981 ns       260882 ns         2813

There's more than one way to do this - you can do it by scaling the query by a factor of zero (what I used in the benchmark):

query *= 0.0;

Or specify BoolWeight as the weighting scheme:

enquire.set_weighting_scheme(Xapian::BoolWeight());
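
Putting that together with check_at_least, a purely boolean count would look roughly like this (a sketch - the database path and term are placeholders):

Xapian::Database db("/path/to/db");
Xapian::Enquire enquire(db);
enquire.set_query(Xapian::Query("XFOLDER:INBOX"));
enquire.set_weighting_scheme(Xapian::BoolWeight());  // no weights calculated
Xapian::MSet mset = enquire.get_mset(0, 0, db.get_doccount());
Xapian::doccount count = mset.get_matches_estimated();  // exact with this check_at_least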

@ojwb commented Aug 20, 2020

BTW BM_check_at_least_1_hit being faster than BM_check_at_least in the last results is just random fluctuations - repeated runs show the opposite trend, which is what I'd expect. I probably should actually heed the warning and disable CPU frequency scaling while benchmarking...
