
Conversation

@tadzik commented Aug 7, 2020

Initial measurements showed these to be slower, somewhat surprisingly. Further research is needed.

@ojwb commented Aug 9, 2020

I tried hacking examples/quest to make a benchmark, which seems to show both my suggested changes are clear wins:

$ examples/quest 
2020-08-09 18:02:38
Running /home/olly/git/xapian/xapian-core/examples/.libs/quest
Run on (8 X 3900 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x1)
Load Average: 6.41, 3.91, 2.90
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
------------------------------------------------------------
Benchmark                  Time             CPU   Iterations
------------------------------------------------------------
BM_query_parse          2617 ns         2616 ns       263335
BM_query_build           153 ns          151 ns      4377885
BM_full_mset           68338 ns        67761 ns         9750
BM_check_at_least      36032 ns        35822 ns        20589

Source code at: https://github.com/ojwb/xapian/tree/benchmark-runbox (see https://github.com/ojwb/xapian/blob/benchmark-runbox/xapian-core/examples/quest.cc#L44 for the benchmarked code)

The last two benchmarks use a cached database built by running xapian's test suite, but could easily be adapted to use a real runbox DB if you have one.

(Also, note that make will fail with a "help2man" error, but only after it has built examples/quest successfully - sorry, this was just a quick hack...)
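
For reference, the last two cases boil down to roughly the following (a simplified sketch, not the exact benchmark code - the database path and the XFOLDER: term are placeholders):

#include <xapian.h>
#include <iostream>

int main() {
    Xapian::Database db("/path/to/db");
    Xapian::Enquire enquire(db);
    enquire.set_query(Xapian::Query("XFOLDER:INBOX"));

    // BM_full_mset style: ask for every match and count what comes back.
    Xapian::MSet full = enquire.get_mset(0, db.get_doccount());
    std::cout << "full mset size: " << full.size() << '\n';

    // BM_check_at_least style: ask for no items, but tell the matcher to
    // check at least db.get_doccount() documents so the estimate is exact.
    Xapian::MSet counted = enquire.get_mset(0, 0, db.get_doccount());
    std::cout << "estimated matches: " << counted.get_matches_estimated() << '\n';
}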

@@ -533,39 +533,27 @@ extern "C" {
queryparser.add_boolean_prefix("folder", "XFOLDER:");
queryparser.add_boolean_prefix("flag", "XF");

This object is no longer actually used!

@tadzik force-pushed the tadzik/new-folder-count-api branch from 8f308a1 to f6b3361 on August 12, 2020 at 16:18
@tadzik force-pushed the tadzik/foldermessagecounts-optimizations branch from 5e658e6 to 56d3c75 on August 13, 2020 at 14:38

@tadzik commented Aug 13, 2020

@ojwb try as I might, I could not replicate your results. I force-pushed to clean things up a bit, adding a benchmark script that tries out all the different variants. The results look as follows:

Running 100 iterations of sortedXapianQuery (baseline)
Done in 2520ms (25.2 per iteration)
Running 100 iterations of getFolderMessageCounts
Done in 654ms (6.54 per iteration)
Running 100 iterations of getFolderMessageCounts_noFullSet
Done in 945ms (9.45 per iteration)
Running 100 iterations of getFolderMessageCounts_noQueryParser
Done in 3577ms (35.77 per iteration)
Running 100 iterations of getFolderMessageCounts_noQueryParser_noFullSet
Done in 2234ms (22.34 per iteration)

get_matches_estimated() seems to win a bit only when skipping the query parser, interestingly – but even then it just barely beats the performance of running a sorted query and counting the results returned (while allocating them all).
The numbers above are for the -Oz optimization level – -O3 (compiled to WASM) doesn't look that much different though:

Running 100 iterations of sortedXapianQuery (baseline)
Done in 1533ms (15.33 per iteration)
Running 100 iterations of getFolderMessageCounts
Done in 534ms (5.34 per iteration)
Running 100 iterations of getFolderMessageCounts_noFullSet
Done in 789ms (7.89 per iteration)
Running 100 iterations of getFolderMessageCounts_noQueryParser
Done in 2747ms (27.47 per iteration)
Running 100 iterations of getFolderMessageCounts_noQueryParser_noFullSet
Done in 1430ms (14.3 per iteration)

It's possible that I'm just doing something really stupid on the C++ side of things – so if you could take a look, I'd be very glad :)

@ojwb commented Aug 13, 2020

Nothing jumps out as wrong from a quick look, but your results really don't make sense. In particular the query parser does quite a lot of work and then builds the query by composing objects, so I don't see how it can really be much quicker than just composing the objects by hand.

Passing FLAG_PARTIAL is probably unwise as you definitely don't want partial term expansion (but the query parser shouldn't attempt that for boolean filters). But if anything that would make the queryparser case slower, not faster.
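
For illustration, parsing without partial expansion just means dropping that flag - a sketch (the prefixes, query string and function name here are made up for the example):

#include <string>
#include <xapian.h>

// Sketch: parse a filter query without FLAG_PARTIAL, so no partial term
// expansion is attempted (prefixes and query string are illustrative).
Xapian::Query parse_filter(const std::string& querystring) {
    Xapian::QueryParser qp;
    qp.add_boolean_prefix("folder", "XFOLDER:");
    qp.add_boolean_prefix("flag", "XF");
    // FLAG_DEFAULT, without FLAG_PARTIAL.
    return qp.parse_query(querystring, Xapian::QueryParser::FLAG_DEFAULT);
}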

I think runbox uses Xapian git master (because that's where the emscripten patches went) but what exact commit are you currently using?

@ojwb commented Aug 20, 2020

Oh, I see in build-xapian.sh you seem to be using the v1.4.16 tag as of df2313e, but prior to that you were using git master.

RELEASE/1.4 and master parted ways back in 2016; some things get backported, but the small patches for better emscripten support weren't, and that seems quite a significant step back in time. I think it'd make more sense to pick a known-good commit from the git master history to use.

I rewrote the matcher between the two versions and the new version optimises better in many cases, so that might explain the performance differences there, but your query parsing vs building timings still don't make any sense to me.

@ojwb commented Aug 20, 2020

I rebased my benchmark onto the HEAD of RELEASE/1.4 (which isn't very different to 1.4.16) and that also shows what I'd expect, though using check_at_least isn't as big a win as on master:

***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
------------------------------------------------------------------
Benchmark                        Time             CPU   Iterations
------------------------------------------------------------------
BM_query_parse                2627 ns         2627 ns       265806
BM_query_build                 134 ns          134 ns      5136154
BM_full_mset                 67363 ns        67361 ns        10329
BM_check_at_least            49127 ns        49126 ns        14312
BM_check_at_least_1_hit      65276 ns        65272 ns        10665

(The numbers are different enough that these results are clearly repeatable despite the "WARNING" given.)

I've added BM_check_at_least_1_hit which requests a single result rather than none, mostly because I was curious how that would compare.
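
Concretely, the only difference between the two is the maxitems argument to get_mset - roughly this (assuming enquire and db are set up as in the benchmark):

Xapian::doccount n = db.get_doccount();

// BM_check_at_least: return no items, just make the count reliable.
Xapian::MSet no_hits = enquire.get_mset(0, 0, n);

// BM_check_at_least_1_hit: additionally return the single best match.
Xapian::MSet one_hit = enquire.get_mset(0, 1, n);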

I've pushed this branch to https://github.com/ojwb/xapian/tree/benchmark-runbox-1.4 in case you want to look.

@ojwb commented Aug 20, 2020

It occurred to me that making the queries boolean would be faster, since otherwise the matcher has to calculate a weight for each document to find the highest achieved weight - it can't know that you aren't going to ask for it.

This is for git master and shows that it helps further:

-----------------------------------------------------------------------
Benchmark                             Time             CPU   Iterations
-----------------------------------------------------------------------
BM_query_parse                    34498 ns        33746 ns        21805
BM_query_build                     1808 ns         1760 ns       366834
BM_full_mset                     690123 ns       686099 ns         1079
BM_check_at_least                507740 ns       507670 ns         1068
BM_check_at_least_1_hit          468527 ns       468408 ns         1310
BM_full_mset_bool                372477 ns       372222 ns         1831
BM_check_at_least_bool           222924 ns       222876 ns         2781
BM_check_at_least_1_hit_bool     260981 ns       260882 ns         2813

There's more than one way to do this - you can do it by scaling the query by a factor of zero (what I used in the benchmark):

query *= 0.0;

Or specify BoolWeight as the weighting scheme:

enquire.set_weighting_scheme(Xapian::BoolWeight());
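
Putting that together with check_at_least, a purely boolean count would look roughly like this (a sketch - the database path and term are placeholders):

Xapian::Database db("/path/to/db");
Xapian::Enquire enquire(db);
enquire.set_query(Xapian::Query("XFOLDER:INBOX"));
enquire.set_weighting_scheme(Xapian::BoolWeight());  // no weights calculated
Xapian::MSet mset = enquire.get_mset(0, 0, db.get_doccount());
Xapian::doccount count = mset.get_matches_estimated();  // exact with this check_at_least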

@ojwb commented Aug 20, 2020

BTW BM_check_at_least_1_hit being faster than BM_check_at_least in the last results is just random fluctuations - repeated runs show the opposite trend, which is what I'd expect. I probably should actually heed the warning and disable CPU frequency scaling while benchmarking...
