diff --git a/README.md b/README.md
index b039086..c83fc3b 100644
--- a/README.md
+++ b/README.md
@@ -78,7 +78,9 @@ By default Spellbook prefers boxed slices (`Box<[T]>`) and boxed strs (`Box
 
 ##### Flag sets
 
-Words in the dictionary are associated with any number of flags, like `adventure/DRSMZG` mentioned above. The order of the flags as written in the dictionary isn't important. We need a way to look up whether a flag exists in that set quickly. The right tool for the job might seem like a `HashSet` or a `BTreeSet`. Those are mutable though so they carry some extra overhead. A dictionary contains many many flag sets and the overhead adds up. So what we use instead is a sorted `Box<[Flag]>`. To look up a flag we use `slice::binary_search` which works as well as a `BTreeSet`.
+Words in the dictionary are associated with any number of flags, like `adventure/DRSMZG` mentioned above. The order of the flags as written in the dictionary isn't important. We need a way to look up whether a flag exists in that set quickly. The right tool for the job might seem like a `HashSet` or a `BTreeSet`. Those are mutable though so they carry some extra overhead. A dictionary contains many many flag sets and the overhead adds up. So what we use instead is a sorted `Box<[Flag]>` and look up flags with `slice::binary_search`.
+
+Binary searching a small slice is typically a tiny bit slower than `slice::contains` but we prefer `slice::binary_search` for its consistent performance. See [`examples/bench-slice-contains.rs`](./examples/bench-slice-contains.rs) for more details.
 
 ##### Flags
 
diff --git a/examples/bench-slice-contains.rs b/examples/bench-slice-contains.rs
index 6278bb7..991a90e 100644
--- a/examples/bench-slice-contains.rs
+++ b/examples/bench-slice-contains.rs
@@ -1,129 +1,208 @@
-//! A benchmark for the different possible strategies of looking up a flag in a flagset.
-//!
-//! Originally I thought that binary search in a sorted flagset would clearly be better but it's
-//! actually typically 1-2ns worse (24ns total) for these cases. Flagsets are probably always
-//! small enough that binary search adds more overhead than it's worth.
-//!
-//! TODO: measure a histogram of flagset lengths in real `.dic` files. If binary search is causing
-//! more harm than good, let's switch `FlagSet::contains` to use `slice::contains`. Maybe drop the
-//! FlagSet wrapper struct completely and just use `Box<[Flag]>`.
+/*
+A benchmark for the different possible strategies of looking up a flag in a flagset.
+
+Originally I thought that binary search in a sorted flagset would clearly be better but it's
+actually typically 1-2ns worse (24ns total) for common cases. When flagsets are small enough,
+binary search adds more overhead than it's worth.
+
+I took a histogram of the length of flagsets used in LibreOffice/dictionaries:
+
+```text
+# of samples: 10352117
+31.812710385711444'th percentile of data is 0 with 3293289 samples
+68.81390540698101'th percentile of data is 1 with 3830407 samples
+79.5990230790475'th percentile of data is 2 with 1116488 samples
+86.29727619964109'th percentile of data is 3 with 693411 samples
+90.34976130969153'th percentile of data is 4 with 419518 samples
+93.03862195529669'th percentile of data is 5 with 278354 samples
+95.01828466583213'th percentile of data is 6 with 204937 samples
+95.84846268642443'th percentile of data is 7 with 85941 samples
+96.32064629872325'th percentile of data is 8 with 48881 samples
+96.89803544531036'th percentile of data is 9 with 59772 samples
+97.34328736817794'th percentile of data is 10 with 46093 samples
+97.78151657289035'th percentile of data is 11 with 45366 samples
+98.05298761596299'th percentile of data is 12 with 28103 samples
+98.37843795621707'th percentile of data is 13 with 33691 samples
+98.85777952470977'th percentile of data is 14 with 49622 samples
+99.00363374950264'th percentile of data is 15 with 15099 samples
+99.37310407137014'th percentile of data is 16 with 38248 samples
+99.53481012627658'th percentile of data is 17 with 16740 samples
+99.67957278689953'th percentile of data is 18 with 14986 samples
+99.71078379427126'th percentile of data is 19 with 3231 samples
+99.73314636996471'th percentile of data is 20 with 2315 samples
+99.79239995065744'th percentile of data is 21 with 6134 samples
+99.81532279822571'th percentile of data is 22 with 2373 samples
+99.833589593317'th percentile of data is 23 with 1891 samples
+99.85790346071242'th percentile of data is 24 with 2517 samples
+99.86442386615221'th percentile of data is 25 with 675 samples
+99.9408623376262'th percentile of data is 26 with 7913 samples
+99.94625253945642'th percentile of data is 27 with 558 samples
+99.949546551686'th percentile of data is 28 with 341 samples
+99.9525024688187'th percentile of data is 29 with 306 samples
+99.95853022140302'th percentile of data is 30 with 624 samples
+99.96063607086357'th percentile of data is 31 with 218 samples
+99.96498300782342'th percentile of data is 33 with 450 samples
+99.9685474961305'th percentile of data is 35 with 369 samples
+99.97166763088168'th percentile of data is 37 with 323 samples
+99.97396667754045'th percentile of data is 39 with 238 samples
+99.97595660868207'th percentile of data is 41 with 206 samples
+99.97818803632146'th percentile of data is 43 with 231 samples
+99.97989783152566'th percentile of data is 45 with 177 samples
+99.98148204855104'th percentile of data is 47 with 164 samples
+99.98326912263454'th percentile of data is 49 with 185 samples
+99.98466014246168'th percentile of data is 51 with 144 samples
+99.98588694467036'th percentile of data is 53 with 127 samples
+99.98694952926054'th percentile of data is 55 with 110 samples
+99.98779959693267'th percentile of data is 57 with 88 samples
+99.98877524278367'th percentile of data is 59 with 101 samples
+99.9897895280743'th percentile of data is 61 with 105 samples
+99.99072653448565'th percentile of data is 63 with 97 samples
+99.99233007123084'th percentile of data is 67 with 166 samples
+99.99370177133817'th percentile of data is 71 with 142 samples
+99.994802995368'th percentile of data is 75 with 114 samples
+99.9955178250014'th percentile of data is 79 with 74 samples
+99.99609741659604'th percentile of data is 83 with 60 samples
+99.9966866680506'th percentile of data is 87 with 61 samples
+99.99709238216685'th percentile of data is 91 with 42 samples
+99.99747877656328'th percentile of data is 95 with 40 samples
+99.99779755194034'th percentile of data is 99 with 33 samples
+99.9981163273174'th percentile of data is 103 with 33 samples
+99.99845442241427'th percentile of data is 107 with 35 samples
+99.99872489849177'th percentile of data is 111 with 28 samples
+99.99893741540981'th percentile of data is 115 with 22 samples
+99.99911129288822'th percentile of data is 119 with 18 samples
+99.99918857176749'th percentile of data is 123 with 8 samples
+99.99926585064678'th percentile of data is 127 with 8 samples
+99.99943006826526'th percentile of data is 135 with 17 samples
+99.99956530630402'th percentile of data is 143 with 14 samples
+99.99966190490312'th percentile of data is 151 with 10 samples
+99.99974884364232'th percentile of data is 159 with 9 samples
+99.99983578238152'th percentile of data is 167 with 9 samples
+99.99985510210135'th percentile of data is 175 with 2 samples
+99.99988408168107'th percentile of data is 183 with 3 samples
+99.99994204084054'th percentile of data is 191 with 6 samples
+99.99996136056035'th percentile of data is 199 with 2 samples
+99.99997102042026'th percentile of data is 207 with 1 samples
+99.99998068028017'th percentile of data is 215 with 1 samples
+99.99999034014009'th percentile of data is 231 with 1 samples
+100'th percentile of data is 271 with 1 samples
+```
+
+Most words have exactly one flag. Any empty flagset is the second most popular. A quite vast
+majority (90%) has four or fewer and we hit the 99th percentile with 15 flags in a flagset. This
+breakdown also changes between dictionaries: en_US for example uses only small flagsets (around
+ten at most).
+
+Given that `contains` is faster than `binary_search` for up to the 99th percentile it might seem
+worthwhile to switch to `contains`. `binary_search` though has much more predictable performance
+when we hit these outliers that live in the low hundreds of flags.
+
+```text
+$ cargo run --release --example bench-slice-contains
+Starting: Running benchmark(s). Stand by!
+
+•••••••••••••••••
+
+Method                                                              Mean                Samples
+-----------------------------------------------------------------------------------------------
+lookup non-existing flag high in many flags (contains)          89.93 ns    4,810,921/5,000,000
+lookup non-existing flag high in many flags (binary_search)     25.79 ns    4,999,695/5,000,000
+-----------------------------------------------------------------------------------------------
+lookup non-existing flag low in many flags (contains)           60.22 ns    4,997,489/5,000,000
+lookup non-existing flag low in many flags (binary_search)      24.97 ns    4,999,760/5,000,000
+-----------------------------------------------------------------------------------------------
+lookup existing flag in many flags (contains)                   50.24 ns    4,994,203/5,000,000
+lookup existing flag in many flags (binary_search)              24.84 ns    4,991,224/5,000,000
+-----------------------------------------------------------------------------------------------
+lookup non-existing flag high in few flags (contains)           22.72 ns    4,999,801/5,000,000
+lookup non-existing flag high in few flags (binary_search)      23.66 ns    4,999,788/5,000,000
+-----------------------------------------------------------------------------------------------
+lookup existing flag in few flags (contains)                    22.71 ns    4,999,821/5,000,000
+lookup existing flag in few flags (binary_search)               23.16 ns    4,999,821/5,000,000
+-----------------------------------------------------------------------------------------------
+lookup non-existing flag high in empty flags (contains)         22.49 ns    4,999,827/5,000,000
+lookup non-existing flag high in empty flags (binary_search)    22.95 ns    4,999,814/5,000,000
+```
+
+I think the tradeoff is worthwhile: we pay around 1 extra nanosecond on average but have no
+degenerate cases.
+
+*/
 
 use brunch::Bench;
 use std::hint::black_box;
 
 type Flag = std::num::NonZeroU16;
 
+const fn flag_n(n: u16) -> Flag {
+    assert!(n != 0);
+
+    unsafe { Flag::new_unchecked(n) }
+}
+
 const fn flag(ch: char) -> Flag {
     assert!(ch as u32 != 0);
 
     unsafe { Flag::new_unchecked(ch as u16) }
 }
 
-// en_US.dic `advise/LDRSZGB`
-const MANY_FLAGS_UNSORTED: &[Flag] = &[
-    flag('L'),
-    flag('D'),
-    flag('R'),
-    flag('S'),
-    flag('Z'),
-    flag('G'),
-    flag('B'),
-];
-const MANY_FLAGS_SORTED: &[Flag] = &[
-    flag('B'),
-    flag('D'),
-    flag('G'),
-    flag('L'),
-    flag('R'),
-    flag('S'),
-    flag('Z'),
-];
-
-// en_US.dic `advent/SM`
-const FEW_FLAGS_UNSORTED: &[Flag] = &[flag('S'), flag('M')];
-const FEW_FLAGS_SORTED: &[Flag] = &[flag('M'), flag('S')];
+// be_BY.dic (Belarusian) `абвал/2,9,10,12,13,16,17,22,23,62,67,68,69,70,74,250,270,290,296,297,298,299,300,322,335,363,364,365,367,368,398,399,400,403,408,423,424,425,426,427,479,514,520,521,522,523,524,525,526,527,528,529,530,543,577,585,633,634,635,639,640,641,642,643,647,648,649,650,652,726,747,773,774,775,778,794,836,838,1076,1082,1087,1088,1089,1090,1091,1092,1093,1094,1095,1096,1097,1175,1276,1695,1696,1697,1704,1705,1706,1707,1708,1709,1710,1711,1902,1903,1904,1905,1906,1907,1908,1909,1910,1911,1912,1992,1993,2055,2056,2057,2058,2059,2060,2130,2429,2668,2875,2876,2877,2878,2879,3185,3186,3187,3188,3189,3190,3191,3192,3193,3194,3316,3317,3318,3600,3726,3949,4381,4382,4383,4384,4385,4386`
+#[rustfmt::skip]
+const MANY_FLAGS: &[Flag] = &[flag_n(2), flag_n(9), flag_n(10), flag_n(12), flag_n(13), flag_n(16), flag_n(17), flag_n(22), flag_n(23), flag_n(62), flag_n(67), flag_n(68), flag_n(69), flag_n(70), flag_n(74), flag_n(250), flag_n(270), flag_n(290), flag_n(296), flag_n(297), flag_n(298), flag_n(299), flag_n(300), flag_n(322), flag_n(335), flag_n(363), flag_n(364), flag_n(365), flag_n(367), flag_n(368), flag_n(398), flag_n(399), flag_n(400), flag_n(403), flag_n(408), flag_n(423), flag_n(424), flag_n(425), flag_n(426), flag_n(427), flag_n(479), flag_n(514), flag_n(520), flag_n(521), flag_n(522), flag_n(523), flag_n(524), flag_n(525), flag_n(526), flag_n(527), flag_n(528), flag_n(529), flag_n(530), flag_n(543), flag_n(577), flag_n(585), flag_n(633), flag_n(634), flag_n(635), flag_n(639), flag_n(640), flag_n(641), flag_n(642), flag_n(643), flag_n(647), flag_n(648), flag_n(649), flag_n(650), flag_n(652), flag_n(726), flag_n(747), flag_n(773), flag_n(774), flag_n(775), flag_n(778), flag_n(794), flag_n(836), flag_n(838), flag_n(1076), flag_n(1082), flag_n(1087), flag_n(1088), flag_n(1089), flag_n(1090), flag_n(1091), flag_n(1092), flag_n(1093), flag_n(1094), flag_n(1095), flag_n(1096), flag_n(1097), flag_n(1175), flag_n(1276), flag_n(1695), flag_n(1696), flag_n(1697), flag_n(1704), flag_n(1705), flag_n(1706), flag_n(1707), flag_n(1708), flag_n(1709), flag_n(1710), flag_n(1711), flag_n(1902), flag_n(1903), flag_n(1904), flag_n(1905), flag_n(1906), flag_n(1907), flag_n(1908), flag_n(1909), flag_n(1910), flag_n(1911), flag_n(1912), flag_n(1992), flag_n(1993), flag_n(2055), flag_n(2056), flag_n(2057), flag_n(2058), flag_n(2059), flag_n(2060), flag_n(2130), flag_n(2429), flag_n(2668), flag_n(2875), flag_n(2876), flag_n(2877), flag_n(2878), flag_n(2879), flag_n(3185), flag_n(3186), flag_n(3187), flag_n(3188), flag_n(3189), flag_n(3190), flag_n(3191), flag_n(3192), flag_n(3193), flag_n(3194), flag_n(3316), flag_n(3317), flag_n(3318), flag_n(3600), flag_n(3726), flag_n(3949), flag_n(4381), flag_n(4382), flag_n(4383), flag_n(4384), flag_n(4385), flag_n(4386)];
+
+// en_US.dic `advent/SM` (sorted)
+const FEW_FLAGS: &[Flag] = &[flag('M'), flag('S')];
 
 const EMPTY_FLAGS: &[Flag] = &[];
 
-const UNKNOWN_FLAG_HIGH: Flag = flag('Z');
-const UNKNOWN_FLAG_LOW: Flag = flag('A');
+const UNKNOWN_FLAG_HIGH: Flag = flag_n(5000);
+const UNKNOWN_FLAG_LOW: Flag = flag_n(1);
 
-const D_FLAG: Flag = flag('D');
+const FLAG_S: Flag = flag('S');
+const FLAG_1709: Flag = flag_n(1709);
 
 const SAMPLES: u32 = 5_000_000;
 
 brunch::benches!(
-    Bench::new("contains: lookup non-existing high in many flags unsorted")
+    Bench::new("lookup non-existing flag high in many flags (contains)")
         .with_samples(SAMPLES)
-        .run(|| black_box(MANY_FLAGS_UNSORTED).contains(black_box(&UNKNOWN_FLAG_HIGH))),
-    Bench::new("contains: lookup non-existing high in many flags sorted")
+        .run(|| black_box(MANY_FLAGS).contains(&black_box(UNKNOWN_FLAG_HIGH))),
+    Bench::new("lookup non-existing flag high in many flags (binary_search)")
         .with_samples(SAMPLES)
-        .run(|| black_box(MANY_FLAGS_SORTED).contains(black_box(&UNKNOWN_FLAG_HIGH))),
-    Bench::new("contains: lookup non-existing low in many flags unsorted")
-        .with_samples(SAMPLES)
-        .run(|| black_box(MANY_FLAGS_UNSORTED).contains(black_box(&UNKNOWN_FLAG_LOW))),
-    Bench::new("contains: lookup non-existing low in many flags sorted")
-        .with_samples(SAMPLES)
-        .run(|| black_box(MANY_FLAGS_SORTED).contains(black_box(&UNKNOWN_FLAG_LOW))),
-    Bench::new("contains: lookup 'D' in many flags unsorted")
-        .with_samples(SAMPLES)
-        .run(|| black_box(MANY_FLAGS_UNSORTED).contains(black_box(&D_FLAG))),
-    Bench::new("contains: lookup 'D' in many flags sorted")
-        .with_samples(SAMPLES)
-        .run(|| black_box(MANY_FLAGS_SORTED).contains(black_box(&D_FLAG))),
-    // ---
-    Bench::new("contains: lookup non-existing high in few flags unsorted")
-        .with_samples(SAMPLES)
-        .run(|| black_box(FEW_FLAGS_UNSORTED).contains(&UNKNOWN_FLAG_HIGH)),
-    Bench::new("contains: lookup non-existing high in few flags sorted")
-        .with_samples(SAMPLES)
-        .run(|| black_box(FEW_FLAGS_SORTED).contains(&UNKNOWN_FLAG_HIGH)),
-    Bench::new("contains: lookup non-existing low in few flags unsorted")
-        .with_samples(SAMPLES)
-        .run(|| black_box(FEW_FLAGS_UNSORTED).contains(&UNKNOWN_FLAG_LOW)),
-    Bench::new("contains: lookup non-existing low in few flags sorted")
-        .with_samples(SAMPLES)
-        .run(|| black_box(FEW_FLAGS_SORTED).contains(&UNKNOWN_FLAG_LOW)),
-    Bench::new("contains: lookup 'D' in few flags unsorted")
-        .with_samples(SAMPLES)
-        .run(|| black_box(FEW_FLAGS_UNSORTED).contains(black_box(&D_FLAG))),
-    Bench::new("contains: lookup 'D' in few flags sorted")
-        .with_samples(SAMPLES)
-        .run(|| black_box(FEW_FLAGS_SORTED).contains(black_box(&D_FLAG))),
-    // ---
-    Bench::new("contains: lookup non-existing high in empty flags")
+        .run(|| black_box(MANY_FLAGS).binary_search(&black_box(UNKNOWN_FLAG_HIGH))),
+    Bench::spacer(),
+    Bench::new("lookup non-existing flag low in many flags (contains)")
         .with_samples(SAMPLES)
-        .run(|| black_box(EMPTY_FLAGS).contains(&UNKNOWN_FLAG_HIGH)),
-    Bench::new("contains: lookup non-existing low in empty flags")
+        .run(|| black_box(MANY_FLAGS).contains(&black_box(UNKNOWN_FLAG_LOW))),
+    Bench::new("lookup non-existing flag low in many flags (binary_search)")
         .with_samples(SAMPLES)
-        .run(|| black_box(EMPTY_FLAGS).contains(&UNKNOWN_FLAG_LOW)),
-    // ------
+        .run(|| black_box(MANY_FLAGS).binary_search(&black_box(UNKNOWN_FLAG_LOW))),
     Bench::spacer(),
-    // ------
-    Bench::new("binary_search: lookup non-existing high in many flags sorted")
+    Bench::new("lookup existing flag in many flags (contains)")
         .with_samples(SAMPLES)
-        .run(|| black_box(MANY_FLAGS_SORTED).binary_search(&UNKNOWN_FLAG_HIGH)),
-    Bench::new("binary_search: lookup non-existing low in many flags sorted")
+        .run(|| black_box(MANY_FLAGS).contains(&black_box(FLAG_1709))),
+    Bench::new("lookup existing flag in many flags (binary_search)")
         .with_samples(SAMPLES)
-        .run(|| black_box(MANY_FLAGS_SORTED).binary_search(&UNKNOWN_FLAG_LOW)),
-    Bench::new("binary_search: lookup 'D' in many flags sorted")
+        .run(|| black_box(MANY_FLAGS).binary_search(&black_box(FLAG_1709))),
+    Bench::spacer(),
+    Bench::new("lookup non-existing flag high in few flags (contains)")
         .with_samples(SAMPLES)
-        .run(|| black_box(MANY_FLAGS_SORTED).binary_search(black_box(&D_FLAG))),
-    // ---
-    Bench::new("binary_search: lookup non-existing high in few flags sorted")
+        .run(|| black_box(FEW_FLAGS).contains(&black_box(UNKNOWN_FLAG_HIGH))),
+    Bench::new("lookup non-existing flag high in few flags (binary_search)")
         .with_samples(SAMPLES)
-        .run(|| black_box(FEW_FLAGS_SORTED).binary_search(&UNKNOWN_FLAG_HIGH)),
-    Bench::new("binary_search: lookup non-existing low in few flags sorted")
+        .run(|| black_box(FEW_FLAGS).binary_search(&black_box(UNKNOWN_FLAG_HIGH))),
+    Bench::spacer(),
+    Bench::new("lookup existing flag in few flags (contains)")
         .with_samples(SAMPLES)
-        .run(|| black_box(FEW_FLAGS_SORTED).binary_search(&UNKNOWN_FLAG_LOW)),
-    Bench::new("binary_search: lookup 'D' in few flags sorted")
+        .run(|| black_box(FEW_FLAGS).contains(&black_box(FLAG_S))),
+    Bench::new("lookup existing flag in few flags (binary_search)")
         .with_samples(SAMPLES)
-        .run(|| black_box(FEW_FLAGS_SORTED).binary_search(black_box(&D_FLAG))),
-    // ---
-    Bench::new("binary_search: lookup non-existing high in empty flags")
+        .run(|| black_box(FEW_FLAGS).binary_search(&black_box(FLAG_S))),
+    Bench::spacer(),
+    Bench::new("lookup non-existing flag high in empty flags (contains)")
         .with_samples(SAMPLES)
-        .run(|| black_box(EMPTY_FLAGS).binary_search(&UNKNOWN_FLAG_HIGH)),
-    Bench::new("binary_search: lookup non-existing low in empty flags")
+        .run(|| black_box(EMPTY_FLAGS).contains(&black_box(UNKNOWN_FLAG_HIGH))),
+    Bench::new("lookup non-existing flag high in empty flags (binary_search)")
         .with_samples(SAMPLES)
-        .run(|| black_box(EMPTY_FLAGS).binary_search(&UNKNOWN_FLAG_LOW)),
+        .run(|| black_box(EMPTY_FLAGS).binary_search(&black_box(UNKNOWN_FLAG_HIGH))),
 );
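
Review note: the lookup strategy that the README hunk describes (a sorted `Box<[Flag]>` searched with `slice::binary_search`) can be sketched roughly as below. This is a minimal illustration and not the crate's actual code: the `FlagSet` name appears in the removed doc comment, but the `new` constructor, the `contains` signature and the non-`const` `flag` helper here are assumptions made for the example.

```rust
use std::num::NonZeroU16;

type Flag = NonZeroU16;

/// Hypothetical wrapper: an immutable set of flags stored as a sorted,
/// deduplicated boxed slice (assumed layout, not Spellbook's real type).
struct FlagSet(Box<[Flag]>);

impl FlagSet {
    /// Sort once at construction so every later lookup can binary search.
    fn new(mut flags: Vec<Flag>) -> Self {
        flags.sort_unstable();
        flags.dedup();
        Self(flags.into_boxed_slice())
    }

    /// O(log n) membership test; `binary_search` requires sorted input.
    fn contains(&self, flag: Flag) -> bool {
        self.0.binary_search(&flag).is_ok()
    }
}

/// Illustrative helper: turn an ASCII flag character into a `Flag`.
fn flag(ch: char) -> Flag {
    Flag::new(ch as u16).expect("flag must be non-zero")
}

fn main() {
    // `adventure/DRSMZG`: the order written in the dictionary does not matter.
    let set = FlagSet::new("DRSMZG".chars().map(flag).collect());

    assert!(set.contains(flag('S')));
    assert!(!set.contains(flag('A')));

    // A linear scan over the same slice gives the same answer; only the
    // cost differs, which is what the benchmark in this diff measures.
    assert_eq!(set.0.contains(&flag('Z')), set.contains(flag('Z')));
}
```

Sorting at construction is what makes the order of flags in the `.dic` entry irrelevant, and it is also why the benchmark can drop the separate sorted and unsorted fixtures.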