Skip to content

Commit

Permalink
compare doc: Copy edits
Browse files Browse the repository at this point in the history
  • Loading branch information
the-mikedavis committed Nov 13, 2024
1 parent ca13e83 commit 0c5b568
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions docs/compare.md
Original file line number Diff line number Diff line change
Expand Up @@ -59,15 +59,15 @@ Writes: 2,105,758 bytes

Mostly I am familiar with Nuspell so I'll be talking about Spellbook vs. Nuspell in this section.

The `check` code is basically a rewrite so they should perform very similarly. One major difference that might affect lookup time is the main lookup table. It's meant to be a hash multi-map, like a `HashMap<String, FlagSet>` but allowing duplicate keys. Nuspell rolls its own hash table type for this while Spellbook uses `hashbrown::HashTable` which is highly optimized. Spellbook also uses `ahash` by default which is quite fast while Nuspell uses `std::hash` (implementation-specific). This sometimes happens with Rust rewrites: it's a pain to take a dependency in C/C++ so C/C++ libraries/tools often leave performance on the table by not taking advantage of available high-performance dependencies. To confirm or deny this suspicion one could replace Nuspell's `Word_List` type with an adaptation from Google's `SwissTable` library (on which `hashbrown` is based).
The `check` code is basically a rewrite so they should perform very similarly. One major difference that might affect lookup time is the main lookup table. It's meant to be a hash multi-map, like a `HashMap<String, FlagSet>` but allowing duplicate keys. Nuspell rolls its own hash table type for this while Spellbook uses `hashbrown::HashTable` which has SIMD optimizations for searching. Spellbook also uses `ahash` by default which is quite fast while Nuspell uses `std::hash` (implementation-specific). This sometimes happens with Rust rewrites: it's a pain to take a dependency in C/C++ so C/C++ libraries/tools might leave performance on the table by not taking advantage of available high-performance dependencies. To confirm or deny this suspicion one could replace Nuspell's `Word_List` type with an adaptation from Google's `SwissTable` library (on which `hashbrown` is based).

Otherwise I suspect that Rust's standard library has better optimizations for string searching and equality, as I know it uses `memchr` and SIMD operations when available.

When it comes to memory, Spellbook is optimized to save memory by cutting out unnecessary bytes from the common string type used in the loop table, as well as small-string and small-slice optimizations for the stem and flagsets. The [internals] document has more details.
When it comes to memory, Spellbook is optimized to save memory by cutting out unnecessary bytes from the common string type used in the lookup table, as well as small-string and small-slice optimizations for the stem and flagsets. The [internals] document has more details.

## ZSpell

[`pluots/zspell`](https://github.com/pluots/zspell) is an interesting alternative to the Hunspell-like spellcheckers mentioned above. At time of writing ZSpell doesn't support suggestions. The interesting part of ZSpell is how it checks words instead.
[`pluots/zspell`](https://github.com/pluots/zspell) is an interesting alternative to the Hunspell-like spellcheckers mentioned above. ZSpell also takes the `.dic` and `.aff` Hunspell-style dictionary files. At time of writing ZSpell doesn't support suggestions. The interesting part of ZSpell is how it checks words instead.

ZSpell expands affixes during instantiation of a dictionary. (See the `README.md` doc in this repository for a basic intro on affixes.) The "classic" spellcheckers mentioned above contain a subset of the possible dictionary words in a main lookup table. For example Spellbook's table includes "adventure" but not some of its conjugations made possible by prefixes/suffixes like "adventurer" or "adventured". In contrast, ZSpell expands each stem so that its tables include "adventure", "adventures", "adventurer", "adventure", "adventuring" and more. When checking a word, ZSpell performs a lookup into (up to) a handful of hash maps.

Expand Down

0 comments on commit 0c5b568

Please sign in to comment.