Skip to content

Commit

Permalink
readme: Use 'G' suffix rather than 'D' for affixing example
Browse files Browse the repository at this point in the history
'R' and 'D' suffixes are basically the same thing in en_US. 'G' is a
very intuitive suffix too and adds a little variety.
  • Loading branch information
the-mikedavis committed Sep 10, 2024
1 parent 196e20e commit 9408ccd
Showing 1 changed file with 7 additions and 9 deletions.
16 changes: 7 additions & 9 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -39,23 +39,21 @@ For a more in depth overview, check out [`@zverok`]'s blog series [Rebuilding th

Hunspell dictionaries are split into two files: `<lang>.dic` and `<lang>.aff`.
The `.dic` file has a listing of stems and flags associated with that stem. For example `en_US.dic` contains the word `adventure/DRSMZG` meaning that "adventure" is a stem in the dictionary with flags `D`, `R`, `S`, `M`, `Z` and `G`.
The `.aff` file contains a bunch of rules to use when determining if a word is correct or figuring out which words to suggest. The most intuitive of these are prefixes and suffixes. `en_US` contains suffixes like `D` and `R`:
The `.aff` file contains a bunch of rules to use when determining if a word is correct or figuring out which words to suggest. The most intuitive of these are prefixes and suffixes. `en_US` contains suffixes like `R` and `G`:

```
SFX D Y 4
SFX D 0 d e
SFX D y ied [^aeiou]y
SFX D 0 ed [^ey]
SFX D 0 ed [aeiou]y
SFX R Y 4
SFX R 0 r e
SFX R y ier [^aeiou]y
SFX R 0 er [aeiou]y
SFX R 0 er [^ey]
SFX G Y 2
SFX G e ing e
SFX G 0 ing [^e]
```

Since "adventure" has these flags, these suffixes can be applied. The rules themselves are tables that define the flag (like `D`), what to strip from the end of the word (`0` for nothing), what to add to the end (`ied` for example) and under what condition the suffix applies (matches `[^aeiou]y` at the end for example). When checking a word like "adventured" you find any suffixes where the "add" portion of the suffix matches the ending of the word and check if the condition applies. The first clause of `D` applies since the "adventure" ends in 'e', and we add a 'd' to the end. When checking this happens in reverse. Starting with a word like "adventured" we strip the 'd' and check the condition. Similarly with `R`, the first clause matches because "adventure" ends with 'e' and we add an 'r', matching "adventurer".
Since "adventure" has these flags, these suffixes can be applied. The rules are structured as tables that define the flag (like `R`), what to strip from the end of the word (`0` for nothing), what to add to the end (`er` for example) and under what condition the suffix applies (matches `[^aeiou]y` meaning not 'a' 'e' 'i' 'o' 'u' and then 'y' for example). When checking a word like "adventurer" you find any suffixes where the "add" portion of the suffix matches the ending of the word and check if the condition applies. The first clause of `R` applies since the "adventure" ends in 'e', and we add a 'r' to the end. When checking this happens in reverse. Starting with a word like "adventurer" we strip the 'r' and check the condition. Similarly with `G`, the first clause matches "adventuring" because "adventure" ends with 'e' and we add an "ing".

Hunspell dictionaries use these prefixing and suffixing rules to compress the dictionary. Without prefixes and suffixes we'd need a big set of every possible conjugation of every word in the dictionary. That might be possible with the gigabytes of RAM we have today but it certainly isn't efficient.

Expand All @@ -70,7 +68,7 @@ COMPOUNDRULE n*1t
COMPOUNDRULE n*mp
```

`en_US.dic` has words for digits like `0/nm`, `0th/pt`, `1/n1`, `1st/p`, etc. The COMPOUNDRULEs directive describes a regex-like pattern using flags and `*` (zero-or-more) and `?` (zero-or-one) modifiers. For example the first compound rule in the table `n*1t` allows a word like "10th": it matches the `n` flag zero times and then "1" (the stem of the `1` flag in the `.dic` file) and "0th". The `n*` modifier at the front allows adding any number of any other digit, so this rule also allows words like "110th" or "100000000th".
`en_US.dic` has words for digits like `0/nm`, `0th/pt`, `1/n1`, `1st/p`, etc. The COMPOUNDRULE directive describes a regex-like pattern using flags and `*` (zero-or-more) and `?` (zero-or-one) modifiers. For example the first compound rule in the table `n*1t` allows a word like "10th": it matches the `n` flag zero times and then "1" (the stem of the `1` flag in the `.dic` file) and "0th". The `n*` modifier at the front allows adding any number of any other digit, so this rule also allows words like "110th" or "10000th".

### Other docs

Expand Down

0 comments on commit 9408ccd

Please sign in to comment.