Collection of normalized and easily installable hunspell dictionaries.
Useful with nodehun, nspell, and others.
See each of the below packages for install guidelines.
Note that normal, canonical, and preferred BCP-47 codes are used.
To illustrate, as American English and Brazilian Portuguese are the most common
types of English and Portuguese respectively, they get the codes en and pt.
Important: this project itself is MIT, but each index.dic and index.aff file still has its original license!
In total 91 dictionaries are provided.
Each dictionary can be installed on macOS by following this StackExchange answer.
I’ve only tested building on macOS, where you need to install at least:
- wget: brew install wget (crawling)
- hunspell: brew install hunspell (many dictionaries)
- sed: brew install gnu-sed (crawling, many dictionaries)
- coreutils: brew install coreutils (many dictionaries)
- ispell: brew install ispell (German)
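The prerequisites listed above can be checked before a build with a short script. This is only a sketch: `missing_tools` is a hypothetical helper, not part of this project, and the command names `gsed` and `gsort` assume Homebrew's default `g`-prefixed install of gnu-sed and coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which build tools are missing.
# Each argument is a "command:homebrew-formula" pair.
missing_tools() {
  local pair cmd formula
  for pair in "$@"; do
    cmd="${pair%%:*}"      # text before the first colon
    formula="${pair##*:}"  # text after the last colon
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd (try: brew install $formula)"
    fi
  done
}

# Assumption: gnu-sed provides gsed and coreutils provides gsort.
missing_tools wget:wget hunspell:hunspell gsed:gnu-sed gsort:coreutils ispell:ispell
```

The script prints one line per missing command and nothing when everything is installed.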
Note that GNU sed and the other GNU replacements should be set up in PATH so that they override the macOS defaults.
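One way to arrange this, assuming Homebrew's usual layout in which gnu-sed and coreutils place unprefixed binaries under `libexec/gnubin`, is to prepend those directories to PATH in your shell profile. Treat this as a sketch, not the project's documented setup.

```shell
# Prepend the unprefixed GNU tools installed by Homebrew to PATH.
# Assumption: each formula keeps its binaries in <prefix>/libexec/gnubin.
if command -v brew >/dev/null 2>&1; then
  for formula in gnu-sed coreutils; do
    gnubin="$(brew --prefix "$formula")/libexec/gnubin"
    if [ -d "$gnubin" ]; then
      PATH="$gnubin:$PATH"
    fi
  done
  export PATH
fi

# GNU sed answers --version; the BSD sed shipped with macOS does not.
if sed --version 2>/dev/null | grep -q 'GNU'; then
  echo "sed is GNU sed"
else
  echo "sed is not GNU sed; check PATH"
fi
```

The final check is a quick way to confirm that the `sed` found first in PATH is the GNU one.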
Dictionaries can be added if they:
- have a significant affix file (not just a .dic file)
- have an open source license
- are convertible to UTF-8 with iconv(1)
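Two of these criteria can be roughly pre-checked from the command line. The sketch below is an illustration only: `check_candidate` is a hypothetical helper, it assumes the candidate files are already named index.dic and index.aff, and it only rules out a missing or empty affix file, since whether an affix file is "significant" is a judgment call. The license still has to be checked by hand.

```shell
#!/usr/bin/env bash
# check_candidate DIR ENCODING
# Hypothetical helper. DIR is assumed to hold index.dic and index.aff;
# ENCODING is the charset of the upstream files (for example ISO-8859-1).
check_candidate() {
  local dir="$1" encoding="$2" status=0 file

  # Affix file must exist and be non-empty.
  if [ ! -s "$dir/index.aff" ]; then
    echo "fail: no non-empty index.aff"
    status=1
  fi

  # Both files must convert to UTF-8 with iconv(1).
  for file in "$dir/index.dic" "$dir/index.aff"; do
    if ! iconv -f "$encoding" -t UTF-8 "$file" >/dev/null 2>&1; then
      echo "fail: $file does not convert from $encoding to UTF-8"
      status=1
    fi
  done

  return "$status"
}
```

For example, `check_candidate ./candidate ISO-8859-1` returns a non-zero status and prints a reason for each check that fails.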
The crawling and building is done in script/crawl.sh.
To include a new dictionary, add code there, modeled on the existing entries.
See license files in each dictionary for the licensing of index.dic and index.aff files.