Create and check wordlists for use with diceware.
This is not a diceware implementation, but a helper to create and check appropriate wordlists.
Currently, we provide three scripts:
diceware-list
- creates wordlists based on input lists. The lists created this way have some cryptographically desirable features.
wlflakes
- checks existing wordlists for flaws such as non-ASCII characters, words that are too short, and more.
wldownload
- downloads Android wordlists
Why that? Creating wordlists for use with diceware (or other applications that rely on hard-to-predict collections of words) has some gotchas. If the wrong words are chosen, diceware lists might become harder to use or generate weaker passphrases than they theoretically could.
For example
- words could contain chars that are not available on most (western) keyboards,
- words could be so short that actually using them in a diceware passphrase would make brute-force attacks easier than dictionary attacks,
- words could be so long that people would have to type more without any security gain,
- using words that are also the beginning (prefix) of another word in the list (like: air, port and airport) reduces the number of possible different combinations of words and therefore decreases the difficulty of guessing a respective passphrase (if words are joined without separators, "air" followed by "port" yields the same string as "airport").
There are more possible flaws, like rude language or words that sound similar, which might make handling the generated passphrases more difficult or at least more unpleasant than necessary.
diceware-list tries to generate lists that avoid some of these flaws and provides tools to detect these flaws in already crafted wordlists.
Install the latest release from PyPI:
(venv) $ pip install diceware-list
or clone the repository from GitHub:
$ git clone https://github.com/ulif/diceware-list.git
Please consider using virtualenv for deployment.
In an active virtualenv you can install an executable diceware-list script by running:
(venv) $ pip install .
(venv) $ diceware-list --help
usage: diceware-list [-h] [-l LENGTH] [-n] [--ascii] [-d SIDES] [-k] [-u] [--use-416] [--use-416] [-p {none,short,long}] [-v] [--version] DICTFILE [DICTFILE ...]
The diceware-list script creates new lists out of given ones:
$ diceware-list -n -l 7776 /usr/share/dict/words
11111 aaron
11112 abase
...
12353 as's
12354 asama
12355 ashe
...
66663 zuni
66664 éclat
66665 élan
66666 épée
The main target of diceware-list is to provide "good" wordlists. Wordlists are considered "good" if they
- contain enough terms for use with a certain diceware application (for instance 6^5 = 7776 terms if used with five six-sided dice)
- contain terms as short as possible (to reduce typing)
- contain terms as long as necessary (to impede brute-force attacks)
- (optionally) contain no words with non-ASCII chars (to enable use with non-localized keyboards)
- (optionally) are a prefix code, i.e. no complete word in the list is prefix of another word in the list.
- contain no offending terms
The last topic is hard to solve technically (hints welcome!), but diceware-list can help to follow the other design rules.
The wordlists generated by diceware-list are not meant to be kept secret. You might put them on the internet, publish them on Facebook or print them in the New York Times. Instead, the security of the diceware technique relies on the entropy or (in this case) "randomness" of your dice, computer, etc.
In other words: your passphrases will not be safe because you hide your wordlist. They will be safe because there are so many possible combinations of words you can pick from your wordlist. That means: longer lists are more secure than shorter ones (if your source of randomness really uses them to their full extent), but hidden lists are not more secure than public ones.
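To make the "longer lists are more secure" point concrete: if every word is picked uniformly at random (for instance by fair dice), each word contributes log2(list length) bits of entropy, and whether the list is secret does not appear in the calculation at all. A minimal Python sketch; the function name is hypothetical and not part of diceware-list:

```python
import math


def passphrase_entropy_bits(list_length: int, num_words: int) -> float:
    """Entropy in bits of a passphrase of `num_words` words, each chosen
    uniformly at random from a wordlist with `list_length` entries.

    Hypothetical helper for illustration; assumes a perfectly uniform
    source of randomness (e.g. fair dice).
    """
    if list_length < 1 or num_words < 0:
        raise ValueError("need a non-empty list and a non-negative word count")
    # Each independently chosen word contributes log2(list_length) bits.
    return num_words * math.log2(list_length)


print(round(passphrase_entropy_bits(7776, 6), 2))
print(passphrase_entropy_bits(8192, 6))
```

With six words from a 7776-term list this gives roughly 77.5 bits; an 8192-term list raises it to exactly 78 bits.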
First, you need a file with words as "dictionary". On typical Debian systems such files can be found in /usr/share/dict/.
This file can then be fed to diceware-list to create a wordlist suitable for use with diceware:
$ diceware-list /usr/share/dict/words
aaron
abaci
aback
...
alan
alana
alar
...
zulus
zuni
By default all input words are filtered and output. Using the -l option you can request a certain length of the output wordlist. If an input list provides more terms than needed, a subset is picked. If there are not enough terms in the input list, an error is raised.
With -n you can tell diceware-list to put numbers into each line, representing dice throws [1]:
$ diceware-list -n -l 7776 /usr/share/dict/words
11111 abaci
11112 aback
...
11464 alan
11465 alana
11466 alar
...
66665 zulus
66666 zuni
If you create a wordlist for use with non-standard dice, for instance 10-sided dice, you can set the number of sides with -d like this:
$ diceware-list -n -d 10 -l 10000 /usr/share/dict/words
1-1-1-1 aaron
1-1-1-2 abaci
1-1-1-3 aback
...
10-10-10-8 zoomed
10-10-10-9 zooms
10-10-10-10 zoos
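The numbering in the examples above can be reproduced by reading each zero-based position in the list as a number in base `sides` and adding one to every digit, since dice faces start at 1. The following is an illustrative sketch only, not diceware-list's own code: the helper name `dice_labels` is hypothetical, and the separator rule (none for dice with up to nine sides, `-` otherwise) is an assumption that merely mimics the sample output.

```python
from typing import List


def dice_labels(list_length: int, sides: int = 6) -> List[str]:
    """Return one dice-throw label per wordlist entry (hypothetical helper).

    Assumes the list length is an exact power of `sides` (see footnote [1]).
    """
    if sides < 2:
        raise ValueError("dice need at least two sides")
    # Find the number of throws n with sides**n == list_length.
    throws = 0
    while sides ** throws < list_length:
        throws += 1
    if sides ** throws != list_length:
        raise ValueError("list length must be a power of the number of sides")
    # Assumed layout: single-digit faces are concatenated, others joined by "-".
    sep = "" if sides < 10 else "-"
    labels = []
    for index in range(list_length):
        digits = []
        n = index
        for _ in range(throws):
            n, rest = divmod(n, sides)
            digits.append(str(rest + 1))  # dice faces start at 1, not 0
        # divmod yields the least significant throw first, so reverse.
        labels.append(sep.join(reversed(digits)))
    return labels


six = dice_labels(7776)
print(six[0], six[1], six[-1])
ten = dice_labels(10000, sides=10)
print(ten[0], ten[-2], ten[-1])
```

This prints `11111 11112 66666` and `1-1-1-1 10-10-10-9 10-10-10-10`, matching the first and last labels shown above.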
The --ascii option filters out terms that contain non-ASCII characters. This can help in generating non-English wordlists that are usable with regular English keyboards.
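The idea behind such a filter fits in a single list comprehension. This is a minimal sketch using Python's built-in `str.isascii()` (available since Python 3.7), not the code diceware-list actually runs; the function name is hypothetical:

```python
def ascii_only(terms):
    """Keep only terms made up exclusively of ASCII characters (sketch)."""
    # str.isascii() is True for strings whose characters are all < U+0080.
    return [term for term in terms if term.isascii()]


print(ascii_only(["zuni", "éclat", "élan", "épée", "zulus"]))
```

Applied to some of the terms from the earlier sample output, only `zuni` and `zulus` survive.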
The --verbose option can be given multiple times to increase verbosity.
See --help for other options.
diceware-list loosely follows the recommendations given by Mr. Reinhold on http://diceware.com/.
It differs in the following respects:
- it does not propose usage of very short terms.
- it does not encourage use of the diceware-kit, as this automatically decreases the entropy of the resulting list: terms become too short, and terms that are prefixes of others are unavoidable.
Find flakes in wordlists.
$ wlflakes mywordlist.txt
No output means: no problems detected.
wlflakes can look for prefix flakes, i.e. it checks whether any line in the given file is the beginning of any other line:
$ cat wordlist.txt
air
port
airport
$ wlflakes wordlist.txt
wordlist.txt:3: E1 "air" from line 1 is a prefix of "airport"
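A prefix check does not need to compare every pair of words. After sorting, all words that start with a given word follow it directly, so a forward scan from each position finds every prefix pair. This is a sketch of the general technique, not wlflakes' actual implementation; the function name and the tuple layout it returns are hypothetical:

```python
def find_prefix_pairs(words):
    """Return (prefix_line, prefix, longer_line, longer) tuples (sketch).

    Line numbers are 1-based positions in `words`. A list for which this
    returns nothing is a prefix code.
    """
    # Sort (word, line_no) pairs lexicographically by word.
    indexed = sorted((word, line) for line, word in enumerate(words, start=1))
    pairs = []
    for i, (prefix, prefix_line) in enumerate(indexed):
        # Every word starting with `prefix` sorts directly after it,
        # so we can stop at the first word that does not.
        j = i + 1
        while j < len(indexed) and indexed[j][0].startswith(prefix):
            longer, longer_line = indexed[j]
            pairs.append((prefix_line, prefix, longer_line, longer))
            j += 1
    return pairs


print(find_prefix_pairs(["air", "port", "airport"]))
```

For the example file above this reports `"air"` (line 1) as a prefix of `"airport"` (line 3). Sorting keeps the check at O(n log n) for typical lists instead of comparing all pairs.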
Double entries are also shown:
$ cat wordlist.txt
air
port
air
$ wlflakes wordlist.txt
wordlist.txt:1: E1 "air" from line 1 is a prefix of "air"
wordlist.txt:1: E2 "air" appears multiple times
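Detecting double entries is a matter of counting terms. A minimal sketch with `collections.Counter`, which preserves first-seen order in Python 3.7+; the helper name is hypothetical and this is not wlflakes' code:

```python
from collections import Counter


def find_duplicates(words):
    """Return terms appearing more than once, in order of first appearance."""
    counts = Counter(words)
    return [word for word, count in counts.items() if count > 1]


print(find_duplicates(["air", "port", "air"]))
```

For the file above this yields `['air']`.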
More checks offered by wlflakes:
Warnings:
- show terms containing non-ASCII chars
- show list entries that are too short (easier to brute-force than to guess)
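One way to make "easier to brute-force than to guess" precise: a term of length L over an alphabet of k characters can be found after at most k**L blind tries, whereas guessing which entry was picked from a list of N terms takes up to N tries. The sketch below flags terms with k**L < N. This criterion and the default of 26 lowercase letters are assumptions for illustration, not necessarily the rule wlflakes applies, and the function name is hypothetical:

```python
def too_short(terms, alphabet_size=26):
    """Return terms whose brute-force space is smaller than the list size.

    Assumed criterion: a term of length L is "too short" if
    alphabet_size ** L < len(terms).
    """
    n = len(terms)
    return [term for term in terms if alphabet_size ** len(term) < n]


words = ["ab", "abc"] + ["term%04d" % i for i in range(7774)]
print(len(words), too_short(words))
```

For a 7776-term list, two-letter terms are flagged (26**2 = 676 < 7776), while three-letter terms pass (26**3 = 17576).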
wlflakes also supports --help (or -h) to list all supported options.
Android wordlists are a nice source for wordlists. They can be downloaded from public repositories:
$ wldownload --raw -v
Starting download of Android wordlist file.
Fetching wordlist from https://android.googlesource.com/platform/pack...
Done.
wldownload downloads these lists and helps to transform them into lists usable for diceware. Be aware that terms are output on stdout by default (and Android wordlists easily contain more than 100,000 terms):
$ wldownload > mylist
$ cat mylist
the
to
...
yt
yuk
Use shell redirects or --outfile to change that behaviour.
You can request non-English wordlists using --lang (or -l) with a language code like cs or de. Use --lang-codes to list all supported language codes.
The --no-offensive flag suppresses terms marked as possibly offensive.
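To illustrate what such a transformation involves, here is a parsing sketch. It assumes the raw Android (AOSP LatinIME) `.combined` files contain one entry per line in a form like `word=the,f=222,flags=,originalFreq=222`, with an optional `possibly_offensive=true` field, plus header lines that do not start with `word=`. That format is an assumption here and may differ between lists; the function is hypothetical and not wldownload's code:

```python
def parse_android_wordlist(lines, no_offensive=False):
    """Yield terms from lines of an Android .combined wordlist (sketch).

    Assumed line format:  word=the,f=222,flags=,originalFreq=222
    with an optional ``possibly_offensive=true`` field.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("word="):
            continue  # skip header lines such as "dictionary=main:en,..."
        # Split "key=value" fields; values may be empty (e.g. "flags=").
        fields = dict(
            field.split("=", 1) for field in line.split(",") if "=" in field
        )
        if no_offensive and fields.get("possibly_offensive") == "true":
            continue  # mimic the intent of --no-offensive
        yield fields["word"]


sample = [
    "dictionary=main:en,locale=en",
    " word=the,f=222,flags=,originalFreq=222",
    " word=to,f=213,flags=,originalFreq=213",
    " word=rude,f=0,flags=,originalFreq=0,possibly_offensive=true",
]
print(list(parse_android_wordlist(sample, no_offensive=True)))
```

With the (made-up) sample lines above, this prints `['the', 'to']`.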
In a clone of the sources you can run tests like this:
(venv) $ pip install -e ".[tests,dev]"
This command will download all required packages, especially py.test.
You can also install py.test manually with pip:
(venv)$ pip install pytest
(venv)$ pip install -e .
and afterwards run tests like so:
(venv)$ pytest
If you also install tox:
(venv)$ pip install tox
then you can run all tests for all supported platforms at once:
(venv)$ tox
To get a coverage report, you can use the respective tox target:
(venv)$ tox -e clean,py39,result
Or you use the common coverage tool:
(venv)$ pip install coverage pytest-cov
(venv)$ pytest --cov --cov-report= tests
(venv)$ coverage report -m --include="diceware_list/*"
(venv)$ coverage html
[1] The wordlist length in this case should be (number-of-sides-per-die) to the power of (number-of-dice-throws), for instance 6**5 = 7776 for five six-sided dice or a single six-sided die thrown five times.