Python-based implementation and comparison of strategies to guess words at Wordle

ilibarra/wordle_solver

Solver and comparison of strategies for Wordle

Motivation

The goal of this repository is to compare the performance of strategies that minimize the number of guesses needed to find the target word in Wordle. A script for general usage is also available.

Introduction

While playing Wordle's word of the day, several strategies and dictionaries can be used to guide the selection of the best next guess. To describe the strategies, the example below uses INTER as the target word. In the current iteration, after an initial guess, several candidate words remain.

Clustering example

Among possible strategies, one could try:

  1. Letter frequency, position-independent.
  2. Letter frequency per position, position-specific.
  3. Letter frequency per position, plus letter co-variation between positions.
  4. Brute-force search for the best matching word, similar to Tyler Glaiel's implementation (pending).
  5. Discarding words submitted on previous days (only via scripts).
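
As a rough illustration of strategies 1 and 2 in the list above, the sketch below scores each candidate by letter frequency, first ignoring position and then per position. The function names and the toy candidate list are hypothetical and are not taken from this repository's code.

```python
from collections import Counter

def letter_freq_scores(words):
    """Strategy 1 (sketch): sum of overall letter counts, each distinct letter once."""
    # Count in how many candidate words each letter appears.
    counts = Counter(ch for w in words for ch in set(w))
    return {w: sum(counts[ch] for ch in set(w)) for w in words}

def positional_freq_scores(words):
    """Strategy 2 (sketch): sum of letter counts at each position."""
    length = len(words[0])
    # One counter per column of the word list.
    per_pos = [Counter(w[i] for w in words) for i in range(length)]
    return {w: sum(per_pos[i][ch] for i, ch in enumerate(w)) for w in words}

# Toy candidate list, invented for illustration only.
candidates = ["inter", "inner", "infer", "enter", "otter"]
overall = letter_freq_scores(candidates)
positional = positional_freq_scores(candidates)
best_overall = max(candidates, key=overall.get)
best_positional = max(candidates, key=positional.get)
```

In both cases the next guess is simply the highest-scoring candidate; counting each distinct letter once in strategy 1 is a design choice that avoids rewarding words with repeated letters.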

I implemented this to test option 3, letter co-variation among positions. Visually, this can be seen in the low overlap of E at columns 4 and 5. This indicates that E at position 4 and E at position 5 rarely co-occur in the same word, and this information might be used to guide the selection of the best next guess. For more details, see mutual information.
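
To make the idea of co-variation between positions more concrete, here is a minimal sketch of mutual information between two columns of a word list. It assumes all words have equal probability; the function name and the toy word lists are hypothetical, and the repository's actual scoring may differ.

```python
import math
from collections import Counter

def mutual_information(words, i, j):
    """Mutual information (in bits) between the letters at positions i and j."""
    n = len(words)
    count_i = Counter(w[i] for w in words)           # marginal counts, column i
    count_j = Counter(w[j] for w in words)           # marginal counts, column j
    count_ij = Counter((w[i], w[j]) for w in words)  # joint counts
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        # p(a, b) / (p(a) * p(b)) simplifies to c * n / (count_i[a] * count_j[b])
        mi += p_ab * math.log2(c * n / (count_i[a] * count_j[b]))
    return mi

# Fully dependent columns: the first letter determines the second.
dependent = ["ab", "ab", "cd", "cd"]
# Independent columns: every combination occurs equally often.
independent = ["ab", "ad", "cb", "cd"]
mi_dependent = mutual_information(dependent, 0, 1)
mi_independent = mutual_information(independent, 0, 1)
```

A high value means that knowing the letter at one position says a lot about the letter at the other, while a value near zero means the two positions vary independently.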

Below, there is a simulation to test strategies 1, 2, and 3 with two public dictionaries. I am not sure which dictionary Wordle officially uses, but as the Linux one has more words, I am using that one by default.

Results

Tests using all words from the dictionaries indicate that overall letter frequencies (wordfreq) are the most relevant criterion for selecting the best next guess (lowest mean number of guesses, ~3.68). Letter co-variation among positions, so far, does not confer an advantage and seems to perform worse overall. This trend could change if there is a bug in the code, or if the best strategy depends on the co-variation complexity of the words in the dictionary.
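
The kind of simulation described here can be sketched as follows: every dictionary word is used as the target once, and the number of guesses is recorded. This is a simplified sketch, not the repository's benchmark code. It assumes that guesses are always drawn from the remaining candidates, that the target is in the dictionary, and that repeated letters are scored as in standard Wordle; all names and the toy dictionary are hypothetical.

```python
from collections import Counter
from statistics import mean

def feedback(guess, target):
    """Rule string: 0 = no match, 1 = word match, 2 = position match."""
    # Target letters that are not already used by a position match.
    remaining = Counter(t for g, t in zip(guess, target) if g != t)
    out = []
    for g, t in zip(guess, target):
        if g == t:
            out.append("2")
        elif remaining[g] > 0:
            remaining[g] -= 1
            out.append("1")
        else:
            out.append("0")
    return "".join(out)

def pick_by_letter_freq(candidates):
    """Pick the candidate whose distinct letters are most common overall."""
    counts = Counter(ch for w in candidates for ch in set(w))
    return max(candidates, key=lambda w: sum(counts[ch] for ch in set(w)))

def guesses_needed(target, dictionary, pick):
    """Play one game and return the number of guesses until the target is hit."""
    candidates = list(dictionary)
    n = 0
    while True:
        guess = pick(candidates)
        n += 1
        if guess == target:
            return n
        rules = feedback(guess, target)
        # Keep only words that would have produced the same rules.
        candidates = [w for w in candidates if feedback(guess, w) == rules]

# Toy dictionary, invented for illustration only.
dictionary = ["inter", "inner", "infer", "enter", "otter"]
counts = [guesses_needed(t, dictionary, pick_by_letter_freq) for t in dictionary]
mean_guesses = mean(counts)
```

Swapping `pick_by_letter_freq` for another selection function is enough to compare strategies on the same dictionary.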

Benchmarking (blue line = median, red line = mean)

This is the same analysis across dictionaries with words of length 3, 4 and 5. Overall, the trends do not indicate that co-variation improves results. The wordfreq strategy consistently works well.

Benchmarking by word length

Next steps

  • Addition of best guesses based on brute force.

Usage

First, run the daily.py script without any input guesses (-g) or rules (-r). You will get the most likely guess, given the input strategy (--strategy) and dictionary (-d). If you use the option --plot, heatmaps are saved in out.

python daily.py -g '' -r '' -d american_5 --strategy posfreqcovar

Assuming the suggested guess is BRINY, enter it into Wordle. You will get rules based on matches to the word of the day, which you can use as input to the script (0 = no match, 1 = word match, i.e. the letter is in the word but at another position, 2 = position match). Additionally, heatmaps like the visualization above will be saved in out, so you can inspect the current options.

python daily.py -g "BRINY" --rules "01000" -d american_5 --strategy posfreqcovar --plot

Assuming the next word is SERUM, with a position match at position one (S) and a word match at position three (R):

python daily.py -g "BRINY,SERUM" --rules "01000,20100" -d american_5 --strategy posfreqcovar --plot

From here, you can continue until you reach a solution (probably 1-2 more guesses at most).
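
The rule strings passed to the script above follow the 0/1/2 encoding described earlier. As a hedged sketch of how such a string could be derived and then used to narrow down the dictionary, consider the code below. The function names are hypothetical and not part of daily.py, and the treatment of repeated letters follows standard Wordle behaviour, which is an assumption about how the script treats them.

```python
from collections import Counter

def wordle_rules(guess, target):
    """Return the rule string: 0 = no match, 1 = word match, 2 = position match."""
    guess, target = guess.upper(), target.upper()
    rules = ["0"] * len(guess)
    # Target letters still available after removing exact position matches.
    remaining = Counter(t for g, t in zip(guess, target) if g != t)
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            rules[i] = "2"
    for i, (g, t) in enumerate(zip(guess, target)):
        if g != t and remaining[g] > 0:
            rules[i] = "1"
            remaining[g] -= 1
    return "".join(rules)

def filter_candidates(words, guess, rules):
    """Keep only words that would produce the observed rules for this guess."""
    return [w for w in words if wordle_rules(guess, w) == rules]

# Toy example with the target INTER from the introduction.
rules_briny = wordle_rules("BRINY", "INTER")
rules_enter = wordle_rules("ENTER", "INTER")
left = filter_candidates(["INTER", "INNER", "OTTER"], "ENTER", rules_enter)
```

Repeating the filter after each guess shrinks the candidate list until a single word remains.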

If you think there could be additional strategies to test, reach out! Have fun!

Troubleshooting: Please open an issue. License: GNU.
