
Automatic generation

Leo-Nicolle edited this page Jun 16, 2022 · 8 revisions

Goal

Being able to generate a complete, valid grid automatically. It should use a provided dictionary and create a dense grid. By dense I mean a grid without any square that is neither a definition nor a letter of a word. Like this:

Screenshot from 2022-04-04 23-01-20

Approach

In this video, 3Blue1Brown explains how he solves Wordle using information theory. I wonder if I could adapt his approach to my problem.

Let's figure it out!

Overview of the problem:

Let's say you have this grid:

And you want to add the next word. You want to find a word that leaves you the most possible next words. Something like this:

gif-1

Or like this:

gif-2

What we want, when trying a candidate for the next word, is to know whether it is good or not. Let's try to compute a score for a candidate word. 3Blue1Brown's idea is to compute the entropy of each word and pick the word with the highest entropy. If we manage to measure the entropy of our candidates, i.e. measure how many words could still fit in the grid after this next word is placed, we just need to pick among the highest-entropy words.
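As a toy illustration of the measure (my own sketch, not the wiki's code): given the counts of possible outcomes, Shannon entropy is highest when many outcomes remain and they are evenly spread, and zero when only one outcome is left.

```javascript
// Shannon entropy of a distribution given as raw counts, in bits.
function entropy(counts) {
  const total = counts.reduce((a, b) => a + b, 0);
  return counts
    .filter((c) => c > 0)
    .reduce((h, c) => h - (c / total) * Math.log2(c / total), 0);
}

console.log(entropy([1, 1, 1, 1])); // 2 bits: four equally likely outcomes
console.log(entropy([4])); // 0 bits: only one outcome left
```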

Computing the entropy of a candidate

Let's say we are trying to compute the entropy for the word NEXT in this configuration: entropy-1

For each letter of NEXT, there are a few things that can happen. We could decide to end the vertical word right after it, make it start right before, or not end it yet: entropy-1

So, depending on the situation, we are either looking for a word ending with EN, starting with EN, or containing EN. I think we need a data structure storing, for each pair of letters and for each index within a word, which words in the dictionary match.

data-struct

Or even store the words by length, like this:

data-struc-2
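The two ideas above can be combined into a single lookup. Here is a hedged sketch (the key format and names are my assumptions, not the project's real data structure): for each adjacent letter pair and each index where the pair occurs, store the matching words, bucketed by word length.

```javascript
// Build a lookup: "EN:0" -> Map(wordLength -> Set(words with EN at index 0)).
function buildPairIndex(dictionary) {
  const index = new Map();
  for (const word of dictionary) {
    for (let i = 0; i < word.length - 1; i++) {
      const key = `${word[i]}${word[i + 1]}:${i}`;
      if (!index.has(key)) index.set(key, new Map());
      const byLength = index.get(key);
      if (!byLength.has(word.length)) byLength.set(word.length, new Set());
      byLength.get(word.length).add(word);
    }
  }
  return index;
}

const index = buildPairIndex(["ENTER", "OXEN", "ENDED", "NEXT"]);
// Words of length 5 with "EN" at index 0 (i.e. starting with EN):
console.log(index.get("EN:0").get(5)); // Set { 'ENTER', 'ENDED' }
```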

Then, while computing the score of a candidate like the word NEXT, I could iterate through all its letters and find how many words could fit in the vertical direction.

entropy-2
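As a toy version of that per-letter count (using plain filters instead of a prebuilt index; names are mine): for a crossing constraint like EN, count the words that end with, start with, or contain the pair.

```javascript
// Count how many dictionary words satisfy each kind of crossing constraint.
function verticalFits(pair, dictionary) {
  return {
    endingWith: dictionary.filter((w) => w.endsWith(pair)).length,
    startingWith: dictionary.filter((w) => w.startsWith(pair)).length,
    containing: dictionary.filter((w) => w.includes(pair)).length,
  };
}

const dict = ["ENTER", "OXEN", "TOKEN", "NEXT"];
console.log(verticalFits("EN", dict));
// endingWith: OXEN, TOKEN -> 2; startingWith: ENTER -> 1; containing -> 3
```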

It seems like a good start. It does not seem as good, though, as 3Blue1Brown's distribution: he bakes everything beforehand for each possible pattern, and can then use it fast. To get a similar chart, I would need the entropy of each possible pattern he could encounter along the game, which is already huge. Here we need to compute something not exhaustive, less accurate, but still good enough. But what?

Second thought about how to add a new word

Let's say we start with our PREVIOUS word. We'll iterate through all its letters and find the best pairing letters. Then we could find the words which maximise the number of best letters:

search-2
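A hedged sketch of that heuristic (the helper names and the "best letter" proxy are my assumptions): pick the letters that pair best, here crudely approximated by letter frequency in the dictionary, then rank candidate words by how many of those best letters they contain.

```javascript
// Crude "best pairing letters": the topN most frequent letters in the dictionary.
function bestPairingLetters(dictionary, topN = 3) {
  const freq = new Map();
  for (const word of dictionary)
    for (const ch of word) freq.set(ch, (freq.get(ch) || 0) + 1);
  return [...freq.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN)
    .map(([ch]) => ch);
}

// Rank candidates by how many of their letters are among the best letters.
function rankByBestLetters(candidates, best) {
  const bestSet = new Set(best);
  const score = (w) => [...w].filter((ch) => bestSet.has(ch)).length;
  return [...candidates].sort((a, b) => score(b) - score(a));
}

console.log(
  rankByBestLetters(["ROOK", "NEXT"], bestPairingLetters(["ENTER", "TOKEN", "NEXT"]))
); // NEXT ranks first
```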

Solver

Once we have a way to compute a score for each candidate, what should we use? Backtracking? Energy-minimising algorithms?

What about something like this? I could iterate through all the letters of my new word and find the best letters for each position. Then I can find among them which words match best: find-words-slow
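For the backtracking option, here is a minimal sketch over a toy 2x2 grid (all shapes and names are my assumptions, not the project's API). A slot is a list of [x, y] cells; a candidate must agree with every letter already placed, and a dead end undoes the placement and tries the next candidate.

```javascript
// Words that fit a slot given the letters already in the grid (null = empty).
function candidatesFor(slot, grid, dictionary) {
  return dictionary.filter(
    (word) =>
      word.length === slot.length &&
      slot.every(([x, y], i) => grid[y][x] === null || grid[y][x] === word[i])
  );
}

// Fill slots one by one; undo and try the next word when a branch dead-ends.
function solve(grid, slots, dictionary) {
  if (slots.length === 0) return true; // every slot filled
  const [slot, ...rest] = slots;
  for (const word of candidatesFor(slot, grid, dictionary)) {
    const saved = slot.map(([x, y]) => grid[y][x]);
    slot.forEach(([x, y], i) => (grid[y][x] = word[i]));
    if (solve(grid, rest, dictionary)) return true;
    slot.forEach(([x, y], i) => (grid[y][x] = saved[i])); // backtrack
  }
  return false;
}

const grid = [[null, null], [null, null]];
const slots = [
  [[0, 0], [1, 0]], [[0, 1], [1, 1]], // across
  [[0, 0], [0, 1]], [[1, 0], [1, 1]], // down
];
console.log(solve(grid, slots, ["NO", "OX"]), grid); // true, N O / O X
```

A real solver would order the candidates best-score-first using the entropy-like measure above, so the most promising branches are explored before the search has to backtrack.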

Documentation of the current implementation

Everything starts from the getBestLetter function, which gives the most probable letter in the dictionary for a position in the grid:

getbestletters

Now, if for each cell we can compute how many words would fit in the best case, then we should fill up the cells with the fewest candidate words first, and assign them the best letter we could find:

// find the cell we should treat next
const { cell, bestLetters, candidatesPerLetter } = cells
  .map((cell) => {
    const { bestLetters, candidatesPerLetter } = getBestLetter(cell)
    return { bestLetters, candidatesPerLetter, cell }
  })
  .sort(sortByLessCandidates)[0] // first element: the cell with the fewest candidates

// assign it the best letter
grid[cell.y][cell.x] = bestLetters[0]