-
Notifications
You must be signed in to change notification settings - Fork 1
Automatic generation
Being able to generate a complete, valid grid automatically. It should use a provided dictionary and generate a valid grid.
In this video, the guy explains how he is solving Wordle thanks to information theory. I wonder if I could adapt his approach to my problem.
Let's figure it out !
Let's say you have this grid:
And you want to add the next word. You want to find a word that will let you the most possible next words. Something like this:
Or like This:
What we want when trying a candidate for the next word, is to know if it is good or not. Let's try to compute a score for a candidate word. 3Blue1Brown's idea is to compute the entropy for each word, and pick the word with the highest entropy. If we manage to measure the entropy for our candidates, ie measuring how many words could fit in the grid after this next word is placed, we just need to pick among the lowest entropy words.
Let's say we are trying to compute the entropy for the word next in this configuration:
For each letter of NEXT, there are few things that can happen. We could decide to end the vertical word right after, or make it start right before, or not end it yet:
So, depending on the situation, we are either looking for a word ending by EN, or starting EN, or containing EN. I think we need a data structure storing for each pair of letter and for each index within a word, which words in the dictionary exist.
Or even store the words by length. Like this
Then, while computing the score of a candidate, like the word NEXT, I could iterate through all the letters and find how many words could fit in the veritcal direction.
It seems like a good start. It does not seem as good though as 3Blue1Brown's distribution. He bakes everything beforehand for each possible, and then can use it fast. If I want to get a similar chart with the entropy of each possible pattern he could encounter along the game. Which is already huge. Here we need to compute something not exhaustive, less accurate, but still good enough. But what ?