OpenSource version of Shitsu-chan
- Create a database
- Configure project.config.json
- Load a backup file from /assets/dump into your Postgres database
- Run npm start
- .. and you are done !
The user thinks of a character without telling the computer who it is. The computer then tries to guess the character by asking questions as efficiently as possible. If the character is not in the knowledge base, the user can contribute by submitting the character they had in mind.
The first thing we know is that, initially, each character has an equal chance of being chosen.
The second thing we would like to know is how to calculate the probability that a hypothesis H is true given the available evidence.
A simple way to implement this idea mathematically is to use Bayes' theorem. In the general case, if H and E are two dependent events, then

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

Here $P(H \mid E)$ is read as “probability of hypothesis H given the evidence E”.
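We know that for any event X,

$$P(X) = P(X \mid H)\,P(H) + P(X \mid \neg H)\,P(\neg H)$$

which is further generalized by the law of total probability: if the hypotheses $H_1, \dots, H_n$ are mutually exclusive and cover every possibility, then

$$P(X) = \sum_{i=1}^{n} P(X \mid H_i)\,P(H_i)$$

We can then write Bayes' theorem in a more explicit way by replacing the denominator:

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{\sum_{i=1}^{n} P(E \mid H_i)\,P(H_i)}$$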
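As a minimal sketch of Bayes' theorem with the denominator expanded over every hypothesis, the snippet below computes a posterior for each character from a prior and a likelihood. The function name `posterior`, the character names, and all the numbers are hypothetical and only serve the illustration; they are not taken from the project's code.

```javascript
// Sketch: P(H | E) = P(E | H) P(H) / sum over H' of P(E | H') P(H').
// `priors` and `likelihoods` are plain objects keyed by character name (hypothetical data).
function posterior(priors, likelihoods) {
  // Denominator: total probability of the evidence over all characters.
  let evidence = 0;
  for (const name of Object.keys(priors)) {
    evidence += likelihoods[name] * priors[name];
  }
  if (evidence === 0) {
    throw new Error("The evidence has zero probability under every hypothesis");
  }
  // Numerator for each character, divided by the shared denominator.
  const result = {};
  for (const name of Object.keys(priors)) {
    result[name] = (likelihoods[name] * priors[name]) / evidence;
  }
  return result;
}

// Initially every character is equally likely.
const priors = { Alice: 1 / 3, Bob: 1 / 3, Carol: 1 / 3 };
// Assumed values of P(E | H) for one observed answer.
const likelihoods = { Alice: 0.9, Bob: 0.5, Carol: 0.1 };

const post = posterior(priors, likelihoods);
console.log(post); // Alice ends up the most likely, at about 0.6
```

Because every posterior shares the same denominator, the results always sum to 1, which is what makes them directly comparable across characters.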
For the more general case in which we provide more than one piece of evidence we have,
Simply because for any events X, Y and Z:
Since we need to implement the idea that when the user’s response matches ours then
Same thing as above but the main difference is that we have to take into account all other possibilities.
From
Read as "Probability of the hypothesis H given the evidences
- H: hypothesis character
- : observable evidence, defined implicitly by the user's response
- : correct answer given H, a real number between 0 and 1.
- (normalized distance for all )
- : set of all characters
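The definitions above can be turned into a small belief-update loop. The sketch below is only one possible reading: it assumes that answers are conditionally independent given the character, and that the likelihood of a response is 1 minus its distance to the correct answer (both values lying between 0 and 1). The names `likelihood`, `updateBeliefs`, and `knowledgeBase`, as well as the sample data, are hypothetical.

```javascript
// ASSUMPTION: likelihood = 1 - |response - expected|, with both values in [0, 1].
function likelihood(response, expected) {
  return 1 - Math.abs(response - expected);
}

// One Bayesian update step: multiply each belief by the likelihood, then renormalize.
function updateBeliefs(beliefs, expectedAnswers, response) {
  const unnormalized = {};
  let total = 0;
  for (const name of Object.keys(beliefs)) {
    unnormalized[name] = beliefs[name] * likelihood(response, expectedAnswers[name]);
    total += unnormalized[name];
  }
  // If no character is compatible with the response, keep the previous beliefs.
  if (total === 0) return { ...beliefs };
  const updated = {};
  for (const name of Object.keys(beliefs)) {
    updated[name] = unnormalized[name] / total;
  }
  return updated;
}

// Hypothetical knowledge base: correct answer (0 = no, 1 = yes) per question and character.
const knowledgeBase = {
  "Wears a hat?": { Alice: 1, Bob: 1, Carol: 0 },
  "Has a sword?": { Alice: 1, Bob: 0, Carol: 0 },
};

let beliefs = { Alice: 1 / 3, Bob: 1 / 3, Carol: 1 / 3 };
beliefs = updateBeliefs(beliefs, knowledgeBase["Wears a hat?"], 1);    // user says "yes"
beliefs = updateBeliefs(beliefs, knowledgeBase["Has a sword?"], 0.75); // user says "probably"
console.log(beliefs);
```

Applying the update once per answer is equivalent to conditioning on all the answers at once under the independence assumption, and it keeps the beliefs normalized after every question.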
For our "AI" to look more "AI-ish", we have to solve this optimization problem:
"Maximize the hypothesis's probability as quickly as possible", i.e. "Ask as few questions as possible".
First of all, how are we going to quantify information? We have information theory to back us up!
For example if
The lower the probability, the higher the information value.
Here is an interesting example: almost 70% of the population has black hair, whereas only 2% of the population has red hair. If we want to ask as few questions as possible about how a person looks, then asking whether the person has red hair brings us much more information when the answer is yes, because then congratulations! We have just reduced our search space to 2% of the population.
In the second case, we have reduced our search space to 1/50 of the initial size of the population.
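To put numbers on the hair-color example, the sketch below uses the standard Shannon self-information, $I(x) = -\log_2 P(x)$, measured in bits. The choice of base 2 and the function name `informationBits` are assumptions made for the illustration.

```javascript
// Self-information in bits: rarer events carry more information.
function informationBits(p) {
  if (p <= 0 || p > 1) {
    throw new RangeError("p must be in the interval (0, 1]");
  }
  return -Math.log2(p);
}

const blackHair = informationBits(0.70); // about 0.51 bits
const redHair = informationBits(0.02);   // about 5.64 bits

console.log(blackHair.toFixed(2), redHair.toFixed(2));
```

Learning that the person has red hair is worth roughly eleven times as many bits as learning that they have black hair, which matches the intuition that a "yes" to the rarer trait shrinks the search space far more.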
Let
Which can be read as “Any character
In order to approximate
In our model, we shall use:
This quantity computes a proportion hence it is the perfect candidate.
First, let’s only consider the questions that correspond to at least one character.
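One possible way to pick the next question, sketched under stated assumptions: skip questions that correspond to no character, then prefer the question whose expected "yes" probability is closest to one half, since that maximizes the binary entropy of the answer. This is a common heuristic and not necessarily the quantity used by this project; `binaryEntropy`, `pickQuestion`, and the sample data are hypothetical.

```javascript
// Entropy (in bits) of a yes/no answer whose probability of "yes" is p.
function binaryEntropy(p) {
  if (p <= 0 || p >= 1) return 0;
  return -p * Math.log2(p) - (1 - p) * Math.log2(1 - p);
}

// `questions` maps a question to the correct answer (0..1) of each character it covers.
// `beliefs` maps a character to its current probability.
function pickQuestion(questions, beliefs) {
  let best = null;
  let bestScore = -Infinity;
  for (const [text, answers] of Object.entries(questions)) {
    const names = Object.keys(answers);
    if (names.length === 0) continue; // question corresponds to no character
    // Expected probability of a "yes"; characters not listed count as "no".
    let pYes = 0;
    for (const name of names) {
      pYes += (beliefs[name] || 0) * answers[name];
    }
    const score = binaryEntropy(pYes);
    if (score > bestScore) {
      bestScore = score;
      best = text;
    }
  }
  return best;
}

const currentBeliefs = { Alice: 0.25, Bob: 0.25, Carol: 0.25, Dave: 0.25 };
const questions = {
  "Has a sword?": { Alice: 1, Bob: 0, Carol: 0, Dave: 0 }, // splits 25% / 75%
  "Wears a hat?": { Alice: 1, Bob: 1, Carol: 0, Dave: 0 }, // splits 50% / 50%
  "Is a robot?": {},                                        // no character: ignored
};

const nextQuestion = pickQuestion(questions, currentBeliefs);
console.log(nextQuestion); // "Wears a hat?"
```

An even split means either answer removes half of the probability mass, so the expected number of remaining questions is kept low.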
If