Solves any Sudoku puzzle and displays the solution in real time.
- A convolutional neural network classifies the digits.
- Computer vision techniques locate and extract the required information from the source image.
Figure 1: A demonstration of the program solving a Sudoku puzzle.
$ git clone https://github.com/TristanBester/AR_sudoku
$ cd AR_sudoku
$ pip3 install -r requirements.txt
$ cd solver
$ python3 ar_solver.py
Image segmentation is performed to simplify the image prior to analysis. Both simple and adaptive thresholding are used to convert the source image into a binary image.
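To illustrate the difference between the two kinds of thresholding, here is a minimal dependency-free sketch. It is not the code used in this project: the function names, the block size, and the offset are assumptions chosen for the example, and both functions mark dark pixels (ink) as foreground. Libraries such as OpenCV provide equivalent operations (`cv2.threshold` and `cv2.adaptiveThreshold`).

```python
def simple_threshold(image, thresh=128):
    """Global threshold: every pixel darker than `thresh` becomes 1 (ink)."""
    return [[1 if px < thresh else 0 for px in row] for row in image]


def adaptive_threshold(image, block=3, offset=10):
    """Mean adaptive threshold (illustrative).

    Each pixel is compared with the mean of its block x block
    neighbourhood (clipped at the image border) minus `offset`.
    Pixels darker than that local value become 1 (ink).
    """
    height, width = len(image), len(image[0])
    radius = block // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            out[y][x] = 1 if image[y][x] < local_mean - offset else 0
    return out


# A tiny unevenly lit "page": brightness falls from left to right.
# Ink sits at columns 2 (value 120) and 5 (value 20).
row = [220, 200, 120, 160, 140, 20, 100]
image = [row[:] for _ in range(3)]

global_result = simple_threshold(image, 110)
local_result = adaptive_threshold(image, block=3, offset=10)
```

With this input the global threshold yields `[0, 0, 0, 0, 0, 1, 1]` per row (it misses the ink in the bright region and marks dim paper as ink), while the adaptive threshold yields `[0, 0, 1, 0, 0, 1, 0]`. This is why adaptive thresholding is useful for photographs with uneven lighting.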
After a binary image has been generated, the position of the Sudoku grid can be calculated. The result of this calculation is used to extract the digits from the Sudoku grid. Both basic operators of mathematical morphology, erosion and dilation, are employed during the position calculation. When used with specific kernels, these operators allow the grid lines to be extracted from the image.
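The idea of isolating grid lines with erosion followed by dilation can be sketched as follows. This is an illustrative pure-Python version, not the project's implementation: the helper names and the kernel length are assumptions, kernel sizes must be odd, and pixels outside the image are treated as background. OpenCV offers the same operators as `cv2.erode` and `cv2.dilate`.

```python
def _morph(img, kernel_w, kernel_h, combine):
    """Apply `combine` (min or max) over a centred rectangular window."""
    height, width = len(img), len(img[0])
    rx, ry = kernel_w // 2, kernel_h // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                img[j][i] if 0 <= j < height and 0 <= i < width else 0
                for j in range(y - ry, y + ry + 1)
                for i in range(x - rx, x + rx + 1)
            ]
            out[y][x] = combine(window)
    return out


def erode(img, kernel_w, kernel_h):
    """A pixel survives only if the whole window is foreground."""
    return _morph(img, kernel_w, kernel_h, min)


def dilate(img, kernel_w, kernel_h):
    """A pixel is set if any pixel in the window is foreground."""
    return _morph(img, kernel_w, kernel_h, max)


def extract_lines(img, length):
    """Keep only horizontal and vertical runs at least `length` pixels long.

    Eroding with a long, thin kernel removes everything shorter than the
    kernel (digits, noise); dilating with the same kernel restores the
    lines that survived.
    """
    horizontal = dilate(erode(img, length, 1), length, 1)
    vertical = dilate(erode(img, 1, length), 1, length)
    return [
        [h_px or v_px for h_px, v_px in zip(h_row, v_row)]
        for h_row, v_row in zip(horizontal, vertical)
    ]


# 7x7 binary image: one horizontal and one vertical line plus some specks.
cross = [[1 if r == 3 or c == 3 else 0 for c in range(7)] for r in range(7)]
noisy = [line[:] for line in cross]
noisy[1][1] = noisy[5][5] = noisy[5][6] = 1

lines_only = extract_lines(noisy, 5)  # the specks are removed, the cross remains
```

Choosing the kernel length is a trade-off: it should be longer than any stroke of a digit but shorter than a grid line.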
The position of each element in the grid is used to extract the regions of the source image that contain the digits.
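As a rough sketch of this step, the example below derives a region for each of the 81 cells and crops it from an image. It makes a simplifying assumption that may not match the project: the grid is axis-aligned and its bounding box is divided into equal cells. The function names and the `margin` parameter (used to trim the grid lines from each crop) are hypothetical.

```python
def cell_boxes(x, y, width, height, margin=0):
    """Return a 9x9 list of (left, top, right, bottom) boxes.

    (x, y) is the top-left corner of the grid's bounding box. Each box is
    shrunk by `margin` pixels on every side so grid lines are not included.
    """
    boxes = []
    for row in range(9):
        top = y + row * height // 9
        bottom = y + (row + 1) * height // 9
        line = []
        for col in range(9):
            left = x + col * width // 9
            right = x + (col + 1) * width // 9
            line.append((left + margin, top + margin,
                         right - margin, bottom - margin))
        boxes.append(line)
    return boxes


def crop(image, box):
    """Cut the region described by `box` out of a row-major image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


# Example: a 90x90 pixel grid whose top-left corner is at (10, 20).
boxes = cell_boxes(10, 20, 90, 90, margin=1)
centre_cell = boxes[4][4]  # (51, 61, 59, 69)
```

Each cropped region can then be resized to the input size expected by the classifier.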
A convolutional neural network is used to predict the value of each digit extracted from the image. These values are used to reconstruct the Sudoku grid in memory.
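The reconstruction step might look like the sketch below. The data layout is an assumption made for illustration: each prediction is a `(row, col, scores)` tuple, where `scores` holds one classifier score per digit from 1 to 9, and cells without a detected digit are simply absent and stored as 0. The network itself is not shown.

```python
def build_grid(predictions):
    """Build a 9x9 grid from per-cell classifier scores (illustrative).

    `predictions` is an iterable of (row, col, scores) tuples, where
    scores[i] is the score for digit i + 1. Cells with no prediction
    stay 0, which marks them as empty.
    """
    grid = [[0] * 9 for _ in range(9)]
    for row, col, scores in predictions:
        best_index = max(range(len(scores)), key=scores.__getitem__)
        grid[row][col] = best_index + 1
    return grid


# Two detected digits: a 5 in the top-left cell, a 2 in the bottom-right.
scores_five = [0.0] * 9
scores_five[4] = 0.9
scores_two = [0.1, 0.7, 0.0, 0.0, 0.1, 0.0, 0.1, 0.0, 0.0]
grid = build_grid([(0, 0, scores_five), (8, 8, scores_two)])
```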
The puzzle is solved using backtracking.
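For readers unfamiliar with the technique, here is a minimal backtracking solver. It is a generic sketch, not necessarily how `ar_solver.py` implements it; the function names are assumptions, and empty cells are represented by 0.

```python
def is_valid(grid, row, col, digit):
    """Check that `digit` does not clash with its row, column, or 3x3 box."""
    if digit in grid[row]:
        return False
    if any(grid[r][col] == digit for r in range(9)):
        return False
    box_row, box_col = 3 * (row // 3), 3 * (col // 3)
    return all(
        grid[r][c] != digit
        for r in range(box_row, box_row + 3)
        for c in range(box_col, box_col + 3)
    )


def solve(grid):
    """Fill `grid` in place; return True if a solution was found."""
    for row in range(9):
        for col in range(9):
            if grid[row][col] == 0:
                for digit in range(1, 10):
                    if is_valid(grid, row, col, digit):
                        grid[row][col] = digit      # tentative choice
                        if solve(grid):
                            return True
                        grid[row][col] = 0          # undo and try the next digit
                return False                        # no digit fits: backtrack
    return True                                     # no empty cells remain


puzzle = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]
solved = solve(puzzle)  # True; `puzzle` now holds the completed grid
```

The solver tries digits in ascending order and undoes a choice as soon as it leads to a dead end, so it always finds a solution if one exists.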
As the position of every element in the grid has already been calculated, this information is used to overlay the solution onto the source image before it is displayed to the user.
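One way to plan the overlay is sketched below. The function name and the box format, `(left, top, right, bottom)` per cell, are assumptions for illustration. The sketch only computes which digits to draw and where; the drawing itself would be done with an imaging library (for example `cv2.putText` in OpenCV).

```python
def overlay_labels(puzzle, solution, boxes):
    """List the digits to draw and the centre of the cell each belongs in.

    Only cells that were empty (0) in the original puzzle are returned,
    so digits already printed on the page are not drawn over.
    """
    labels = []
    for row in range(9):
        for col in range(9):
            if puzzle[row][col] == 0:
                left, top, right, bottom = boxes[row][col]
                centre = ((left + right) // 2, (top + bottom) // 2)
                labels.append((solution[row][col], centre))
    return labels


# Example with 10x10 pixel cells and a single empty cell in the corner.
boxes = [[(c * 10, r * 10, c * 10 + 10, r * 10 + 10) for c in range(9)]
         for r in range(9)]
solution = [[(r * 3 + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
puzzle = [line[:] for line in solution]
puzzle[0][0] = 0
labels = overlay_labels(puzzle, solution, boxes)  # [(1, (5, 5))]
```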
Figure 2: Image segmentation.
Figure 3: Grid detection using morphology.
Figure 4: Three digits extracted from the image.
Figure 5: The solution to the puzzle.
I created and trained the CNN used for digit classification. Initially I attempted to train the model on the MNIST dataset; however, the model performed extremely poorly on computer-generated digits. The computer-generated digits were augmented using morphology (erosion) in an attempt to increase their resemblance to the handwritten digits the model was trained on. However, this did not yield the accuracy required for the model to be usable.
A subset of the Chars74K dataset was used to train the CNN used in this project. This dataset provides slightly more than ten thousand labeled computer-generated digit images. The dataset also contains many other images; however, these were not used.
The Chars74K dataset can be found here: http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/
The following article provides a fantastic introduction to the computer vision concepts required to implement this project. Note that the article is only an introduction and does not cover all of the required skills. Article: https://medium.com/coinmonks/a-box-detection-algorithm-for-any-image-containing-boxes-756c15d7ed26