optical-character-recognition

A simple implementation of optical character recognition (OCR) using support vector machines (SVM). The main goal of this project is to recognize the characters of license plates from a given database.

Related work

This project is a simplified implementation of the OCR (optical character recognition) architecture proposed by Gonçalves et al. (2016), which recognizes license plates in real time using temporal redundancy.

Sequence of tasks performed by the proposed approach (Gonçalves et al., 2016).

Database

The database used is private, so its files cannot be provided in this repository. What you need to know about the database used in this project is described below.

Images and notes

Each image has a related text file (the note file), which describes the bounding boxes of the license plate recognized in the image and of each character of the plate. The note file also contains the real value of the characters of the license plate. Example:

text: XXX-9999
position_plate: 568 672 99 37
position_chars:
	char0: 573 687 12 18
	char1: 585 687 12 17
	char2: 597 687 12 18
	char3: 614 687 12 17
	char4: 627 687 12 17
	char5: 639 687 11 17
	char6: 651 687 12 17
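
A minimal sketch of how a note file like the example above could be parsed. The function and class names are hypothetical and are not taken from program.py. Two assumptions are made: the four numbers of each box are read as `x y width height` (the order is not stated here), and the hyphen in `text` has no bounding box of its own, since the example lists seven boxes for `XXX-9999`.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Parsed content of one note file (hypothetical container)."""
    text: str = ""
    plate_box: tuple = ()
    char_boxes: list = field(default_factory=list)


def parse_annotation(content: str) -> Annotation:
    """Parse the 'key: value' lines of a note file into an Annotation."""
    annotation = Annotation()
    for raw_line in content.splitlines():
        key, separator, value = raw_line.strip().partition(":")
        key, value = key.strip(), value.strip()
        if not separator:
            continue  # skip blank or malformed lines
        if key == "text":
            annotation.text = value
        elif key == "position_plate":
            annotation.plate_box = tuple(int(v) for v in value.split())
        elif key.startswith("char") and value:
            # char0, char1, ... appear in reading order in the example
            annotation.char_boxes.append(tuple(int(v) for v in value.split()))
    return annotation


def labelled_boxes(annotation: Annotation) -> list:
    """Pair each character with its box, assuming the hyphen has no box."""
    characters = annotation.text.replace("-", "")
    return list(zip(characters, annotation.char_boxes))


EXAMPLE = """text: XXX-9999
position_plate: 568 672 99 37
position_chars:
	char0: 573 687 12 18
	char1: 585 687 12 17
	char2: 597 687 12 18
	char3: 614 687 12 17
	char4: 627 687 12 17
	char5: 639 687 11 17
	char6: 651 687 12 17
"""

example_annotation = parse_annotation(EXAMPLE)
```

Each pair returned by `labelled_boxes` gives one training or test sample: the label of a character and the region of the image where it appears.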

Directory structure

The database was divided into three sets: training, test and validation. The images are grouped into folders. Each folder represents a video clip, in which each image is one frame of the video. These groups of images are used to simulate the temporal redundancy behaviour.

database
├ training
| ├ Track1
| | ├ Track1[01].png
| | ├ Track1[01].txt
| | ├ ...
| | ├ Track1[M].png
| | └ Track1[M].txt
| ├ ...
| └ TrackN
|   └ ...
├ test
| └ ...
└ validation
  └ ...
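
A minimal sketch of how the frames of one set could be collected per track, assuming the layout shown above (one folder per track, each image next to a note file with the same base name). The function name `list_tracks` is hypothetical.

```python
from pathlib import Path


def list_tracks(split_dir):
    """Map each track folder name to its sorted (image, note) path pairs."""
    tracks = {}
    track_dirs = sorted(p for p in Path(split_dir).iterdir() if p.is_dir())
    for track_dir in track_dirs:
        frames = []
        for image_path in sorted(track_dir.glob("*.png")):
            note_path = image_path.with_suffix(".txt")
            if note_path.exists():  # ignore images without a note file
                frames.append((image_path, note_path))
        tracks[track_dir.name] = frames
    return tracks
```

For example, `list_tracks("database/training")` would return one entry per track, which keeps the frames of the same video clip together for the temporal redundancy step.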

Technologies used

This project was written in Python. The libraries used are:

Scikit-learn to get an SVM implementation;
Scikit-image to get a HOG descriptor implementation;
OpenCV to read and handle images;
NumPy to make some numeric transformations necessary for SVM input;
Matplotlib to plot some graphs in order to analyse the results.

Development

This project is a simplified implementation of the OCR architecture proposed by Gonçalves et al. (2016); more specifically, of its character recognition and temporal redundancy aggregation steps. The information provided with the database allows us to skip the steps related to vehicle detection, license plate detection and character segmentation.

Support Vector Machines (SVM) were the model used to predict the character values. I've also used the Radial Basis Function (RBF) kernel, which is a state-of-the-art kernel for OCR problems. To describe the images, I've used the Histogram of Oriented Gradients (HOG) descriptor.
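
A minimal sketch of the HOG plus RBF-kernel SVM combination, using scikit-image and scikit-learn. The patch size, the HOG parameters, the SVM settings and all function names are assumptions chosen for illustration and are not the values used in program.py. Boxes are assumed to be `x y width height`.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import SVC

PATCH_SHAPE = (32, 16)  # assumed (rows, cols) every character is resized to


def crop_box(image, box):
    """Cut a character out of a grayscale image; box is (x, y, width, height)."""
    x, y, width, height = box
    return image[y:y + height, x:x + width]


def describe(patch):
    """Resize a grayscale patch to a fixed shape and return its HOG vector."""
    patch = resize(patch, PATCH_SHAPE, anti_aliasing=True)
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))


def train_rbf_svm(patches, labels):
    """Fit an SVM with RBF kernel on the HOG descriptors of the patches."""
    features = np.array([describe(patch) for patch in patches])
    return SVC(kernel="rbf", gamma="scale").fit(features, labels)
```

Resizing every patch to the same shape matters because the SVM needs feature vectors of equal length, while the character boxes in the note files differ slightly in size.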

To work with multiple classes, I've used the one-against-all composition. One SVM is created for each class of the problem (in this case, the letters [a to z] and the digits [0 to 9]). In the training step, each SVM receives items labelled 1 or 0, where 1 means that the item belongs to the class this SVM is responsible for, and 0 otherwise. In the prediction step, the input image is provided to all SVMs, and the class of the SVM with the highest response value is chosen.
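
The one-against-all scheme described above can be sketched as follows. This is an illustration under assumptions, not the code of program.py: the class name `OneAgainstAllSVM` is hypothetical, the "answer value" of each SVM is taken to be scikit-learn's `decision_function`, and the SVM parameters are placeholders. It needs at least two classes in the training data.

```python
import numpy as np
from sklearn.svm import SVC


class OneAgainstAllSVM:
    """One binary SVM per class; the highest decision value wins."""

    def __init__(self, **svc_params):
        # Assumed defaults; any SVC parameter can be overridden.
        self.svc_params = {"kernel": "rbf", "gamma": "scale", **svc_params}
        self.classes_ = []
        self.models_ = []

    def fit(self, features, labels):
        features = np.asarray(features, dtype=float)
        labels = np.asarray(labels)
        self.classes_ = sorted(set(labels.tolist()))
        self.models_ = []
        for cls in self.classes_:
            # 1 for items of this SVM's class, 0 for every other class
            binary_targets = (labels == cls).astype(int)
            model = SVC(**self.svc_params).fit(features, binary_targets)
            self.models_.append(model)
        return self

    def predict(self, features):
        features = np.asarray(features, dtype=float)
        # One column of decision values per class-specific SVM
        scores = np.column_stack(
            [model.decision_function(features) for model in self.models_]
        )
        return [self.classes_[i] for i in scores.argmax(axis=1)]
```

With 26 letters and 10 digits this creates 36 binary SVMs. scikit-learn also ships `sklearn.multiclass.OneVsRestClassifier`, which wraps the same idea; the explicit loop is shown here only to mirror the description.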

In the temporal redundancy aggregation step, the same license plate is recognized multiple times, once per frame. The final value is given by a voting process over these multiple results.
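
A minimal sketch of one possible voting process. The exact rule is not specified here, so this assumes a majority vote taken independently at each character position across the frames of a track; the function name `vote_plate` is hypothetical. Ties go to the value seen first.

```python
from collections import Counter


def vote_plate(frame_predictions):
    """Combine per-frame plate strings by majority vote per character position."""
    if not frame_predictions:
        return ""
    # Keep only predictions with the most common length so positions align.
    length = Counter(len(p) for p in frame_predictions).most_common(1)[0][0]
    candidates = [p for p in frame_predictions if len(p) == length]
    # zip(*candidates) yields, for each position, the characters of all frames.
    return "".join(
        Counter(characters).most_common(1)[0][0]
        for characters in zip(*candidates)
    )


frames = ["ABC1234", "A8C1234", "ABC1234", "ABC1Z34"]
final_plate = vote_plate(frames)  # "ABC1234"
```

In this example two frames contain one wrong character each, but both errors are outvoted by the other frames.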

Results

I've reached an accuracy of 99.7% using this approach with the given database as input. In short, 5523 characters (789 images) were used in the training step, and 5628 characters (804 images) were used in the test step. Of the test characters, 5613 were predicted correctly, against 15 wrong predictions. The image below shows the confusion matrix obtained from the experiment.
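
As a quick check, the reported figures are consistent with each other. The snippet below only redoes the arithmetic from the numbers in this section, assuming seven characters per plate as in the note file example.

```python
CHARS_PER_PLATE = 7  # as in the note file example

train_images, train_chars = 789, 5523
test_images, test_chars = 804, 5628
correct, wrong = 5613, 15

# Every image contributes one plate of seven characters.
assert train_images * CHARS_PER_PLATE == train_chars
assert test_images * CHARS_PER_PLATE == test_chars

# Correct and wrong predictions add up to the test set size.
assert correct + wrong == test_chars

accuracy = correct / test_chars
print(f"{accuracy:.1%}")  # 99.7%
```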
