Skip to content

Fall2018-Project4-sec1--sec1proj4_grp6 created by GitHub Classroom

Notifications You must be signed in to change notification settings

TZstatsADS/Fall2018-Project4-sec1-grp6

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

33 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Project: OCR (Optical Character Recognition)

image

Term: Fall 2018

  • Team # 6

  • Team members

    • team member 1: Yi Lin
    • team member 2: Yang Xing
    • team member 3: Chuqiao Rong
    • team member 4: Hongru Liu
    • team member 5: Liu Han
  • Project summary: In this project, we created an OCR post-processing procedure to enhance Tesseract OCR output. The detection method is n-gram and correction method is probability scoring. Error detection is performed by comparing all possible bigrams of all words in Tesseract with positional binary matrix constructed from Ground Truth. If any bigrams of a word do not appear in the corresponding positional binary matrix then the word is classified as error. Error correction is performed by using Baysian probability to find the correction type with highest prior probability multiplied by probability of it being a typo given the correction. Different methods of estimations such as MLE and ELE are performed. Performance evaluation is to compare the correction accuracy of different methods of estimations by calculating the proportion of errors before and after correction.

Contribution statement: All team members approve our work presented in this GitHub repository including this contributions statement. The contribution for each team member is following:

  • Hongru Liu: Error detection using paper D2, Error detection using paper C4, performence evaluation, and update the presentation slides
  • Chuqiao Rong: Error correction using papaer C4
  • Yang Xing: Error correction using papaer C4
  • Yi Lin: Error correction using papaer C4, write README files and prepare the presentation slides
  • Han Liu: Helping prepare for the ppt and debugging.

Following suggestions by RICH FITZJOHN (@richfitz). This folder is orgarnized as follows.

proj/
├── lib/
├── data/
├── doc/
├── figs/
└── output/

Please see each subfolder for a README file.

About

Fall2018-Project4-sec1--sec1proj4_grp6 created by GitHub Classroom

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •  

Languages