Transcripts Alignment - MiM Algorithm

The method aligns the transcription of a line of text with the corresponding words in the image of that line.

Define environment

You need the Anaconda environment manager installed on your computer. Once it is installed, create the environment with the command conda env create -f environment.yml and activate it with conda activate MiMalign

Preparation

Create a "data/lines" folder containing the images of the text lines. Organize it into subfolders, one for each document.
Create a "data/GT" folder containing the transcript txt files. Organize it into subfolders, one for each document.
Set all the input folders in the file configs.py.
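
Before running the alignment it can help to check that the two input trees line up. The following is a minimal sketch, assuming that each document uses the same subfolder name under "data/lines" and "data/GT" (the README does not state this explicitly). The function names are hypothetical and are not part of the repository.

```python
from pathlib import Path


def list_documents(root):
    """Return the sorted names of the document subfolders under root."""
    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())


def check_layout(lines_root="data/lines", gt_root="data/GT"):
    """Compare the document subfolders of the two input trees.

    Returns three sorted lists: documents present in both trees,
    documents only under lines_root, and documents only under gt_root.
    Assumes matching subfolder names identify the same document.
    """
    line_docs = set(list_documents(lines_root))
    gt_docs = set(list_documents(gt_root))
    return (
        sorted(line_docs & gt_docs),
        sorted(line_docs - gt_docs),
        sorted(gt_docs - line_docs),
    )
```

Calling check_layout() from the repository root returns the three lists. Any name in the second or third list points to a document that is missing its images or its transcripts.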

Perform alignment

Run the alignment.py file (listed as alingment.py in the repository files) to perform the alignment and produce the "all_align.als" pickle file. You can set the parameters of the process inside that file.
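
Since the output is a pickle file, it can be inspected from Python. This is a minimal sketch using only the standard pickle module. The helper name load_alignments is hypothetical, the default path assumes the file sits in the working directory, and the internal structure of the stored object is not documented here, so the sketch only loads it.

```python
import pickle


def load_alignments(path="all_align.als"):
    """Load the pickled object written by the alignment step.

    Only unpickle files you produced yourself: pickle can execute
    arbitrary code when loading untrusted data.
    """
    with open(path, "rb") as f:
        return pickle.load(f)
```

For example, print(type(load_alignments())) shows what kind of object the file holds before you write any code that depends on its layout.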
You can correct the output of the alignment algorithm by running the "correction_tool.py" file. The tool displays the aligned words one at a time.
- With the ENTER key you can move to the next word.
- With the BACKSPACE key you can go back to the previous word (of the same line).
- With the DEL (or SHIFT+D) key you can delete an alignment.
- With the SHIFT+n keys you can add a new bounding box (bb) at the current position (after the current bb).
- With the SHIFT+s keys you can save the state.
- With the SHIFT+q keys you can close the GUI (or just with 'q', depending on your cv2 version).
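
The key bindings above can be pictured as a lookup table from key codes to actions. The sketch below is only an illustration, not the code of correction_tool.py. The action names are hypothetical, and the numeric codes are the values commonly obtained from cv2.waitKey(0) & 0xFF, which can differ between platforms and OpenCV builds. The DEL key is left out because its code is platform dependent.

```python
# Hypothetical action names; key codes are typical values only.
KEY_ACTIONS = {
    13: "next",            # ENTER on most systems
    10: "next",            # ENTER as reported by some systems
    8: "previous",         # BACKSPACE
    ord("D"): "delete",    # SHIFT+D
    ord("N"): "add_box",   # SHIFT+n
    ord("S"): "save",      # SHIFT+s
    ord("Q"): "quit",      # SHIFT+q
    ord("q"): "quit",      # plain 'q', accepted by some cv2 versions
}


def action_for_key(key_code):
    """Map a raw key code to an action name, or "ignore" if unbound."""
    return KEY_ACTIONS.get(key_code & 0xFF, "ignore")
```

Masking with 0xFF keeps only the low byte of the code, which is the usual way to normalize the value returned by cv2.waitKey.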
At the end of the correction process, the tool shows all the alignments that contain more than one word. With a left click of the mouse you can split the image to obtain a single-word segmentation.
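
Splitting a multi-word alignment at the clicked position amounts to cutting one horizontal interval into two. The following is a minimal sketch, assuming a box is stored as a (left, right) pair of pixel columns. That representation and the function name split_box are assumptions made for illustration.

```python
def split_box(box, click_x):
    """Split a (left, right) box at click_x into two adjacent boxes."""
    left, right = box
    if not left < click_x < right:
        raise ValueError("the click must fall strictly inside the box")
    return (left, click_x), (click_x, right)
```

For example, split_box((10, 50), 30) yields the boxes (10, 30) and (30, 50).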

To correct a segmentation error you can use the mouse: a left click sets a new left segmentation boundary, and a right click sets a new right segmentation boundary.
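
The effect of the two mouse buttons can be sketched as a small function. This is an illustration under the assumption that a box is a (left, right) pair of pixel columns. The name adjust_box and the "left"/"right" button labels are hypothetical and do not come from the tool.

```python
def adjust_box(box, click_x, button):
    """Return the box with one boundary moved to click_x.

    button is "left" to move the left boundary or "right" to move
    the right boundary.
    """
    left, right = box
    if button == "left":
        left = click_x
    elif button == "right":
        right = click_x
    else:
        raise ValueError("button must be 'left' or 'right'")
    if left >= right:
        raise ValueError("the left boundary must stay before the right one")
    return (left, right)
```

For example, adjust_box((10, 50), 20, "left") gives (20, 50), while adjust_box((10, 50), 40, "right") gives (10, 40).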

Finally, the tool updates the alignment file "all_aligns.als" and writes to the time folder a file reporting the total time spent on the correction.

The process also measures alignment performance: a file is saved in the Performance folder reporting the total number of alignments and the number of alignments that did not need correction.
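
From the two counts in the performance file, one natural summary is the fraction of alignments that needed no correction. The sketch below computes that ratio. It is a derived figure suggested here, not necessarily a value the tool writes itself, and the function name is hypothetical.

```python
def correction_free_rate(total_alignments, uncorrected_alignments):
    """Fraction of alignments accepted without any manual correction."""
    if total_alignments <= 0:
        raise ValueError("total_alignments must be positive")
    if not 0 <= uncorrected_alignments <= total_alignments:
        raise ValueError("uncorrected_alignments must lie in [0, total]")
    return uncorrected_alignments / total_alignments
```

For example, with 200 alignments of which 170 needed no correction, the rate is 170 / 200 = 0.85.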
Run the file crop_all_words.py to generate the images of all the obtained words.

Natural-Computation-Lab/MiMtranscriptAligner
