forked from ektavats/OCR-tesseract-SE
How_to_install_tesseract_windows.txt
Beginner's guide to setting up Tesseract, tailored for non-technical users!
Download the code (as a zip file) from https://github.com/ektavats/OCR-tesseract-SE/tree/main
Extract it into a working folder on your computer.
Copy any sample images you want to process into this folder.
Follow the instructions below to set up the OCR code.
Windows OS:
1. Download Anaconda: https://www.anaconda.com/products/individual
2. Install Anaconda (run the exe from the Downloads folder and keep the default options).
3. Open the Start menu, search for Anaconda Powershell Prompt, and run it as administrator.
4. Type: conda create --name pytess python=3.8 pip numpy pandas matplotlib (when asked to proceed, type y and press Enter)
5. Type: conda activate pytess
6. Open Anaconda Navigator and select pytess in the "Applications on" drop-down menu.
7. Find JupyterLab and click Install.
8. Back in the Anaconda Powershell Prompt, type: pip install pytesseract
9. Install Tesseract. Go to https://tesseract-ocr.github.io/tessdoc/Installation.html
- Navigate to the Windows section, then follow the UB Mannheim link.
- This leads to https://github.com/UB-Mannheim/tesseract/wiki
- Select tesseract-ocr-w64-setup....(64 bit)
- Run the exe (default settings).
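10. Add the Tesseract path to the system environment variables:
- Open the Start menu and search for "Edit the system environment variables".
- Click Environment Variables.
- Under System variables, select Path, click Edit, then New, and paste C:\Users\name\AppData\Local\Programs\Tesseract-OCR (the folder where Tesseract was installed; replace name with your Windows user name).
- Press OK in each dialog.
11. Close and reopen the Anaconda Powershell Prompt, run conda activate pytess again, then type: tesseract --list-langs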
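12. Go to https://github.com/tesseract-ocr/tessdata
- Download more language models (download the repository as a zip file and extract it; each model is a .traineddata file).
13. Go to C:\Users\name\AppData\Local\Programs\Tesseract-OCR\tessdata
- Paste the .traineddata files for the languages you need.
14. Type: tesseract --list-langs (the added languages should now appear in the list).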
15. Type: pip install opencv-python
16. Type: jupyter lab (press Enter)
17. Open OCR_singleImage.ipynb and run it with the button showing two arrows (restart the kernel and run all cells).
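The checks in steps 11 and 14 can also be done from Python, for example in a notebook cell. The following is only a sketch, not part of the original code: the function names (find_tesseract, parse_langs, missing_langs, check_setup) are made up for illustration, and it assumes that tesseract --list-langs prints one header line starting with "List of" followed by one language code per line.

```python
import shutil
import subprocess


def find_tesseract():
    """Return the full path of the tesseract executable found on PATH, or None."""
    return shutil.which("tesseract")


def parse_langs(listing):
    """Extract language codes from the text printed by `tesseract --list-langs`.

    Assumption: the header line starts with "List of"; every other
    non-empty line is treated as one language code.
    """
    codes = []
    for line in listing.splitlines():
        line = line.strip()
        if line and not line.startswith("List of"):
            codes.append(line)
    return codes


def missing_langs(wanted, installed):
    """Return the wanted language codes that are not installed, in order."""
    return [code for code in wanted if code not in installed]


def check_setup(wanted=("eng",)):
    """Print a short report and return the list of missing language codes."""
    exe = find_tesseract()
    if exe is None:
        print("tesseract was not found on PATH; recheck step 10 and restart the terminal.")
        return list(wanted)
    # Only standard output is read here; very old Tesseract versions may
    # print the listing elsewhere.
    result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    installed = parse_langs(result.stdout)
    missing = missing_langs(wanted, installed)
    print("tesseract found at:", exe)
    print("installed languages:", ", ".join(installed))
    if missing:
        print("missing languages (see steps 12-13):", ", ".join(missing))
    return missing
```

For example, check_setup(["eng", "swe", "deu"]) returns an empty list when all three models are installed and Tesseract is on the PATH.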
NOTE:
To run the code again after the setup:
1. Open Anaconda Powershell Prompt (run as administrator).
2. Type: conda activate pytess
3. Type: cd your/code/folder/path (replace with the path to your working folder)
4. Type: jupyter lab
NOTE:
For details on more languages: https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html
To use several languages at once, join their codes with +, e.g. "swe+eng+deu" (i.e. Swedish, English, German). Each language must be installed as in steps 12-13.
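The language string from the note above can be built and passed to pytesseract as sketched below. This is an illustration under assumptions, not the code from the notebook: normalize_langs and ocr_image are hypothetical helper names, and the image path is a placeholder. The helper accepts either a list of codes or a string with or without spaces around the + signs and returns the form without spaces.

```python
def normalize_langs(spec):
    """Turn 'swe + eng + deu' or ['swe', 'eng', 'deu'] into 'swe+eng+deu'."""
    if isinstance(spec, str):
        parts = spec.split("+")
    else:
        parts = list(spec)
    # Drop surrounding spaces and empty entries.
    codes = [part.strip() for part in parts if part.strip()]
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)


def ocr_image(image_path, langs="eng"):
    """Run OCR on one image file and return the recognized text.

    Requires pytesseract (step 8) and Tesseract itself (step 9).
    """
    # Imported here so normalize_langs can be tried without pytesseract installed.
    import pytesseract
    from PIL import Image

    image = Image.open(image_path)
    return pytesseract.image_to_string(image, lang=normalize_langs(langs))
```

For example, ocr_image("sample.png", "swe + eng + deu") would read a file named sample.png (a placeholder name) using the Swedish, English and German models, provided all three are installed.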