Note: This documentation expects you to be familiar with compiling software on your operating system.
Use the same tools for building tesseract as you used for building leptonica.
To install Tesseract 4.x on Ubuntu 18.04 (Bionic), run the following command:
sudo apt install tesseract-ocr
If you wish to install the Developer Tools which can be used for training, run the following command:
sudo apt install libtesseract-dev
The following instructions are for building on Linux and can also be applied to other UNIX-like operating systems.
- A compiler for C and C++: GCC or Clang
- GNU Autotools: autoconf, automake, libtool
- pkg-config
- Leptonica
- libpng, libjpeg, libtiff
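Before installing anything, it can help to see which of the command-line tools listed above are already on the PATH. The following is a minimal sketch, not part of the Tesseract build system: the function name missing_tools is hypothetical, libtoolize is used as the probe for libtool (an assumption that fits Debian-style systems), and the check covers executables only, not libraries such as Leptonica or libpng.

```shell
#!/bin/sh
# Hypothetical helper: print every command name from the argument list
# that cannot be found on the current PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Example: check the build tools listed above (swap g++ for clang++ if preferred).
# missing_tools g++ autoconf automake libtoolize pkg-config
```

An empty result means every probed tool was found; any name printed still needs to be installed.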
If they are not already installed, you need the following libraries (Ubuntu 16.04/14.04):
sudo apt-get install g++ # or clang++ (presumably)
sudo apt-get install autoconf automake libtool
sudo apt-get install pkg-config
sudo apt-get install libpng-dev
sudo apt-get install libjpeg8-dev
sudo apt-get install libtiff5-dev
sudo apt-get install zlib1g-dev
If you plan to install the training tools, you also need the following libraries:
sudo apt-get install libicu-dev
sudo apt-get install libpango1.0-dev
sudo apt-get install libcairo2-dev
You also need to install Leptonica. Ensure that the development headers for Leptonica are installed before compiling Tesseract.
Tesseract versions and the minimum version of Leptonica required:
Tesseract | Leptonica | Ubuntu |
---|---|---|
4.00 | 1.74.2 | Ubuntu 18.04 |
3.05 | 1.74.0 | Must build from source |
3.04 | 1.71 | Ubuntu 16.04 |
3.03 | 1.70 | Ubuntu 14.04 |
3.02 | 1.69 | Ubuntu 12.04 |
3.01 | 1.67 | |
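To compare an installed Leptonica against the minimum version in the table, a version-aware comparison is needed, because plain string comparison gets multi-digit components wrong. The sketch below assumes GNU sort (for the -V option); the function name version_at_least is hypothetical, and the commented example assumes Leptonica's pkg-config module is named lept.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is at least version $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
# Caveat: "1.74" sorts before "1.74.0", so pass versions with the same
# number of components.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example: check the installed Leptonica against the minimum for Tesseract 4.00.
# installed=$(pkg-config --modversion lept)
# version_at_least "$installed" 1.74.2 && echo "Leptonica is new enough"
```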
One option is to install the distro's Leptonica package:
sudo apt-get install libleptonica-dev
If you are using an older Linux distribution, however, the packaged Leptonica version may be too old, and you will need to build it from source.
The sources are at https://github.com/DanBloomberg/leptonica . The instructions for building are given in the Leptonica README.
Note that if you build Leptonica from source, you may need to ensure that /usr/local/lib is in your library path. This is a common Linux pitfall, and the information at Stack Overflow is very helpful.
Please follow instructions in Compiling--GitInstallation
Also read Install Instructions
Tesseract can be configured to install anywhere, which makes it possible to install it without root access.
To install it in $HOME/local:
./autogen.sh
./configure --prefix=$HOME/local/
make
make install
To install it in $HOME/local using Leptonica libraries also installed in $HOME/local:
./autogen.sh
LIBLEPT_HEADERSDIR=$HOME/local/include ./configure \
--prefix=$HOME/local/ --with-extra-libraries=$HOME/local/lib
make
make install
On some systems, you might also need to set the pkg-config search path before running the configure script:
export PKG_CONFIG_PATH=$HOME/local/lib/pkgconfig
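The export above replaces any existing PKG_CONFIG_PATH. If other entries should be kept, the new directory can be prepended instead. This is a minimal sketch; the function name prepend_path is hypothetical and not part of pkg-config or Tesseract.

```shell
#!/bin/sh
# Hypothetical helper: print the colon-separated list $2 with directory $1
# prepended, unless $1 is already one of its entries.
prepend_path() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;      # already present, keep as is
        "::")     printf '%s\n' "$1" ;;      # list was empty
        *)        printf '%s\n' "$1:$2" ;;   # put the new directory first
    esac
}

# Example: keep existing entries instead of overwriting them.
# PKG_CONFIG_PATH=$(prepend_path "$HOME/local/lib/pkgconfig" "$PKG_CONFIG_PATH")
# export PKG_CONFIG_PATH
```

The same helper works for other colon-separated variables such as LD_LIBRARY_PATH.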
- Download the data file(s) for the language(s) you are interested in.
- Move it to the tessdata directory (e.g. 'mv tessdata $TESSDATA_PREFIX' if TESSDATA_PREFIX is defined).
You can also use:
export TESSDATA_PREFIX=/some/path/to/tessdata
to point to your tessdata directory. Note that Tesseract 3.x expects the parent directory instead: if your tessdata path is '/usr/local/share/tessdata', use 'export TESSDATA_PREFIX=/usr/local/share/'.
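Which directory TESSDATA_PREFIX should name depends on the Tesseract version. The sketch below works under the assumption that 3.x appends tessdata/ to the variable itself (so it wants the parent directory), while 4.x and later use the value as the tessdata directory directly. The function name tessdata_prefix_for is hypothetical; verify the result against the error message your Tesseract build prints when it cannot find its data.

```shell
#!/bin/sh
# Hypothetical helper: print a TESSDATA_PREFIX value for Tesseract major
# version $1 and tessdata directory $2.
# Assumption: 3.x wants the parent directory, 4.x and later the directory itself.
tessdata_prefix_for() {
    dir=${2%/}                                # drop one trailing slash, if any
    if [ "$1" -lt 4 ]; then
        printf '%s/\n' "$(dirname "$dir")"    # parent directory, with trailing slash
    else
        printf '%s\n' "$dir"
    fi
}

# Example:
# export TESSDATA_PREFIX=$(tessdata_prefix_for 4 /usr/local/share/tessdata)
```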
!!! IMPORTANT !!! To use Tesseract in your application (to include tess or to link it into your app) see this very simple example.
- Download the latest SW (Software Network, https://software-network.org/) client from https://software-network.org/client/.
- Run
sw build org.sw.demo.google.tesseract.tesseract-master
- Download the latest CPPAN (C++ Archive Network, https://cppan.org/) client from https://cppan.org/client/.
- Run
cppan --build pvt.cppan.demo.google.tesseract.tesseract-master
- Set up Vcpkg, the Visual C++ package manager.
- Run
vcpkg install tesseract:x64-windows
for 64-bit. Use --head for the master branch.
To build a self-contained tesseract.exe executable (without any DLLs or runtime dependencies), use Vcpkg as above with one of the following commands:
vcpkg install tesseract:x64-windows-static
for 64-bit, or
vcpkg install tesseract:x86-windows-static
for 32-bit.
Use --head for the master branch. It may still require one DLL for the OpenMP runtime, vcomp140.dll (which you can find in the Visual C++ Redistributable 2015).
It is now possible to build the full set of Tesseract training tools on Windows with Visual Studio. The latest versions (Win10, VS2015/VS2017) are preferable.
To do this:
- Download the latest CPPAN (C++ Archive Network, https://cppan.org/) client from https://cppan.org/client/.
- Run
cppan --build pvt.cppan.demo.google.tesseract-master
To develop Tesseract itself, follow these steps:
- Download and install Git and CMake, and put them in PATH.
- Download the latest SW (Software Network, https://software-network.org/) client from https://software-network.org/client/. SW is a source package distribution system.
- Add the SW client to PATH.
- If you have a release archive, unpack it to the tesseract directory.
If you're using the master branch (4.0), run
git clone https://github.com/tesseract-ocr/tesseract tesseract
- Run
cd tesseract
mkdir build && cd build
cmake ..
- Build the solution (tesseract.sln) in your Visual Studio version. If you want to build and install from the command line (e.g. a Release build), you can use this command:
cmake --build . --config Release --target install
If you want to install to a directory other than C:\Program Files (which requires administrator rights), specify the install path during configuration:
cmake .. -G "Visual Studio 15 2017 Win64" -DCMAKE_INSTALL_PREFIX=inst
To develop the training tools, clone the repository as described in the previous paragraph and run
sw build
A solution link will appear in the root directory of Tesseract.
To develop Tesseract itself with CPPAN, follow these steps:
- Download and install Git and CMake, and put them in PATH.
- Download the latest CPPAN (C++ Archive Network, https://cppan.org/) client from https://cppan.org/client/. CPPAN is a source package distribution system. Add the CPPAN client to PATH too. (The VS2015 redistributable is required.)
- If you have a release archive, unpack it to the tesseract directory.
If you're using the master branch (4.0), run
git clone https://github.com/tesseract-ocr/tesseract tesseract
- Run
cd tesseract
cppan
mkdir build && cd build
cmake ..
- Build the solution (tesseract.sln) in your Visual Studio version. If you want to build and install from the command line (e.g. a Release build), you can use this command:
cmake --build . --config Release --target install
If you want to install to a directory other than C:\Program Files (which requires administrator rights), specify the install path during configuration:
cmake .. -G "Visual Studio 15 2017 Win64" -DCMAKE_INSTALL_PREFIX=inst
To develop the training tools, clone the repository as described in the previous paragraph and run
cppan --build .
A solution link will appear in the root directory of Tesseract.
If you're building with sw+cmake, run cmake as follows:
mkdir win64 && cd win64
cmake .. -G "Visual Studio 14 2015 Win64"
If you're building with sw, run sw generate; it will create a solution link for you (not yet implemented!).
If you're building with cppan+cmake, run cmake as follows:
mkdir win64 && cd win64
cppan ..
cmake .. -G "Visual Studio 14 2015 Win64"
If you're building with cppan, edit cppan.yml and uncomment this line:
#generator: Visual Studio 14 2015 Win64 -> generator: Visual Studio 14 2015 Win64
Then run
cppan --generate .
It will create a solution link for you.
(For VS2017, use '15 2017' instead of '14 2015'.)
If you have Visual Studio 2015, check out the https://github.com/peirick/VS2015_Tesseract repository, which contains Visual Studio 2015 projects for Tesseract and its dependencies, and click on build_tesseract.bat. After that you still need to download the language packs.
Have a look at the blog post How to build Tesseract 3.03 with Visual Studio 2013.
For tesseract-ocr 3.02, please follow the instructions in Visual Studio 2008 Developer Notes for Tesseract-OCR.
Download these packages from the Downloads Archive on the SourceForge page:
- tesseract-3.01.tar.gz - Tesseract source
- tesseract-3.01-win_vs.zip - Visual Studio (2008 & 2010) solution with necessary libraries
- tesseract-ocr-3.01.eng.tar.gz - English language file for Tesseract (or download other language training file)
Unpack them to one directory (e.g. tesseract-3.01). Note that tesseract-ocr-3.01.eng.tar.gz names the root directory 'tesseract-ocr' instead of 'tesseract-3.01'.
Windows-relevant files are located in the vs2008 directory (e.g. 'tesseract-3.01\vs2008'). The usual build process applies: open tesseract.sln with VC++ Express 2008 and build all (or just Tesseract). It should compile (at least in release mode) without having to install anything further. The DLL dependencies and Leptonica are included. Output will be in tesseract-3.01\vs2008\bin (or tesseract-3.01\vs2008\bin.rd or tesseract-3.01\vs2008\bin.dbg, depending on the build configuration).
For Mingw+Msys, have a look at the blog post Compiling Leptonica and Tesseract-ocr with Mingw+Msys.
Download and install MSYS2 Installer from https://msys2.github.io/
The core packages groups you need to install if you wish to build from PKGBUILDs are:
- base-devel for any building
- msys2-devel for building msys2 packages
- mingw-w64-i686-toolchain for building mingw32 packages
- mingw-w64-x86_64-toolchain for building mingw64 packages
To build the tesseract-ocr release package, use PKGBUILD from https://github.com/Alexpux/MINGW-packages/tree/master/mingw-w64-tesseract-ocr
To build on Cygwin, have a look at the blog post How to build Tesseract on Cygwin.
Tesseract as well as the training utilities for 3.04.00 onwards are available as Cygwin packages.
Tesseract specific packages to be installed:
tesseract-ocr 3.04.01-1
tesseract-ocr-eng 3.04-1
tesseract-training-core 3.04-1
tesseract-training-eng 3.04-1
tesseract-training-util 3.04.01-1
Mingw-w64 allows building 32- or 64-bit executables for Windows. It can be used for native compilations on Windows, but also for cross compilations on Linux (which are easier and faster than native compilations). Most large Linux distributions already contain packages with the tools needed for a cross build. Before building Tesseract, it is necessary to build some prerequisites.
For Debian and similar distributions (e.g. Ubuntu), the cross tools can be installed as follows:
# Development environment targeting 32- and 64-bit Windows (required)
apt-get install mingw-w64
# Development tools for 32- and 64-bit Windows (optional)
apt-get install mingw-w64-tools
These prerequisites will be needed:
- libpng, libtiff, zlib (binaries for Mingw-w64 available as part of the GTK+ bundles)
- libicu
- liblcms2
- openjpeg
- leptonica
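Once the cross tools and prerequisites are in place, an Autotools build is usually pointed at the cross compiler through configure's standard --host option. The sketch below only maps a bit width to the usual Mingw-w64 host triplet; the function name mingw_host and the install prefix in the commented example are hypothetical, and the prerequisites still have to be cross-built for the same triplet first.

```shell
#!/bin/sh
# Hypothetical helper: print the Mingw-w64 host triplet for a 32- or 64-bit target.
mingw_host() {
    case "$1" in
        32) printf '%s\n' i686-w64-mingw32 ;;
        64) printf '%s\n' x86_64-w64-mingw32 ;;
        *)  echo "usage: mingw_host 32|64" >&2; return 1 ;;
    esac
}

# Example (run in the Tesseract source tree after ./autogen.sh;
# the prefix is an arbitrary choice for illustration):
# ./configure --host="$(mingw_host 64)" --prefix="$HOME/mingw64-local"
```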
Typically a package manager like Fink, Homebrew or MacPorts is needed in addition to Apple's Xcode.
Xcode and the related command line tools provide the compiler (llvm-gcc) and linker, as well as libraries like zlib. The package manager provides free software packages which are not part of Xcode.
The Xcode Command Line Tools can be installed by running xcode-select --install.
Note that Tesseract 4 can be built with OpenMP support, but that requires additional installations.
Fink (as of 2017-04) neither provides Leptonica nor the packages needed for the Tesseract training tools, so it cannot be recommended for building Tesseract.
Install OpenMP:
sudo port install libomp
The following method which gets, compiles and installs OpenMP manually should no longer be needed:
# Install cmake if it is not available.
sudo port install cmake
git clone https://github.com/llvm-mirror/openmp.git
cd openmp
mkdir build
cd build
cmake ..
make
sudo make install
sudo port install autoconf \
automake \
libtool \
pkgconfig \
leptonica
Compilation itself relies on the Autotools suite:
git clone https://github.com/tesseract-ocr/tesseract.git
cd tesseract
./autogen.sh
./configure
make
sudo make install
If you want support for multithreading, you have to install OpenMP first (see above) and tell the compiler and linker how to activate OpenMP support. This is done by adding that information to the options for configure:
./configure CXXFLAGS="-Xpreprocessor -fopenmp -I/opt/local/include/libomp -Wall -O2" LDFLAGS=-L/opt/local/lib/libomp LIBS=-lomp
If compilation fails at the make command, with libtool erring on missing instructions, you may be building with MacPorts' g++ compiler, which has known issues. The community recommends using clang, but a workaround for g++ is to re-configure the build:
./configure CXXFLAGS=-Wa,-q
Then proceed with make.
The steps above do not install the training tools. To build and install the training tools along with Tesseract, proceed as follows:
sudo port install cairo pango
sudo port install icu +devel
git clone https://github.com/tesseract-ocr/tesseract/
cd tesseract
./autogen.sh
./configure
make training
sudo make install training-install
# Packages which are always needed.
brew install automake autoconf libtool
brew install pkgconfig
brew install icu4c
brew install leptonica
# Packages required for training tools.
brew install pango
# Optional packages for extra features.
brew install libarchive
# Optional package for builds using g++.
brew install gcc
As of January 2017, a build with clang succeeds, but OpenMP will only use a single thread, potentially reducing performance. If you really need OpenMP, install and use gcc.
git clone https://github.com/tesseract-ocr/tesseract/
cd tesseract
./autogen.sh
mkdir build
cd build
# Optionally add CXX=g++-8 to the configure command if you really want to use a different compiler.
../configure PKG_CONFIG_PATH=/usr/local/opt/icu4c/lib/pkgconfig:/usr/local/opt/libarchive/lib/pkgconfig:/usr/local/opt/libffi/lib/pkgconfig
make -j
# Optionally install Tesseract.
sudo make install
# Optionally build and install training tools.
make training
sudo make training-install
For cross-compiling, see the discussion in issue 2334. You need to specify the target this way:
./configure CXX="g++ --target=arm-apple-darwin64"
Tesseract can be built for Android as a static command-line executable tesseract, or you can use the Java binding to work with libtess from your Android app.
Currently, the easiest build method can be found in a tess-two fork. This fork contains both the tesseract and leptonica sources, so it is enough to download the repository. To build the command-line executable, you don't need the Android SDK or Android Studio; only install the Android NDK (r.20 has been tested) and run the ndk-build command, e.g.:
ndk-build -C tess-two-git/tess-two tesseract APP_ABI=arm64-v8a
The 4.1 branch is available, too. Note that performance may be significantly different:
> adb shell time tess3 --tessdata-dir tessdata3 eurotext.png txt3
Tesseract Open Source OCR Engine v3.05.00 with Leptonica
0m05.95s real 0m05.77s user 0m00.17s system
> adb shell time tess4 --tessdata-dir tessdata4 eurotext.png txt4
Tesseract Open Source OCR Engine v4.1.0 with Leptonica
0m59.07s real 0m58.56s user 0m00.45s system
> adb shell time tess4 --tessdata-dir tessdata3 eurotext.png txt42
Tesseract Open Source OCR Engine v4.1.0 with Leptonica
0m05.61s real 0m05.37s user 0m00.23s system
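To put these timings in proportion, the wall-clock ("real") values can be divided directly. The sketch below does only that arithmetic with awk; the function name slowdown is hypothetical, and the figures are the ones printed above, from a single run each, so they indicate scale rather than a benchmark.

```shell
#!/bin/sh
# Hypothetical helper: print how many times longer $1 seconds is than $2 seconds.
slowdown() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

# Wall-clock ("real") times from the runs above, in seconds.
slowdown 59.07 5.95   # tess4 with tessdata4 vs. tess3 with tessdata3 -> 9.9
slowdown 59.07 5.61   # tess4 with tessdata4 vs. tess4 with tessdata3 -> 10.5
```

In these runs the 4.1.0 binary with its own data took roughly ten times as long, while the same binary with the 3.x data was about as fast as the 3.05 binary.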
Another method of compiling is using the project Building for Android with Docker, which at the time of writing can produce shared libraries for the following versions and architectures:
Arch \ Version | 3.02.02 | 3.05.02 | 4.0.0 | 4.1.0 |
---|---|---|---|---|
armv7-a | ✔ | ✔ | ✔ | ✔ |
arm64-v8a | ✖ | ✔ | ✔ | ✔ |
x86 | ✔ | ✔ | ✔ | ✔ |
Compilation of the dependent libraries, leptonica and tiff, is included and handled as well.
- To fix this error
./configure: line 4237: syntax error near unexpected token `-mavx,'
./configure: line 4237: `AX_CHECK_COMPILE_FLAG(-mavx, avx=1, avx=0)'
ensure that autoconf-archive is installed. Don't forget to run ./autogen.sh after installing autoconf-archive. Note that this error often happens under CentOS, where autoconf-archive is missing and no package is available. Some projects help with installing.
The latest code from GitHub does not require autoconf-archive.
- If configure fails with the error "configure: error: Leptonica 1.74 or higher is required.", try installing the libleptonica-dev package.
- If you are sure you have installed leptonica (for example in /usr/local), then pkg-config is probably not looking at your install folder (check with pkg-config --variable pc_path pkg-config). A solution is to set PKG_CONFIG_PATH, for example: PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
- On some systems autotools does not create the m4 directory automatically (giving the error: "configure: error: cannot find macro directory 'm4'"). In this case you must create the m4 directory (mkdir m4), and then rerun the above commands starting with ./configure.
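The AX_CHECK_COMPILE_FLAG error in the first item can be spotted before running configure, because the macro name is left in the generated script as literal text when autoconf-archive was missing. The following is a heuristic sketch only; the function name has_unexpanded_ax_macro is hypothetical, and it assumes that a correctly generated configure script does not contain the macro name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if file $1 (default: ./configure) still
# contains the literal macro name, i.e. autoconf did not expand it.
has_unexpanded_ax_macro() {
    grep -q 'AX_CHECK_COMPILE_FLAG' "${1:-configure}"
}

# Example (run in the Tesseract source tree after ./autogen.sh):
# if has_unexpanded_ax_macro; then
#     echo "install autoconf-archive, then rerun ./autogen.sh" >&2
# fi
```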