DMI developed a simple Flask-based API that runs pretrained Optical Character Recognition (OCR) models on provided images and returns the detected text in location-based groups.
The OCR Server runs in a Docker container.
- Install Docker Desktop and start it.
- Clone the OCR Server repository (e.g. `git clone https://github.com/digitalmethodsinitiative/ocr_server.git`)
- (Optional) Update or change any settings in the `config.yml` file
- In a terminal/command prompt, navigate to the folder in which you just cloned OCR Server (the folder that contains the `config.yml` file)
- Run `docker build -t ocr_server .`
  - This will create a Docker image called `ocr_server` and may take a while to download and install necessary packages
- Run `docker run --publish 4000:80 --name ocr_server --detach ocr_server`
  - This creates a running container of the `ocr_server` image
  - `--publish 4000:80` opens port `4000` on your machine and connects it to port `80` in the container; you may update `4000` to any port you wish
  - Add a restart policy such as `--restart unless-stopped` and the OCR container will restart if the host server is rebooted, Docker crashes, etc.
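Once the container is started, it can be handy to confirm that the published port accepts connections before sending any images. The following is a minimal sketch using only the Python standard library. The helper name `port_is_open` is hypothetical, and the host and port assume the default `--publish 4000:80` mapping on your local machine. An open port only shows that something is listening, not that the OCR models have finished loading.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Assumes the container was started with --publish 4000:80 on this machine.
    print("OCR Server port reachable:", port_is_open("localhost", 4000))
```

If you mapped a different host port, pass that port instead of `4000`.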
DMI primarily designed the OCR Server to work as a processor with 4CAT. Add the hosted server address (http://wherever:4000/api/detect_text) to 4CAT Settings in the "OCR: Text from images" section and the processor should appear for any dataset of images.
The OCR Server can also be used independently. It is essentially just an API that can be accessed via Python `requests`, `curl`, or any other HTTP client.
For example, in Python:

    import requests

    server = 'http://localhost:4000/'
    filename = 'any/dir/to/image.jpg'
    # Upload the image as multipart form data under the 'image' field
    with open(filename, "rb") as infile:
        api_response = requests.post(server + 'api/detect_text', files={'image': infile})
        # To specify a model type, you can use `paddle_ocr` or `keras_ocr` like so
        # api_response = requests.post(server + 'api/detect_text', files={'image': infile}, data={'model_type': 'paddle_ocr'})

The `api_response` should return a `200` status code and a JSON object containing the `filename` and `simplified_text`, which consists of a collection of `groupings` and the `raw_text` alone.
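As a rough illustration of reading that response, the sketch below pulls the described fields out of a decoded JSON object. It assumes that `groupings` and `raw_text` are nested under `simplified_text`, and the field types in the sample payload are guesses. The helper name `summarize_response` and the sample data are hypothetical, so inspect a real response from your server before relying on this layout.

```python
def summarize_response(payload: dict) -> dict:
    """Pull the documented fields out of a decoded JSON response.

    Assumed layout (an assumption, not confirmed by the server code):
    {"filename": ..., "simplified_text": {"groupings": [...], "raw_text": ...}}
    """
    simplified = payload.get("simplified_text") or {}
    return {
        "filename": payload.get("filename"),
        "groupings": simplified.get("groupings", []),
        "raw_text": simplified.get("raw_text", ""),
    }


# Hypothetical payload for illustration only; real output will differ.
example = {
    "filename": "image.jpg",
    "simplified_text": {
        "groupings": [["HELLO", "WORLD"], ["sale"]],
        "raw_text": "HELLO WORLD sale",
    },
}
summary = summarize_response(example)
print(summary["filename"], len(summary["groupings"]))
```

With a live server, the same helper could be given `api_response.json()` after checking that `api_response.status_code` is `200`.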
Currently, the OCR Server has two available models that can be selected.
- PaddleOCR: The PaddleOCR package provides access to a number of different OCR models. We currently only support the English PP-OCRv3 model, but support for other languages could be added if there is demand (and suitable models exist).
- Keras-OCR: The keras-ocr package (Keras OCR Documentation) first detects areas of an image that may contain text with the pretrained Character-Region Awareness For Text (CRAFT) text detection model and then attempts to predict the text inside each area using Keras' implementation of a Convolutional Recurrent Neural Network (CRNN) model for text recognition. Once words are predicted, an algorithm we developed attempts to sort the text into likely groupings based on their locations within the original image.
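To switch between the two models from a script, one option is to wrap the request in a small function that validates the model name before sending it. This is only a sketch: the function names `build_form_data` and `detect_text`, the default server address, and the timeout are assumptions, while the `model_type` values `paddle_ocr` and `keras_ocr` and the `api/detect_text` endpoint come from the request example in this README.

```python
VALID_MODELS = ("paddle_ocr", "keras_ocr")


def build_form_data(model_type=None):
    """Return the form fields to send alongside the image.

    With model_type=None no field is sent, so the server picks its own default.
    """
    if model_type is None:
        return {}
    if model_type not in VALID_MODELS:
        raise ValueError(f"model_type must be one of {VALID_MODELS}, got {model_type!r}")
    return {"model_type": model_type}


def detect_text(image_path, model_type=None, server="http://localhost:4000/"):
    """POST one image to the OCR Server and return the decoded JSON response."""
    import requests  # third-party: pip install requests

    data = build_form_data(model_type)
    url = server.rstrip("/") + "/api/detect_text"
    with open(image_path, "rb") as infile:
        # The timeout is an arbitrary choice; OCR on large images can be slow.
        response = requests.post(url, files={"image": infile}, data=data, timeout=120)
    response.raise_for_status()
    return response.json()
```

For example, `detect_text('any/dir/to/image.jpg', model_type='keras_ocr')` would request the Keras-OCR model, and an unknown model name fails locally before any upload happens.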
Useful Docker commands for managing the container:
- View container logs: `docker container logs container_name`
- Stop running container: `docker stop container_name`
- Start stopped container: `docker start container_name`
- Connect to container command line: `docker exec -it container_name /bin/bash`
- Remove container: `docker container rm container_name`
  - Useful to remove then recreate with new parameters (e.g. port mappings)
- Remove image: `docker image rm image_name:image_tag`
  - Useful if you need to change `Dockerfile` and rebuild
  - Note: must also remove any containers dependent on image; you could alternately create a new image with a different name:tag
- Copy files into container: `docker cp path/to/file container_name:/app/path/to/desired/directory/`
  - Can update and change files (e.g. `config.py` or other configuration files)
  - Note: may require restarting the container to take effect
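As an illustration of the remove-then-recreate workflow, the sketch below assembles the stop, remove, and run commands for a new port mapping as argument lists. The helper name `recreate_commands` and the example port `5000` are hypothetical, and the sketch only prints the commands rather than running them.

```python
def recreate_commands(container, image, host_port, container_port=80,
                      restart_policy="unless-stopped"):
    """Build the docker commands that replace a container with a new port mapping."""
    run = ["docker", "run", "--publish", f"{host_port}:{container_port}",
           "--name", container, "--detach"]
    if restart_policy:
        # Optional restart policy, e.g. "unless-stopped".
        run += ["--restart", restart_policy]
    run.append(image)
    return [
        ["docker", "stop", container],
        ["docker", "container", "rm", container],
        run,
    ]


# Print the commands for moving the ocr_server container to host port 5000.
for cmd in recreate_commands("ocr_server", "ocr_server", host_port=5000):
    print(" ".join(cmd))
```

To execute the commands instead of printing them, each list could be passed to `subprocess.run(cmd, check=True)`.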