This project uses a multimodal large language model to perform Optical Character Recognition (OCR) on images of forms and extract the information into a structured JSON format.
Follow these steps to set up the project environment.
```
git clone https://github.com/maxchanhi/OCR-form-to-json.git
cd OCR-form-to-json
```

This project uses uv for package management.
First, install uv:

```
pip install uv
```

Then, create a virtual environment and install the project dependencies:

```
uv venv
uv pip sync pyproject.toml
```

You need to have Ollama installed to run the multimodal model.
- Install Ollama: Follow the instructions on the Ollama website.
- Pull the model: Once Ollama is installed and running, pull the `qwen2.5vl` model:

```
ollama pull qwen2.5vl
```
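Once the model is pulled, a script can talk to it through Ollama's local HTTP API. A minimal sketch of building such a request is below — this is illustrative, not code from the repository; it assumes Ollama's `/api/generate` endpoint, which accepts base64-encoded images in an `images` field for multimodal models:

```python
import base64


def build_ollama_request(image_bytes: bytes, prompt: str) -> dict:
    """Build a payload for Ollama's /api/generate endpoint.

    Multimodal models such as qwen2.5vl accept base64-encoded images
    in the "images" field alongside the text prompt.
    """
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "qwen2.5vl",
        "prompt": prompt,
        "images": [image_b64],
        "stream": False,
        # Ask Ollama to constrain the reply to valid JSON.
        "format": "json",
    }


# Usage: read an image from disk, then POST the payload to
# http://localhost:11434/api/generate with any HTTP client.
payload = build_ollama_request(b"\x89PNG\r\n", "Extract the form fields as JSON.")
```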
To run the OCR process, execute the fill.py script from the root of the project:
```
uv run fill.py
```

The script will:
- Process all images in the `img/` directory.
- Use the `qwen2.5vl` model to extract information based on the template in `json_template/`.
- Save the extracted JSON data into the `result/` directory.
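Conceptually, the last two steps amount to merging the model's JSON reply into a form template. A minimal sketch of that merge, assuming the template is a flat JSON object (the field names below are hypothetical, not taken from `json_template/`):

```python
import json


def fill_template(template: dict, model_reply: str) -> dict:
    """Merge a model's JSON reply into a form template.

    Keys present in the template take the model's value when supplied;
    keys the model invents are ignored, and fields the model missed
    keep the template's default.
    """
    extracted = json.loads(model_reply)
    return {key: extracted.get(key, default) for key, default in template.items()}


# Hypothetical template mirroring a file in json_template/:
template = {"name": None, "date": None, "signature": None}
reply = '{"name": "Ada Lovelace", "date": "1843-09-05", "extra": "ignored"}'
result = fill_template(template, reply)
# result keeps only the template's keys, with "extra" discarded
```

Restricting the output to the template's keys keeps the saved JSON files in `result/` uniform even when the model returns stray fields.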