Extracting tables from images is a challenging task that involves not just recognizing the text (OCR), but also accurately reconstructing the table's structure — including rows, columns, headers, merged cells, and nested sub-tables. Traditional OCR systems often struggle with preserving this spatial layout, especially when tables have complex formats or multiple nested structures.
This project addresses that gap by evaluating how well various Large Language Models (LLMs) extract Markdown-formatted tables from table images. The workflow includes:
- Sending table images to different LLMs using a variety of carefully designed prompts.
- Receiving and parsing the Markdown table outputs generated by the models.
- Comparing these outputs with ground-truth Markdown representations of the original tables (a parsing sketch follows this list).
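The comparison step assumes both the model output and the ground truth can be read into a simple grid of cells. Below is a minimal Python sketch of such a parser; it is illustrative only, not the project's actual code, and the helper name `parse_markdown_table` is an assumption.

```python
# Minimal sketch (not the project's actual parser): turn a Markdown table
# string, as returned by a model, into a list of rows of cell strings.
def parse_markdown_table(md: str) -> list[list[str]]:
    rows = []
    for line in md.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip any prose the model wrapped around the table
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the header separator row (e.g. | --- | --- |)
        if all(c and set(c) <= set("-: ") for c in cells):
            continue
        rows.append(cells)
    return rows

if __name__ == "__main__":
    sample = """
    | Name | Qty |
    | --- | --- |
    | Apples | 3 |
    """
    print(parse_markdown_table(sample))  # [['Name', 'Qty'], ['Apples', '3']]
```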
This evaluation is automated and powered by Promptfoo, a framework for benchmarking LLM responses. The models are assessed based on three key criteria:
- Content Accuracy – Whether the correct cell values are extracted.
- Positional Correctness – Whether values appear in the right row and column.
- Structural Integrity – Whether the layout of the table (rows, columns) matches the ground truth (see the scoring sketch after this list).
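As a rough illustration only, the sketch below shows one way these three criteria could be scored from two parsed grids. The actual checks live in `promptfooconfig.yaml` and may differ; `score_table` and the per-criterion formulas are assumptions, not the project's exact logic.

```python
# Hedged sketch of how the three criteria could be scored; both inputs are
# grids as produced by parse_markdown_table() above.
def score_table(pred: list[list[str]], truth: list[list[str]]) -> dict:
    truth_cells = [c for row in truth for c in row]
    pred_cells = [c for row in pred for c in row]

    # Content accuracy: fraction of ground-truth values that appear anywhere
    # in the prediction, regardless of position (rough, ignores duplicates).
    content = sum(c in pred_cells for c in truth_cells) / max(len(truth_cells), 1)

    # Positional correctness: fraction of ground-truth cells whose value sits
    # at the same (row, column) index in the prediction.
    hits = 0
    for i, row in enumerate(truth):
        for j, cell in enumerate(row):
            if i < len(pred) and j < len(pred[i]) and pred[i][j] == cell:
                hits += 1
    positional = hits / max(len(truth_cells), 1)

    # Structural integrity: same number of rows and same column count per row.
    structure = float(
        len(pred) == len(truth)
        and all(len(p) == len(t) for p, t in zip(pred, truth))
    )
    return {"content": content, "positional": positional, "structure": structure}
```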
By leveraging Promptfoo, this project provides insights into how well different LLMs understand and reconstruct table data from images — a crucial step in reliable OCR-based data extraction pipelines.
To run this, get an OpenRouter API key, then run:
```
git clone https://github.com/Nitin399-maker/Table_OCR.git
cd Table_OCR
# PowerShell shown; on macOS/Linux use: export OPENROUTER_API_KEY=...
$env:OPENROUTER_API_KEY=...
npx promptfoo eval -c promptfooconfig.yaml --output output/result.json --no-cache
```
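Once the run completes, `output/result.json` holds the evaluation results. The snippet below is a minimal sketch for inspecting it; Promptfoo's output schema varies between versions, so it only probes keys defensively rather than assuming a fixed structure.

```python
# Quick, version-tolerant look at the evaluation output.
import json

with open("output/result.json", encoding="utf-8") as f:
    data = json.load(f)

print("Top-level keys:", list(data.keys()))

# In recent Promptfoo versions the per-test records sit under
# data["results"]["results"]; adjust the path if your version differs.
results = data.get("results", {})
records = results.get("results", []) if isinstance(results, dict) else results
for record in records[:5]:
    if isinstance(record, dict):
        print(record.get("success"), record.get("score"))
```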