ryanlinjui/menu-text-detection

Menu Text Detection System

Extract structured menu information from images into JSON using a fine-tuned E2E model or LLM.

Gradio Space Demo | Hugging Face Models & Datasets

🚀 Features

Overview

The system currently extracts the following information from menu images:

  • Restaurant Name
  • Business Hours
  • Address
  • Phone Number
  • Dish Information
    • Name
    • Price

For the JSON schema, see the tools directory.
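As a rough illustration of the fields listed above, the extracted result could be modelled as follows. This is a minimal sketch: the key names (`restaurant`, `business_hours`, `address`, `phone`, `dishes`) and the class names are assumptions made for this example, not the actual schema, which lives in the tools directory.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Dish:
    """One menu entry. Field names are illustrative, not the project's schema."""
    name: str
    price: float


@dataclass
class MenuInfo:
    """Hypothetical container for everything read from a single menu image."""
    restaurant: str = ""
    business_hours: str = ""
    address: str = ""
    phone: str = ""
    dishes: List[Dish] = field(default_factory=list)


def menu_to_json(menu: MenuInfo) -> str:
    """Serialize the menu to JSON, keeping non-ASCII dish names readable."""
    return json.dumps(asdict(menu), ensure_ascii=False, indent=2)


example = MenuInfo(
    restaurant="Example Cafe",
    dishes=[Dish(name="Beef Noodles", price=120)],
)
print(menu_to_json(example))
```

Fields that cannot be read from the image simply stay at their empty defaults in this sketch.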

Supported Methods to Extract Menu Information

Fine-tuned E2E model and Training metrics

LLM Function Calling

  • Google Gemini API
  • OpenAI GPT API
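With function calling, the model is given a function declaration whose parameters are described in JSON Schema style, and it replies with arguments matching that declaration. The sketch below shows what such a declaration and a basic check of the returned arguments might look like. The function name `extract_menu_data`, the field names, and the helper `parse_function_arguments` are hypothetical; the project's real declaration may differ, and no API call is made here.

```python
import json

# Hypothetical function declaration; names and fields are illustrative only.
EXTRACT_MENU_FUNCTION = {
    "name": "extract_menu_data",
    "description": "Return structured information read from a menu image.",
    "parameters": {
        "type": "object",
        "properties": {
            "restaurant": {"type": "string"},
            "business_hours": {"type": "string"},
            "address": {"type": "string"},
            "phone": {"type": "string"},
            "dishes": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "price": {"type": "number"},
                    },
                    "required": ["name", "price"],
                },
            },
        },
        "required": ["dishes"],
    },
}


def parse_function_arguments(raw_arguments: str) -> dict:
    """Decode the JSON argument string returned by the model and check
    that every top-level required field is present."""
    args = json.loads(raw_arguments)
    required = EXTRACT_MENU_FUNCTION["parameters"]["required"]
    missing = [key for key in required if key not in args]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return args


menu = parse_function_arguments('{"dishes": [{"name": "Tea", "price": 30}]}')
print(menu["dishes"][0]["name"])
```

Checking the required fields before using the result guards against replies that omit the dish list entirely.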

💻 Training / Fine-Tuning

Setup

Use uv to set up the development environment:

uv sync

If uv sync runs into problems, use pip install -r requirements.txt instead.

Training Script (Dataset Collection, Fine-Tuning)

Please refer to train.ipynb. Use Jupyter Notebook for training:

uv run jupyter-notebook

VSCode users should install the Jupyter extension, then select .venv/bin/python as the kernel.

Run Demo Locally

uv run python app.py
