Extract structured menu information from images into JSON using a fine-tuned E2E model or LLM.
Currently, the following information can be extracted from menu images:
- Restaurant Name
- Business Hours
- Address
- Phone Number
- Dish Information
  - Name
  - Price
For the JSON schema, see the tools directory.
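To make the shape of the extracted data concrete, here is a minimal sketch of how the fields listed above could be modeled and serialized in Python. The class names, field names, and sample values are hypothetical and chosen only for illustration; the authoritative schema is the one in the tools directory.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Dish:
    # Hypothetical field names; the real schema lives in the tools directory.
    name: str
    price: Optional[float] = None  # None when no price could be read


@dataclass
class MenuInfo:
    # Every top-level field is optional because a menu image may omit it.
    restaurant_name: Optional[str] = None
    business_hours: Optional[str] = None
    address: Optional[str] = None
    phone_number: Optional[str] = None
    dishes: List[Dish] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses such as Dish.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


# Placeholder values, not real extraction output.
menu = MenuInfo(
    restaurant_name="Example Diner",
    dishes=[Dish(name="Beef Noodles", price=120.0)],
)
print(menu.to_json())
```

Keeping missing fields as explicit null values, rather than dropping them, gives downstream consumers a stable set of keys.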
- Donut (Document Parsing Task) - Base model by Clova AI (ECCV ’22)
- Google Gemini API
- OpenAI GPT API
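Whichever backend produces the output, the raw text usually needs to be parsed into JSON before use. LLM responses in particular are sometimes wrapped in a Markdown code fence. The helper below is a sketch of one way to handle that, assuming the model returns a single JSON object. The function name `parse_model_json` is hypothetical and not part of this repository.

```python
import json
import re
from typing import Any, Dict

FENCE = "`" * 3  # a Markdown code fence marker

# Matches an optional fence (with an optional "json" tag) around the payload.
_FENCED = re.compile(
    r"^" + FENCE + r"(?:json)?\s*\n(.*?)\n?" + FENCE + r"\s*$",
    re.DOTALL,
)


def parse_model_json(raw: str) -> Dict[str, Any]:
    """Parse a model response into a dict, tolerating a surrounding code fence."""
    text = raw.strip()
    match = _FENCED.match(text)
    if match:
        text = match.group(1)
    data = json.loads(text)  # raises json.JSONDecodeError on malformed input
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


fenced_reply = FENCE + "json\n" + '{"dishes": []}' + "\n" + FENCE
print(parse_model_json(fenced_reply))
```

Malformed JSON is left to raise rather than being silently repaired, so extraction failures stay visible to the caller.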
Use uv to set up the development environment:
uv sync
If that fails, install the dependencies with pip instead:
pip install -r requirements.txt
For training, refer to train.ipynb. Launch Jupyter Notebook with:
uv run jupyter-notebook
For VS Code users, install the Jupyter extension, then select .venv/bin/python as your kernel.
uv run python app.py