cloonix/excalidraw-ocr

Excalidraw OCR

Extract text from handwritten images and Excalidraw drawings using AI vision models.

Quick Start

Using Docker (Recommended)

# Pull the pre-built image
docker pull ghcr.io/cloonix/excalidraw-ocr:latest

# Extract text from an image (using .env file)
docker run --rm -v ./data:/data \
  --env-file .env \
  ghcr.io/cloonix/excalidraw-ocr:latest \
  python ocr.py /data/image.png

# Or pass API key directly
docker run --rm -v ./data:/data \
  -e OPENAI_API_KEY=your_key_here \
  ghcr.io/cloonix/excalidraw-ocr:latest \
  python ocr.py /data/image.png

# Extract text from Excalidraw drawing
docker run --rm -v ./data:/data \
  --env-file .env \
  ghcr.io/cloonix/excalidraw-ocr:latest \
  python excalidraw_ocr.py /data/drawing.excalidraw.md

# Watch mode - automatically process new files
docker run -d --name ocr-watch \
  -v ./watch:/watch \
  --env-file .env \
  ghcr.io/cloonix/excalidraw-ocr:latest \
  python excalidraw_ocr.py /watch -w

Local Installation

# Install dependencies
pip install -r requirements.txt
npm install
./install_cairo.sh  # For Excalidraw support

# Configure API key
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY or OPENROUTER_API_KEY

# Run OCR
python ocr.py image.png
python ocr.py --clipboard  # From clipboard

# Run Excalidraw OCR
python excalidraw_ocr.py drawing.excalidraw.md
python excalidraw_ocr.py folder/ -w  # Watch mode

Features

  • 📝 Extract text from handwritten images
  • 🎨 Extract text from Excalidraw drawings
  • 📋 Clipboard support (copy image → extract text → copy result)
  • 👁️ Watch mode for continuous processing
  • 🐳 Docker support with pre-built images
  • 🔄 Supports OpenAI and OpenRouter APIs
  • 💾 Smart caching to avoid reprocessing
  • 🌍 Multi-platform: x86_64 and ARM64
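The caching feature can be pictured as a content-hash lookup. The following is a minimal sketch under stated assumptions, not the project's actual implementation: the helper names (`file_digest`, `load_cache`, `needs_processing`, `mark_processed`) and the JSON cache file are hypothetical. The `force` argument mirrors the documented `-f` flag.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """SHA-256 of the file contents, used as the cache fingerprint."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def load_cache(cache_file):
    """Load the path -> digest mapping, or start empty if no cache exists yet."""
    cache_file = Path(cache_file)
    return json.loads(cache_file.read_text()) if cache_file.exists() else {}


def needs_processing(path, cache, force=False):
    """True if the file is new, has changed, or reprocessing is forced."""
    return force or cache.get(str(path)) != file_digest(path)


def mark_processed(path, cache, cache_file):
    """Record the current digest and persist the cache (hypothetical JSON layout)."""
    cache[str(path)] = file_digest(path)
    Path(cache_file).write_text(json.dumps(cache, indent=2))
```

Hashing the contents rather than comparing timestamps means a file that is re-saved without changes is still skipped.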

API Keys

Get an API key from OpenAI or OpenRouter, then set it in the .env file:

OPENAI_API_KEY=your_key_here
OPENAI_MODEL=gpt-4o

# OR

OPENROUTER_API_KEY=your_key_here
OPENROUTER_MODEL=google/gemini-flash-1.5
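To illustrate how these settings might be resolved at startup, here is a minimal sketch. It is not the project's actual code: `pick_provider` is a hypothetical helper, and checking OpenAI before OpenRouter is an assumption. The fallback models are the defaults listed under Recommended Models.

```python
import os

# Defaults taken from the "Recommended Models" section of this README.
DEFAULT_MODELS = {
    "openai": "gpt-4o",
    "openrouter": "google/gemini-flash-1.5",
}


def pick_provider(env=None):
    """Return (provider, api_key, model) from environment-style settings.

    Hypothetical helper: the lookup order (OpenAI first) is an assumption.
    """
    env = os.environ if env is None else env
    for provider in ("openai", "openrouter"):
        prefix = provider.upper()
        key = env.get(f"{prefix}_API_KEY", "").strip()
        if key:
            # Fall back to the documented default when no model is set.
            model = env.get(f"{prefix}_MODEL") or DEFAULT_MODELS[provider]
            return provider, key, model
    raise RuntimeError("Set OPENAI_API_KEY or OPENROUTER_API_KEY in .env")
```

Passing a plain dict instead of `os.environ` makes the lookup easy to check without touching the real environment.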

Docker Compose (Watch Mode)

The docker-compose.yml is configured for watch mode: it continuously monitors a folder for new Excalidraw files.

# Setup
make setup        # Creates directories and .env file

# Start watch mode
docker compose up -d
docker compose logs -f   # View logs
docker compose down      # Stop

# Or use make targets
make watch-start  # Start watch mode
make watch-logs   # View logs
make watch-stop   # Stop watch mode

For one-shot processing, use docker run directly (see Quick Start above).

Command Line Options

General OCR (ocr.py)

python ocr.py image.png                           # Basic usage
python ocr.py --clipboard                         # From clipboard
python ocr.py image.png -o output.txt             # Save to file
python ocr.py image.png -m anthropic/claude-3.5-sonnet  # Use specific model
python ocr.py --list-models                       # Show available models

Excalidraw OCR (excalidraw_ocr.py)

python excalidraw_ocr.py drawing.excalidraw.md    # Basic usage (auto-saves as drawing.md)
python excalidraw_ocr.py drawing.excalidraw.md -o output.txt  # Custom output
python excalidraw_ocr.py drawing.excalidraw.md -c # Copy to clipboard
python excalidraw_ocr.py folder/ -w               # Watch mode (15 min delay by default)
python excalidraw_ocr.py folder/ -w --no-delay    # Watch mode (immediate processing)
python excalidraw_ocr.py folder/ -w --delay 30    # Watch mode (30 min delay)
python excalidraw_ocr.py drawing.excalidraw.md -f # Force reprocess (ignore cache)

Watch mode stabilization delay: By default, watch mode waits 15 minutes after the last file modification before processing. This prevents processing files that are being actively edited (e.g., during meetings). Use --no-delay for immediate processing or --delay MINUTES to customize.
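The stabilization rule above amounts to comparing a file's last-modified time against the current time. The following is a minimal sketch, assuming the check is based on the file's mtime. `is_stable` is a hypothetical helper, not a function from this repository.

```python
import os
import time

DEFAULT_DELAY_MINUTES = 15  # matches the documented watch-mode default


def is_stable(path, delay_minutes=DEFAULT_DELAY_MINUTES, now=None):
    """True once `path` has gone unmodified for at least `delay_minutes`.

    `now` is injectable (seconds since the epoch) so the check can be tested.
    """
    now = time.time() if now is None else now
    idle_seconds = now - os.path.getmtime(path)
    return idle_seconds >= delay_minutes * 60
```

With `delay_minutes=0` the check always passes, which corresponds to the behavior described for `--no-delay`.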

Recommended Models

Fast & Cheap:

  • google/gemini-flash-1.5 (default for OpenRouter)
  • gpt-4o-mini (OpenAI)

High Quality:

  • gpt-4o (default for OpenAI)
  • anthropic/claude-3.5-sonnet

Troubleshooting

"OPENAI_API_KEY not found"

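  • Create a .env file containing your API key (copy .env.example as a starting point), or pass the key with -e OPENAI_API_KEY=your_key_here when using docker run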

"cairosvg not available" (Excalidraw only)

  • Run ./install_cairo.sh
  • Or install manually: brew install cairo pkg-config (macOS) or sudo apt-get install libcairo2-dev pkg-config python3-dev (Ubuntu)

"No text extracted"

  • Try a better model: --model anthropic/claude-3.5-sonnet
  • Check image quality
  • Verify API credits

License

MIT License - See LICENSE

Contributing

Issues and pull requests welcome!
