Merged
13 changes: 7 additions & 6 deletions README.md
@@ -2,17 +2,17 @@

**AI-powered document data extraction toolkit**

-Extract structured data from documents (invoices, receipts, forms) using Claude's vision API. Easily integrate into your Python applications with flexible input options and built-in cost tracking.
+Extract structured data from documents (invoices, receipts, forms) using any supported provider. Easily integrate into your Python applications with flexible input options and built-in cost tracking.

> ⚠️ **Early Development**: This project is in active development. Core functionality is working, but many features are still being built.

## What Works Now

- ✅ **Vision API Integration**: Extract data from images (.jpg, .png, .gif, .webp)
- ✅ **Flexible Input**: Accepts file paths, bytes, or file-like objects (like PIL, requests)
-- ✅ **Cost Tracking**: Built-in monitoring and limits for API usage
+- ✅ **Cost Tracking**: Built-in monitoring and limits for API usage (needs to be improved)
- ✅ **Structured Output**: Returns Pydantic-validated data models that you can define
- 🚧 **Multi-strategy Extraction**: Cost-optimized cascade to reduce api calls (planned)
+- **Providers**: Currently supports Anthropic, OpenAI and local with Ollama
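
The "Flexible Input" item above says the toolkit accepts file paths, bytes, or file-like objects. The diff does not show how the project does this, so the following is only a minimal sketch of one way such inputs could be normalized to raw bytes before being sent to a provider. The name `read_image_bytes` and the `ImageSource` alias are hypothetical and are not taken from the repository.

```python
import io
from pathlib import Path
from typing import BinaryIO, Union

# Hypothetical alias for illustration; not part of the project's API.
ImageSource = Union[str, Path, bytes, bytearray, BinaryIO]


def read_image_bytes(source: ImageSource) -> bytes:
    """Return raw bytes for a path, a bytes-like value, or a binary file-like object."""
    # Raw bytes are passed through unchanged (bytearray is copied to bytes).
    if isinstance(source, (bytes, bytearray)):
        return bytes(source)
    # Strings and Path objects are treated as filesystem paths.
    if isinstance(source, (str, Path)):
        return Path(source).read_bytes()
    # Anything with a read() method is treated as a file-like object.
    if hasattr(source, "read"):
        data = source.read()
        if not isinstance(data, bytes):
            raise TypeError("file-like object must be opened in binary mode")
        return data
    raise TypeError(f"unsupported input type: {type(source).__name__}")
```

With a helper like this, `read_image_bytes("invoice.png")`, `read_image_bytes(response_bytes)`, and `read_image_bytes(io.BytesIO(...))` would all yield the same kind of value, which keeps the provider-facing code independent of how the caller obtained the image.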

## Quick Start

@@ -22,7 +22,7 @@ uv sync

# Setup environment
cp .env.template .env
-# Add your Anthropic API key to .env
+# Add your Anthropic or OpenAI API key to .env

# Run a test
uv run python example.py
@@ -69,11 +69,12 @@ make test-cov
## Requirements

- Python 3.13
-- Anthropic API key (for Claude vision API)
+- Anthropic API key or OpenAI API key
+- Optional: Ollama for local model support
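
The requirements above allow either an Anthropic or an OpenAI key, with Ollama as an optional local backend. As a rough illustration of how a caller might pick a provider from the environment, here is a small sketch. It assumes the conventional variable names `ANTHROPIC_API_KEY` and `OPENAI_API_KEY`; the function `choose_provider`, the returned labels, and the priority order are hypothetical and are not confirmed by this diff.

```python
import os
from typing import Mapping, Optional


def choose_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a provider label based on which API keys are configured."""
    # Default to the real process environment; a mapping can be injected for tests.
    env = os.environ if env is None else env
    # Assumed priority: Anthropic first, then OpenAI, then local Ollama.
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    # No hosted key found: fall back to the local option.
    return "ollama"
```

Passing the environment in as a mapping keeps the selection logic easy to test without touching real credentials.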

## Citation

-For testing and evaluation, we are using the following dataset:
+For testing and evaluation, we are currently using the following dataset:

> Limam, M., et al. FATURA Dataset. Zenodo, 13 Dec. 2023, https://doi.org/10.5281/zenodo.10371464.
