A collection of example scripts for extracting structured invoice data from PDFs using various APIs and services.
- Extract text from invoice PDFs with pdfplumber
- Parse invoice contents using:
  - OpenAI GPT-4 (openai-main.py)
  - Anthropic Claude (anthropic-main.py)
  - Invofox API (invofox-main.py)
- Includes sample invoice files (invoice_sample.pdf, invoice_sample.jpg)
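As a rough illustration of the pdfplumber step, the sketch below pulls the text from every page of a PDF and joins it into one string. The function names `join_page_texts` and `extract_invoice_text` are hypothetical and are not taken from the repository's scripts; only `pdfplumber.open`, `pdf.pages`, and `page.extract_text()` are real pdfplumber calls.

```python
from typing import Iterable, Optional


def join_page_texts(page_texts: Iterable[Optional[str]]) -> str:
    """Join per-page text, skipping pages that yielded no text."""
    # extract_text() can return None or an empty string for image-only pages.
    cleaned = [text.strip() for text in page_texts if text and text.strip()]
    return "\n\n".join(cleaned)


def extract_invoice_text(pdf_path: str) -> str:
    """Return the text of every page in the PDF as a single string.

    Hypothetical helper: the repository's scripts may structure this differently.
    """
    # Imported lazily so join_page_texts works even without pdfplumber installed.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        return join_page_texts(page.extract_text() for page in pdf.pages)
```

Calling `extract_invoice_text("invoice_sample.pdf")` from the project root should return the sample invoice's text, assuming the PDF contains a text layer. Scanned or image-only files such as invoice_sample.jpg would need OCR or an API that accepts images instead.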
- Python 3.8 or higher
- A virtual environment (recommended)
- API keys for:
  - OpenAI (OPENAI_API_KEY)
  - Anthropic (ANTHROPIC_API_KEY)
  - Invofox (INVOFOX_API_KEY)
- Clone the repository:

      git clone https://github.com/Anmol-Baranwal/doc-parsing.git
      cd doc-parsing

- Create and activate a virtual environment:

      python -m venv env
      source env/bin/activate   # macOS/Linux
      env\Scripts\activate      # Windows

- Install dependencies:

      pip install -r requirements.txt

- Create a .env file in the project root and add your API keys:

      OPENAI_API_KEY=your-openai-api-key
      ANTHROPIC_API_KEY=your-anthropic-api-key
      INVOFOX_API_KEY=your-invofox-api-key
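Before running anything, it can help to confirm that all three keys are actually set. The following is a minimal, standard-library-only sketch that reads simple `KEY=value` lines from the .env file and reports which required keys are missing. The helper names `read_env_file` and `missing_keys` are hypothetical, and this is not necessarily how the repository's scripts load their configuration.

```python
import os
from pathlib import Path
from typing import Dict, List

# Key names taken from the setup steps above.
REQUIRED_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "INVOFOX_API_KEY"]


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=value lines; blank lines and # comments are ignored."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(path: str = ".env") -> List[str]:
    """List required keys set neither in the file nor in the environment."""
    file_values = read_env_file(path)
    return [
        key for key in REQUIRED_KEYS
        if not (file_values.get(key) or os.environ.get(key))
    ]
```

An empty list from `missing_keys()` means every key was found. The parser deliberately handles only the plain `KEY=value` form shown above, not the full syntax that dedicated .env libraries accept.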
- OpenAI GPT-4:

      python openai-main.py

- Anthropic Claude:

      python anthropic-main.py

- Invofox API:

      python invofox-main.py
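This README does not show what the scripts send to each model or how they read the reply, so the following is only a sketch of one common pattern: ask for JSON, then pull the JSON object out of the response text. The prompt wording, the field names (`invoice_number`, `invoice_date`, `vendor_name`, `total`), and both function names are assumptions made for illustration.

```python
import json
from typing import Any, Dict

# Hypothetical field list; the actual scripts may request different fields.
EXAMPLE_FIELDS = ["invoice_number", "invoice_date", "vendor_name", "total"]


def build_prompt(invoice_text: str) -> str:
    """Build an instruction asking a model to return invoice fields as JSON."""
    fields = ", ".join(EXAMPLE_FIELDS)
    return (
        "Extract the following fields from the invoice and reply with a "
        f"single JSON object containing: {fields}.\n\n"
        f"INVOICE TEXT:\n{invoice_text}"
    )


def parse_model_reply(reply: str) -> Dict[str, Any]:
    """Return the JSON object embedded in a model reply.

    Models often wrap JSON in extra prose, so this takes the span from the
    first '{' to the last '}' and parses it.
    """
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    # json.JSONDecodeError is a subclass of ValueError, so callers can
    # catch ValueError for both failure modes.
    return json.loads(reply[start:end + 1])
```

The brace-scanning approach is simple but fragile: it fails if the surrounding prose itself contains braces. Where a provider offers a structured or JSON output mode, that is usually the more reliable choice.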
Thanks for visiting!