0xstackforge/LLM-Email-Attachment-Evaluator


Email Attachment Intelligence

A production-ready document intelligence pipeline that classifies email attachments from .eml files as relevant or irrelevant, using only the email's HTML body context.

This system leverages the Anthropic Claude API for contextual reasoning and includes an evaluation module for performance benchmarking against ground truth data.


🚀 Overview

The project processes .eml email files and performs the following:

  1. Extracts:

    • HTML body
    • Attachment filenames
  2. Classifies each attachment into exactly one category:

    • relevant
    • irrelevant
  3. Generates structured JSON output files.

  4. Evaluates predictions against labeled ground truth using standard classification metrics.
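The extraction step can be sketched with Python's standard-library email tools. This is a minimal illustration, not the project's actual code; the function name extract_email_parts is hypothetical.

```python
from email import policy
from email.parser import BytesParser


def extract_email_parts(eml_path):
    """Return (html_body, attachment_filenames) for one .eml file."""
    with open(eml_path, "rb") as f:
        msg = BytesParser(policy=policy.default).parse(f)

    # Prefer the HTML alternative; fall back to an empty body if absent.
    html_body = ""
    body_part = msg.get_body(preferencelist=("html",))
    if body_part is not None:
        html_body = body_part.get_content()

    # Collect filenames of all attachment parts.
    filenames = [
        part.get_filename()
        for part in msg.iter_attachments()
        if part.get_filename()
    ]
    return html_body, filenames
```

Note that policy.default (rather than the legacy compat32 policy) is what enables get_body() and iter_attachments().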


🧠 Classification Rules

Classification must rely exclusively on the email's HTML body.

The following information must not be used:

  • Attachment contents
  • MIME types
  • Filenames
  • Headers
  • Any metadata

This constraint simulates real-world scenarios where reasoning must be based solely on rendered email content.


📂 Project Structure

doczen/
│
├── examples/
│   ├── example_00001.eml
│   ├── example_00002.eml
│   └── ...
│
├── ground_truth/
│   ├── attachments_00001.json
│   ├── attachments_00002.json
│   └── ...
│
├── output/
│
├── classify_attachments.py
├── evaluate.py
├── requirements.txt
└── README.md

โš™๏ธ Installation

1. Clone the Repository

git clone https://github.com/your-org/doczen.git
cd doczen

2. Create a Virtual Environment

python -m venv venv
source venv/bin/activate  # macOS/Linux
venv\Scripts\activate     # Windows

3. Install Dependencies

pip install -r requirements.txt

Example requirements.txt:

anthropic
beautifulsoup4
tqdm
scikit-learn

๐Ÿ” Environment Configuration

Set your Anthropic API key:

macOS/Linux

export ANTHROPIC_API_KEY=your_key_here

Windows

set ANTHROPIC_API_KEY=your_key_here

📌 Component 1: Attachment Classification

Purpose

Reads .eml files from examples/, extracts HTML content and attachment filenames, and classifies attachments using Claude.

Run

python classify_attachments.py

Output

Generated files:

output/
  attachments_00001.json
  attachments_00002.json
  ...

Example output:

{
  "relevant": [
    "example_00001_attachment_02.pdf"
  ],
  "irrelevant": [
    "example_00001_attachment_01.jpg"
  ]
}

Each attachment must appear in exactly one category.
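The exactly-one-category constraint can be checked mechanically before evaluation. A minimal sketch, assuming the output format shown above; validate_output is an illustrative helper, not part of the project's code:

```python
import json


def validate_output(path, expected_filenames):
    """Raise ValueError unless each attachment is in exactly one category."""
    with open(path) as f:
        data = json.load(f)

    relevant = set(data.get("relevant", []))
    irrelevant = set(data.get("irrelevant", []))

    # No attachment may appear in both lists...
    overlap = relevant & irrelevant
    if overlap:
        raise ValueError(f"attachments in both categories: {sorted(overlap)}")

    # ...and every known attachment must appear in one of them.
    missing = set(expected_filenames) - (relevant | irrelevant)
    if missing:
        raise ValueError(f"attachments missing from output: {sorted(missing)}")
```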


🧩 Prompt Strategy

The model receives:

  • Full HTML body
  • List of attachment filenames

It is instructed to:

  • Identify attachments materially referenced in the email
  • Detect decorative or structural HTML elements (logos, icons, signature images)
  • Return strictly structured JSON output
  • Avoid explanations
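The instructions above could be assembled into a prompt along these lines. The exact wording lives in classify_attachments.py and may differ; this build_prompt helper is purely illustrative.

```python
def build_prompt(html_body, filenames):
    """Compose a classification prompt from the HTML body and filename list."""
    file_list = "\n".join(f"- {name}" for name in filenames)
    return (
        "Classify each email attachment using ONLY the HTML body below.\n"
        "Mark an attachment 'relevant' only if the body materially references "
        "it; treat logos, icons, and signature images as 'irrelevant'.\n\n"
        f"HTML body:\n{html_body}\n\n"
        f"Attachments:\n{file_list}\n\n"
        "Respond with strict JSON only, no explanations, in the form "
        '{"relevant": [...], "irrelevant": [...]}'
    )
```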

📊 Component 2: Evaluation

Purpose

Compares generated outputs against ground truth labels.

Run

python evaluate.py

Metrics Computed

  • Accuracy
  • Precision
  • Recall
  • F1 Score
  • Per-file breakdown
  • Macro-averaged summary

Each attachment is treated as a binary classification:

  • Positive → relevant
  • Negative → irrelevant
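The per-file arithmetic behind these metrics looks roughly like the following (evaluate.py may instead delegate to scikit-learn, which is listed in requirements.txt; score_file here is an illustrative helper):

```python
def score_file(pred, truth):
    """Per-file binary metrics; 'relevant' is the positive class.

    pred and truth are dicts with 'relevant' and 'irrelevant' filename lists.
    """
    pos, neg = set(truth["relevant"]), set(truth["irrelevant"])
    pred_pos = set(pred["relevant"])

    tp = len(pos & pred_pos)          # relevant, predicted relevant
    fp = len(neg & pred_pos)          # irrelevant, predicted relevant
    fn = len(pos - pred_pos)          # relevant, predicted irrelevant
    tn = len(neg - pred_pos)          # irrelevant, predicted irrelevant

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Macro-averaging then means computing these per file and averaging the results across files.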

📈 Evaluation Methodology

Ground truth files must match output naming format:

ground_truth/attachments_00001.json

Evaluation compares attachment-level predictions against reference labels.


๐Ÿ—๏ธ Design Principles

Deterministic Output

Strict JSON formatting enables automated validation and evaluation.

Separation of Concerns

  • classify_attachments.py handles inference
  • evaluate.py handles benchmarking

Reproducibility

Consistent file naming and structured outputs keep experiments traceable and repeatable.


๐Ÿ›ก๏ธ Error Handling & Validation

The classification pipeline includes:

  • API retry handling
  • JSON schema validation
  • Attachment coverage verification
  • Logging for malformed responses
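A generic sketch of the retry handling, with exponential backoff and jitter. This is not the project's exact implementation; in practice you would catch the Anthropic SDK's specific exception types rather than bare Exception.

```python
import random
import time


def call_with_retries(fn, max_attempts=5, base_delay=1.0):
    """Call fn(), retrying on failure with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Delay doubles each attempt; jitter avoids synchronized retries.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```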

🔄 Example Workflow

# Step 1: Generate classifications
python classify_attachments.py

# Step 2: Evaluate performance
python evaluate.py

🚀 Production Considerations

  • Rate limiting and exponential backoff
  • Deterministic JSON validation
  • Cost monitoring for API usage
  • Parallel processing support
  • Prompt versioning
  • CI-based regression evaluation

🔮 Extensibility

This pipeline can be extended to support:

  • Confidence scoring
  • Multi-class categorization
  • Prompt optimization experiments
  • Async API batching
  • Docker deployment
  • Model comparison benchmarking

📜 License

MIT License

Copyright (c) 2026 Will
