perf: use Anthropic Message Batches API for eval extraction (50% cost reduction) #83

@hdviettt

Description

Problem

LongMemEval-M extraction costs roughly $1K (~238K Haiku calls). Each call blocks on a synchronous response, so throughput is bottlenecked by rate limits.

Solution

Use Anthropic's Message Batches API for extraction calls:

  • 50% cost discount on all batch-processed messages
  • No rate limits — batch is processed asynchronously
  • Results retrieved when ready (poll or webhook)

Implementation

  1. Collect all extraction prompts (session text → extraction request)
  2. Submit via client.messages.batches.create() (chunking into multiple batches if the request count exceeds the per-batch limit)
  3. Poll for completion
  4. Parse results and feed into the normal pipeline
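Steps 1–2 can be sketched with the Anthropic Python SDK as below. The model name, prompt template, and session shape are illustrative placeholders, not this project's actual values:

```python
# Assumptions for illustration: MODEL, EXTRACTION_TEMPLATE, and the
# (session_id, session_text) input shape are hypothetical, not the
# project's real extraction code.
MODEL = "claude-3-haiku-20240307"
EXTRACTION_TEMPLATE = "Extract the key facts from this session:\n\n{session_text}"

def build_batch_requests(sessions):
    """Turn (session_id, session_text) pairs into Message Batches requests.

    Each request carries a custom_id so results (which may come back in
    any order) can be matched to their session.
    """
    return [
        {
            "custom_id": session_id,
            "params": {
                "model": MODEL,
                "max_tokens": 1024,
                "messages": [
                    {"role": "user",
                     "content": EXTRACTION_TEMPLATE.format(session_text=text)}
                ],
            },
        }
        for session_id, text in sessions
    ]

def submit_batch(requests):
    """Submit one batch; needs the `anthropic` package and an API key."""
    import anthropic  # lazy import so request-building has no SDK dependency
    client = anthropic.Anthropic()
    batch = client.messages.batches.create(requests=requests)
    return batch.id  # keep this to poll for completion later
```

`build_batch_requests` is pure, so it can be unit-tested without network access; only `submit_batch` touches the API.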

Trade-offs

  • Latency: batch processing takes longer per call (results can take up to 24h), but total throughput is higher
  • Complexity: need to handle batch lifecycle (create → poll → retrieve)
  • Could combine with --types filter for even more savings
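The create → poll → retrieve lifecycle mentioned above mostly comes down to a polling loop with a timeout. A minimal generic helper, with the sleep function injectable so it can be tested without waiting (the helper itself is a sketch, not existing project code):

```python
import time

def poll_until(check, is_done, interval_s=60.0, timeout_s=24 * 3600,
               sleep=time.sleep):
    """Call `check()` until `is_done(result)` is true or `timeout_s` elapses.

    `sleep` is injectable so the loop can be exercised in tests
    without real delays.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = check()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("batch did not finish within the timeout")
        sleep(interval_s)
```

With the real SDK this would look something like `poll_until(lambda: client.messages.batches.retrieve(batch_id), lambda b: b.processing_status == "ended")`, then iterating `client.messages.batches.results(batch_id)` to collect outputs.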

Expected savings

  • $1K → ~$500 for full run
  • $330 → ~$165 for targeted run (temporal + preference)
