Problem
LongMemEval-M extraction costs ~$1K+ (238K Haiku calls). Each call waits for a response synchronously, creating rate-limit bottlenecks.
Solution
Use Anthropic's Message Batches API for extraction calls:
- 50% cost discount on all batch-processed messages
- Not subject to the standard per-request rate limits — the batch is processed asynchronously
- Results retrieved when ready (poll or webhook)
Implementation
- Collect all extraction prompts (session text → extraction request)
- Submit as a single batch via client.messages.batches.create()
- Poll for completion
- Parse results and feed into the normal pipeline
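The create → poll → retrieve lifecycle above can be sketched with the Anthropic Python SDK. This is a minimal sketch, not the pipeline's actual code: the `extract-{id}` custom_id scheme, the session dict fields, and the polling interval are assumptions.

```python
# Sketch: submit extraction prompts as one batch, poll until done, map
# results back by custom_id. Assumes the Anthropic Python SDK (`anthropic`).
import time


def build_requests(sessions):
    """Turn (session id, extraction prompt) pairs into batch request entries.

    custom_id is how results are matched back to sessions after retrieval;
    the "extract-" prefix is an arbitrary convention for this sketch.
    """
    return [
        {
            "custom_id": f"extract-{s['id']}",
            "params": {
                "model": "claude-3-haiku-20240307",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": s["prompt"]}],
            },
        }
        for s in sessions
    ]


def run_batch(client, sessions, poll_seconds=60):
    """Create the batch, poll until it ends, then stream per-request results."""
    batch = client.messages.batches.create(requests=build_requests(sessions))
    while batch.processing_status != "ended":
        time.sleep(poll_seconds)
        batch = client.messages.batches.retrieve(batch.id)
    # Each result carries the custom_id it was submitted with, so downstream
    # parsing can feed extractions into the normal pipeline per session.
    return {r.custom_id: r.result for r in client.messages.batches.results(batch.id)}
```

Polling is the simplest integration; a webhook would avoid the sleep loop but adds endpoint plumbing.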
Trade-offs
- Latency: individual results can take up to 24h, but total throughput is higher
- Complexity: need to handle batch lifecycle (create → poll → retrieve)
- Could combine with the --types filter for even more savings
Expected savings
- $1K → ~$500 for full run
- $330 → ~$165 for targeted run (temporal + preference)
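The figures above are the 50% batch discount applied directly to the synchronous-run estimates:

```python
# Back-of-envelope check of the savings, using the run estimates from above.
BATCH_DISCOUNT = 0.50

full_run_cost = 1000 * (1 - BATCH_DISCOUNT)      # full LongMemEval-M run
targeted_run_cost = 330 * (1 - BATCH_DISCOUNT)   # temporal + preference only
```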