perf: use Anthropic Message Batches API for eval extraction (50% cost reduction) #83

@hdviettt

Description

Problem

LongMemEval-M extraction costs roughly $1K (~238K Haiku calls). Each call blocks on a synchronous response, so throughput is bottlenecked by rate limits.

Solution

Use Anthropic's Message Batches API for extraction calls:

  • 50% cost discount on all batch-processed messages
  • No rate limits — batch is processed asynchronously
  • Results retrieved when ready (poll or webhook)

Implementation

  1. Collect all extraction prompts (session text → extraction request)
  2. Submit via client.messages.batches.create() (chunking into multiple batches if the request count exceeds the per-batch limit)
  3. Poll for completion
  4. Parse results and feed into the normal pipeline
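Steps 1–2 can be sketched with the Anthropic Python SDK as below. The model name, prompt template, and session shape are illustrative placeholders, not this project's actual values:

```python
# Assumptions for illustration: MODEL, EXTRACTION_TEMPLATE, and the
# (session_id, session_text) input shape are hypothetical, not the
# project's real extraction code.
MODEL = "claude-3-haiku-20240307"
EXTRACTION_TEMPLATE = "Extract the key facts from this session:\n\n{session_text}"

def build_batch_requests(sessions):
    """Turn (session_id, session_text) pairs into Message Batches requests.

    Each request carries a custom_id so results (which may come back in
    any order) can be matched to their session.
    """
    return [
        {
            "custom_id": session_id,
            "params": {
                "model": MODEL,
                "max_tokens": 1024,
                "messages": [
                    {"role": "user",
                     "content": EXTRACTION_TEMPLATE.format(session_text=text)}
                ],
            },
        }
        for session_id, text in sessions
    ]

def submit_batch(requests):
    """Submit one batch; needs the `anthropic` package and an API key."""
    import anthropic  # lazy import so request-building has no SDK dependency
    client = anthropic.Anthropic()
    batch = client.messages.batches.create(requests=requests)
    return batch.id  # keep this to poll for completion later
```

`build_batch_requests` is pure, so it can be unit-tested without network access; only `submit_batch` touches the API.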

Trade-offs

  • Latency: batch processing takes longer per call (results can take up to 24h), but total throughput is higher
  • Complexity: need to handle batch lifecycle (create → poll → retrieve)
  • Could combine with --types filter for even more savings
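The create → poll → retrieve lifecycle mentioned above mostly comes down to a polling loop with a timeout. A minimal generic helper, with the sleep function injectable so it can be tested without waiting (the helper itself is a sketch, not existing project code):

```python
import time

def poll_until(check, is_done, interval_s=60.0, timeout_s=24 * 3600,
               sleep=time.sleep):
    """Call `check()` until `is_done(result)` is true or `timeout_s` elapses.

    `sleep` is injectable so the loop can be exercised in tests
    without real delays.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = check()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("batch did not finish within the timeout")
        sleep(interval_s)
```

With the real SDK this would look something like `poll_until(lambda: client.messages.batches.retrieve(batch_id), lambda b: b.processing_status == "ended")`, then iterating `client.messages.batches.results(batch_id)` to collect outputs.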

Expected savings

  • $1K → ~$500 for full run
  • $330 → ~$165 for targeted run (temporal + preference)
