chore: benchmark aletheia memory against LongMemEval and LoCoMo

## Summary

The agent memory field has converged on two standard benchmarks:
- **LongMemEval**: 500 manually curated questions covering 5 core memory abilities, over chat histories of ~115K tokens. SOTA: Hindsight 91.4%
- **LoCoMo**: 10 very long-term conversations, each averaging ~27 sessions and ~588 turns, with ~200 questions per conversation. SOTA: Hindsight 89.61%
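
To compare against the published SOTA scores, the harness has to roll judged answers up into an overall accuracy plus a per-ability (LongMemEval) or per-category (LoCoMo) breakdown. The sketch below assumes a hypothetical `GradedAnswer` record emitted by a judging step. It is not either benchmark's real schema, so a loader would have to map their JSON onto it. It computes micro-averaged (per-question) accuracy. Before comparing with a headline number, check which averaging that number uses.

```rust
use std::collections::BTreeMap;

/// One judged benchmark answer. Hypothetical shape, not the benchmarks' own schema.
#[derive(Debug, Clone)]
pub struct GradedAnswer {
    /// Memory ability or question category, e.g. "temporal-reasoning".
    pub category: String,
    /// Whether the judge marked the answer correct.
    pub correct: bool,
}

/// Accuracy summary in [0.0, 1.0]: overall plus per-category.
#[derive(Debug, PartialEq)]
pub struct Report {
    pub overall: f64,
    pub by_category: BTreeMap<String, f64>,
}

/// Aggregate judged answers into micro-averaged accuracies.
/// Returns None for an empty run, where accuracy is undefined.
pub fn summarize(answers: &[GradedAnswer]) -> Option<Report> {
    if answers.is_empty() {
        return None;
    }
    // (correct, total) counts per category; BTreeMap keeps output order stable.
    let mut counts: BTreeMap<String, (usize, usize)> = BTreeMap::new();
    let mut correct_total = 0usize;
    for a in answers {
        let entry = counts.entry(a.category.clone()).or_insert((0, 0));
        entry.1 += 1;
        if a.correct {
            entry.0 += 1;
            correct_total += 1;
        }
    }
    let by_category = counts
        .into_iter()
        .map(|(cat, (c, n))| (cat, c as f64 / n as f64))
        .collect();
    Some(Report {
        overall: correct_total as f64 / answers.len() as f64,
        by_category,
    })
}
```

Keeping the per-category split matters because an overall score can hide a weak ability, such as temporal reasoning or knowledge updates, behind strong single-session recall.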

Aletheia's memory architecture is arguably ahead of the field, but without benchmark numbers this remains a design claim rather than a demonstrated result. Scores on both benchmarks would be directly comparable to the published SOTA above.

## Also relevant

- **HaluMem** (arxiv 2511.03506): First benchmark for memory hallucination
- **ActMemEval**: Logic-driven causal reasoning over memory

Aletheia's eval crate (dokimion) could adapt these into scenarios.
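
One way such an adaptation could look is to flatten each benchmark item (a multi-session history plus a probe question) into an ordered list of scenario steps. This is a minimal sketch: `Turn`, `BenchItem`, `Step` and `to_steps` are all hypothetical names and are **not** dokimion's real API. Encoding "the memory should decline to answer" as `expected: None` is also an assumption, chosen so that hallucination-style checks fit the same shape.

```rust
/// One chat turn from a benchmark history (hypothetical shape).
#[derive(Debug, Clone)]
pub struct Turn {
    pub speaker: String,
    pub text: String,
}

/// A benchmark item: multi-session history plus one probe question.
/// `expected: None` marks a question the memory should decline to answer.
#[derive(Debug, Clone)]
pub struct BenchItem {
    pub sessions: Vec<Vec<Turn>>,
    pub question: String,
    pub expected: Option<String>,
}

/// Steps of a hypothetical dokimion scenario (NOT dokimion's real API).
#[derive(Debug, Clone, PartialEq)]
pub enum Step {
    /// Mark a session boundary so session-aware memory can segment history.
    NewSession(usize),
    /// Feed one turn into memory.
    Ingest { speaker: String, text: String },
    /// Ask the probe question and check the answer (or the abstention).
    Probe { question: String, expected: Option<String> },
}

/// Ingest every session's turns in order, then ask the question once at the end.
pub fn to_steps(item: &BenchItem) -> Vec<Step> {
    let mut steps = Vec::new();
    for (i, session) in item.sessions.iter().enumerate() {
        steps.push(Step::NewSession(i));
        for turn in session {
            steps.push(Step::Ingest {
                speaker: turn.speaker.clone(),
                text: turn.text.clone(),
            });
        }
    }
    steps.push(Step::Probe {
        question: item.question.clone(),
        expected: item.expected.clone(),
    });
    steps
}
```

Probing only after the full history is ingested mirrors how both benchmarks ask questions about the whole conversation rather than about each turn as it arrives.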

## Source

LongMemEval, LoCoMo, HaluMem (arxiv 2511.03506).
