
Proposal: HIPPOCAMPUS Pre-Computed Concept Index for Retrieval Optimization #17

@globalcaos

Problem

Runtime vector search for memory retrieval is expensive and query-dependent. Every recall requires embedding the query, scanning the vector space, and ranking results, all at inference time. As memory stores grow, so does this cost: with an exhaustive scan it scales linearly with the number of stored chunks.

Proposed Solution: Pre-Computed Concept Index

Instead of searching at inference time, pre-compute a concept-to-memory mapping during offline consolidation (the equivalent of "sleep"). At retrieval time, anchor words detected in the query map directly to pre-indexed memory clusters — O(1) dictionary lookup instead of runtime kNN.
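A rough cost comparison makes the trade-off concrete. The symbols here are introduced for illustration only: N stored chunks with d-dimensional embeddings, an anchor vocabulary of A concepts, T tokens in the query (m of which match anchors), and k chunk ids stored per anchor. The runtime term assumes an exhaustive scan; an approximate index such as HNSW would make it sub-linear in N, at some cost in recall.

$$
\begin{aligned}
C_{\text{runtime}} &= C_{\text{embed}}(q) + O(N d) && \text{per query} \\
C_{\text{lookup}}  &= O(T + m k) && \text{per query (expected, hash lookups)} \\
C_{\text{build}}   &= O(A N d) && \text{similarity scoring, once per consolidation cycle}
\end{aligned}
$$

The "O(1)" is per anchor: query-time cost no longer depends on N, because the similarity work has moved into the offline build.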

How it works

  1. Build phase (offline, e.g. nightly consolidation):

    • Define an anchor vocabulary (concepts the agent frequently reasons about)
    • For each anchor, embed it and find the k-nearest memory chunks
    • Store the mapping: concept → [chunk_ids]
  2. Retrieval phase (inference time):

    • Detect anchor words in the current query/context
    • Look up pre-computed chunk lists — no embedding, no search
    • Fall through to traditional vector search only for novel/unseen concepts
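The two phases above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system described in this issue: embeddings are assumed to be precomputed (plain lists of floats stand in for real model outputs), the offline kNN is brute-force cosine similarity, anchors are single lowercase words, and `fallback` is a hypothetical stand-in for runtime vector search that is used only when no anchor is found. The names `build_concept_index` and `retrieve` are hypothetical.

```python
import math
import re
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_concept_index(
    anchor_vecs: Dict[str, Vector], chunk_vecs: Dict[str, Vector], k: int = 3
) -> Dict[str, List[str]]:
    """Build phase (offline): map each anchor to the ids of its k most similar chunks."""
    index: Dict[str, List[str]] = {}
    for anchor, a_vec in anchor_vecs.items():
        # Brute-force kNN is tolerable here because it runs during consolidation, not per query.
        ranked = sorted(chunk_vecs, key=lambda cid: cosine(a_vec, chunk_vecs[cid]), reverse=True)
        index[anchor] = ranked[:k]
    return index


def retrieve(
    query: str, index: Dict[str, List[str]], fallback: Callable[[str], List[str]]
) -> List[str]:
    """Retrieval phase: dictionary lookups for detected anchors, no embedding or search."""
    hits: List[str] = []
    found_anchor = False
    for token in re.findall(r"[a-z0-9_]+", query.lower()):
        if token in index:
            found_anchor = True
            for cid in index[token]:
                if cid not in hits:  # de-duplicate chunks shared by several anchors
                    hits.append(cid)
    # Novel/unseen concepts: no anchor matched, so defer to runtime vector search.
    return hits if found_anchor else fallback(query)
```

With the per-anchor lists precomputed, the query path is a tokenizer plus dictionary lookups. Multi-word anchors would need phrase matching rather than the single-token regex used here.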

Two-tier architecture

| Tier | When built | What it indexes | Purpose |
|---|---|---|---|
| Episodic | Real-time (on memory store) | Raw events with temporal context | Recent, unprocessed recall |
| Semantic | Nightly (post-consolidation) | Abstracted knowledge | Stable, high-precision recall |

Retrieval checks the semantic tier first (higher precision) and falls through to the episodic tier (higher recency). The staleness gap in the semantic tier is intentional: you can't abstract an event before reflecting on it. This mirrors hippocampal replay during slow-wave sleep → cortical consolidation.
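One way to express the semantic-first, episodic-second order is a per-anchor fall-through. This sketch assumes both tiers are exposed as plain `anchor → [chunk_id]` dictionaries and that `vector_search` is a hypothetical callable standing in for the existing runtime search; `two_tier_lookup` and the tier labels are illustrative names, not taken from an existing codebase.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Index = Dict[str, List[str]]


def two_tier_lookup(
    anchors: List[str],
    semantic: Index,
    episodic: Index,
    vector_search: Callable[[List[str]], List[str]],
) -> List[Tuple[str, str]]:
    """Resolve anchors semantic-first, then episodic, then runtime search.

    Returns (chunk_id, source_tier) pairs in resolution order, without duplicates.
    """
    results: List[Tuple[str, str]] = []
    seen: Set[str] = set()

    def add(chunk_ids: Iterable[str], source: str) -> None:
        for cid in chunk_ids:
            if cid not in seen:
                seen.add(cid)
                results.append((cid, source))

    unresolved: List[str] = []
    for anchor in anchors:
        if anchor in semantic:  # consolidated knowledge wins when it exists
            add(semantic[anchor], "semantic")
        elif anchor in episodic:  # not yet consolidated: use recent raw events
            add(episodic[anchor], "episodic")
        else:  # unseen concept: neither tier has it
            unresolved.append(anchor)

    if unresolved:
        add(vector_search(unresolved), "vector_search")
    return results
```

An anchor first seen today has no semantic entry yet, so it resolves from the episodic tier; that is the intentional staleness gap in action.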

What we've built so far

Honest status: It's not yet wired into our live retrieval path. We still use runtime semantic search (Gemini embeddings via memory_search). The index exists and rebuilds nightly, but we haven't replaced the hot path with it yet. This is a design proposal, not a battle-tested system.

Research paper

We've written a paper exploring the neuroscience analogy and the math behind the approach:

The core insight from neuroscience (hippocampal indexing theory): the human hippocampus doesn't store memories; it indexes them. On this view, hippocampal damage prevents forming new memories not because storage fails, but because indexing breaks.

Why this could matter for Hexis

Hexis already has a rich memory taxonomy and Postgres + pgvector for retrieval. This concept index could sit as an optimization layer on top:

  • Pre-compute concept mappings during Hexis's consolidation/heartbeat cycles
  • Reduce inference-time vector search calls for frequently accessed memory types
  • Backend-agnostic — works with Postgres/pgvector just as well as SQLite
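Because the index is just `concept → [chunk_id]`, it needs no vector extension at lookup time and can sit in an ordinary table next to the embeddings. Below is a sketch using Python's built-in `sqlite3`; the table name `concept_index`, its columns, and the helpers `save_index`/`lookup` are hypothetical. The same schema would work in Postgres, though placeholder syntax differs by driver (psycopg, for example, uses `%s` instead of `?`).

```python
import sqlite3
from typing import Dict, List

# Hypothetical schema: one row per (concept, position) pair.
SCHEMA = """
CREATE TABLE IF NOT EXISTS concept_index (
    concept  TEXT    NOT NULL,
    pos      INTEGER NOT NULL,  -- rank of the chunk within the concept's list
    chunk_id TEXT    NOT NULL,
    PRIMARY KEY (concept, pos)
)
"""


def save_index(conn: sqlite3.Connection, index: Dict[str, List[str]]) -> None:
    """Replace the stored index with a freshly built one (e.g. after nightly consolidation)."""
    conn.execute(SCHEMA)
    rows = [(concept, pos, cid) for concept, cids in index.items() for pos, cid in enumerate(cids)]
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM concept_index")
        conn.executemany(
            "INSERT INTO concept_index (concept, pos, chunk_id) VALUES (?, ?, ?)", rows
        )


def lookup(conn: sqlite3.Connection, concept: str) -> List[str]:
    """Return the precomputed chunk ids for a concept, in rank order."""
    cur = conn.execute(
        "SELECT chunk_id FROM concept_index WHERE concept = ? ORDER BY pos", (concept,)
    )
    return [row[0] for row in cur.fetchall()]
```

In a relational store the lookup is an indexed B-tree seek on the primary key rather than a strict O(1) hash lookup, but it still involves no embedding or similarity computation. Doing the delete and reinsert in one transaction makes each rebuild all-or-nothing.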

What we'd like

Your feedback on whether this approach has merit for Hexis's retrieval path. We're genuinely looking to learn, not to sell — if the idea doesn't fit, that's useful feedback too.
