feat: add bounded per-agent memory summarization and recall #35

Open
XD319 wants to merge 4 commits into ceresOPA:develop from XD319:feat/issue-28-agent-memory

Conversation

XD319 (Contributor) commented Mar 21, 2026

Background

Issue #28 aims to control agent context growth by introducing per-agent memory summarization and bounded retrieval.

Before this change, Alicization Town had no persistent per-agent memory layer. Runtime events like say and interact disappeared after execution, so later tool calls could not recover relevant prior context without replaying full history.

Solution Overview

This PR adds a minimal end-to-end memory flow without introducing a new prompt framework or an LLM summarizer.

It includes:

  • Persistent per-agent memory storage in SQLite
  • Template-based short memory summaries generated from high-signal runtime events
  • Bounded retrieval by agent, partner, location, and recency
  • A lightweight authenticated recall API
  • Memory injection into MCP and CLI outputs where agents already consume context
  • Minimal skill documentation updates
  • Smoke/integration coverage for persistence, retrieval, and injected context

Data Model And Retrieval Rules

Storage

A new agent_memories table stores compact memory summaries with:

  • id
  • agent_id
  • partner_id
  • location
  • kind
  • content
  • metadata_json
  • created_at
  • updated_at
  • last_retrieved_at
  • retrieval_count

Indexes were added for:

  • agent_id + created_at
  • agent_id + partner_id + created_at
  • agent_id + location + created_at
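
The columns and indexes above could be expressed as SQLite DDL roughly as follows. This is a sketch: column types, NOT NULL constraints, and index names are assumptions; only the column and index lists come from the PR.

```javascript
// Hypothetical DDL for the agent_memories table described above.
// Types, constraints, and index names are guesses; the column and
// index lists mirror the PR description.
const CREATE_AGENT_MEMORIES = `
CREATE TABLE IF NOT EXISTS agent_memories (
  id                INTEGER PRIMARY KEY AUTOINCREMENT,
  agent_id          TEXT NOT NULL,
  partner_id        TEXT,
  location          TEXT,
  kind              TEXT NOT NULL,
  content           TEXT NOT NULL,
  metadata_json     TEXT,
  created_at        INTEGER NOT NULL,
  updated_at        INTEGER NOT NULL,
  last_retrieved_at INTEGER,
  retrieval_count   INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX IF NOT EXISTS idx_mem_agent_created  ON agent_memories (agent_id, created_at);
CREATE INDEX IF NOT EXISTS idx_mem_agent_partner  ON agent_memories (agent_id, partner_id, created_at);
CREATE INDEX IF NOT EXISTS idx_mem_agent_location ON agent_memories (agent_id, location, created_at);
`;
```

Note the composite indexes only cover the hard filters and recency sort; relevance scoring happens at query time.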

Retrieval

Retrieval is intentionally simple and bounded.

Hard filters:

  • agent_id
  • optional since

Soft relevance scoring:

  • exact partner_id match: +2
  • exact location match: +1

Sort order:

  1. retrievalScore DESC
  2. created_at DESC
  3. id DESC

The API also updates last_retrieved_at and increments retrieval_count for returned memories.
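
The filter/score/sort rules above can be sketched in plain JavaScript. The in-memory filtering is illustrative (the real implementation runs as a SQL query), and the function signature is an assumption; field names mirror the PR.

```javascript
// Minimal sketch of bounded recall with soft relevance scoring.
// Hard filters: agent_id and optional `since`; soft scoring: +2 for an
// exact partner match, +1 for an exact location match.
function recallMemories(memories, { agentId, partnerId, location, since, limit = 4 }) {
  return memories
    .filter((m) => m.agent_id === agentId)                    // hard filter
    .filter((m) => since == null || m.created_at >= since)    // optional since
    .map((m) => ({
      ...m,
      retrievalScore:
        (partnerId != null && m.partner_id === partnerId ? 2 : 0) +
        (location != null && m.location === location ? 1 : 0),
    }))
    .sort(
      (a, b) =>
        b.retrievalScore - a.retrievalScore ||  // 1. retrievalScore DESC
        b.created_at - a.created_at ||          // 2. created_at DESC
        b.id - a.id                             // 3. id DESC
    )
    .slice(0, limit); // bounded result set
}
```

Because `retrievalScore` is computed per query, only the hard filters and recency sort can be served by the composite indexes.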

Memory Generation

Memory summaries are generated from high-signal events only:

  • say
    • speaker gets a say memory
    • nearby listeners get heard memories
  • interact
    • actor gets an interaction memory
    • nearby same-zone witnesses get witnessed_interaction memories

Summaries are template-based, short, and truncated to avoid unbounded growth. No LLM summarizer is introduced.
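
A template-based summary with truncation might look like the sketch below. The templates, role names, and the 200-character cap are all hypothetical; the PR only states that summaries are short, template-based, and truncated.

```javascript
// Hypothetical template-based say/heard summarizer; no LLM involved.
// MAX_CONTENT is an assumed truncation bound, not a value from the PR.
const MAX_CONTENT = 200;

function summarizeSay(event, role) {
  const text =
    role === 'speaker'
      ? `You said to ${event.listener}: "${event.message}"`
      : `You heard ${event.speaker} say: "${event.message}"`;
  // Truncate to keep stored memories bounded in size.
  return text.length > MAX_CONTENT ? text.slice(0, MAX_CONTENT - 1) + '…' : text;
}
```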

Prompt Injection Path

There is no unified server-side prompt builder in the current codebase, so this PR uses the existing tool output surfaces instead of creating a new prompt system.

Chosen injection path:

  • server: lightweight POST /api/memories/recall
  • shared client: formatter support for appending a bounded Relevant memories section
  • MCP bridge:
    • tool handlers provide a small memoryContext
    • bridge appends recalled memories after tool text output
  • town CLI:
    • look, map, and interact append recalled memories after the main formatted output
  • skill docs:
    • clarify what Relevant memories means and how agents should use it

This keeps the change small and aligned with the existing architecture.
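
The shared formatter support can be sketched as follows. `appendMemorySection` is the name used in the codebase per the review below; its exact signature and the 4-item default are assumptions.

```javascript
// Sketch of appending a bounded "Relevant memories" section to an
// existing tool output string; signature and defaults are assumptions.
function appendMemorySection(output, memories, maxItems = 4) {
  if (!memories || memories.length === 0) return output; // nothing to inject
  const lines = memories.slice(0, maxItems).map((m) => `- ${m.content}`);
  return `${output}\n\nRelevant memories:\n${lines.join('\n')}`;
}
```

The memory section is appended after the main formatted output, so existing consumers of the tool text are unaffected when no memories are recalled.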

Test Results

Ran directly related tests:

  • node server/test/smoke.test.js
  • node packages/mcp-bridge/test/smoke.test.js
  • node packages/mcp-bridge/test/compat.test.js
  • node packages/town-cli/test/smoke.test.js
  • node packages/town-cli/test/look-format.test.js

Results:

  • server smoke: passed
  • MCP smoke: passed
  • MCP compat: passed
  • town CLI smoke: passed
  • look formatter test: passed

Note:

  • unrelated existing failures remain in packages/town-cli/test/storage.test.js
  • those failures are pre-existing cross-platform path expectation issues and are not introduced by this PR

Uncovered Risks

Still not fully covered by tests:

  • retrieval behavior under very large memory volume
  • deduplication or pruning beyond bounded recall
  • richer multi-entity relevance ranking
  • broader prompt/output regression coverage outside the tested look, map, and interact paths

Issue Link

Closes #28

northseadl (Collaborator) commented

Thanks for this PR 🎉 Introducing a memory system for agents is the right direction. Template-based summaries + bounded retrieval + soft relevance scoring is a clear overall design for a first-pass memory layer, and the code quality and test coverage are solid.

I did a round of code review focused on how the memory system affects context once an agent has been running for a long time. Below are some findings and suggestions for discussion.


🔴 Cumulative token consumption

This is my biggest concern. The three high-frequency tools look, map, and interact all trigger recall injection, covering roughly 70% of an agent's perception actions. Each injection adds 4 memories (~160 tokens), and once that content enters the LLM conversation history as tool output, it is never cleared.

In long-running scenarios:

  • 100 rounds of tool calls → roughly 11,000 tokens of pure memory accumulation
  • with no deduplication (retrieval_count is recorded but not used for ranking or filtering), repeated look() calls at the same location return highly redundant memories, which is pure token waste

This has a large impact on the agent's context-window budget.

🔴 No forgetting / eviction mechanism

The agent_memories table is currently append-only: there is no DELETE operation anywhere in the code path. An active agent (200 say + 100 interact per hour, nearby=2) writes roughly 900 memories per hour.

Meanwhile, although recallMemories() has a since parameter in its signature, none of the 6 actual call sites (3 in MCP + 3 in the CLI) pass it, so every recall searches the full history. retrievalScore is computed at query time and cannot use an index, so retrieval slows down linearly as data grows.

I suggest at least two things: (1) give recall a default time window; (2) implement per-agent capped eviction.
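
The two suggested mitigations could be sketched like this. The helper names, the 24-hour window, and the 500-memory cap are all hypothetical values chosen for illustration:

```javascript
// Sketch of (1) a default recall time window and (2) per-agent capped
// eviction. All constants and names here are assumptions.
const DEFAULT_WINDOW_MS = 24 * 60 * 60 * 1000; // assumed 24h default
const MAX_MEMORIES_PER_AGENT = 500;            // assumed per-agent cap

// If the caller passes no `since`, fall back to a bounded window.
function withDefaultWindow(params, now = Date.now()) {
  return { ...params, since: params.since ?? now - DEFAULT_WINDOW_MS };
}

// Keep only the newest `cap` memories for an agent; drop the rest.
function evictOldest(memories, cap = MAX_MEMORIES_PER_AGENT) {
  if (memories.length <= cap) return memories;
  return [...memories].sort((a, b) => b.created_at - a.created_at).slice(0, cap);
}
```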

🟡 Memory value tiering

The kind field (say/heard/interaction/witnessed_interaction) currently only classifies events; it does not distinguish value. heard and witnessed_interaction are fan-out passive memories that are usually low-signal noise for the memory's owner, yet at retrieval time they compete for the limit quota on equal footing with active-behavior memories.

Applying differentiated weights per kind at retrieval time (e.g. active > passive) would improve recall quality with a very small change.
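
One minimal way to do this: fold a small per-kind weight into the relevance score so active memories win ties against passive ones. The weight table and the 0.5 multiplier are illustrative, not from the PR.

```javascript
// Illustrative kind weighting at retrieval time: active memories
// (say, interaction) outrank passive fan-out memories (heard,
// witnessed_interaction) when base relevance is close.
const KIND_WEIGHT = { say: 1, interaction: 1, heard: 0, witnessed_interaction: 0 };

function scoreWithKind(memory, baseScore) {
  // 0.5 keeps the kind bonus smaller than the +1 location bonus,
  // so it only breaks near-ties rather than overriding relevance.
  return baseScore + (KIND_WEIGHT[memory.kind] ?? 0) * 0.5;
}
```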

🟡 No merging / compression

Currently 1 event = 1 memory, with no merging. Speaking three times in a row at the noodle shop produces 3 separate memories rather than "chatted for a while at the noodle shop". Over long-term operation, masses of fine-grained behavior memories will come to dominate retrieval results.

🟠 recall logic duplicated across MCP / CLI

appendMemorySection has already been pushed down into shared/town-client, but the recall trigger logic is implemented separately in the MCP bridge and the CLI, and the two sides do not construct parameters in quite the same way. I suggest unifying this in the shared layer.


A strategic-level thought

The town has plans for a richer game world and an asset system. At that stage, maintaining effective asset memories for agents (what was gained or lost) and summaries of key experiences (what relationships were built with whom, what important events happened) will be an extremely valuable capability, and that is where a memory system can truly differentiate itself.

At the current stage, however, the game world does not yet have enough high-value events to fill memories with; most of what gets written today is low-density, behavior-log-level data. Persisting all of it now and injecting it into context indiscriminately could actually hurt token consumption and signal-to-noise ratio.

So my suggestion: keep the memory system's foundation (storage + retrieval + injection pipeline), but until the forgetting mechanism and token-budget controls are in place, conservatively limit injection frequency and scope, then expand memory coverage once the game world is richer.


| Risk | Severity | Suggestion |
| --- | --- | --- |
| Cumulative token consumption | 🔴 | Add deduplication or injection-frequency control |
| No forgetting / eviction | 🔴 | Per-agent cap + default time window |
| No value tiering | 🟡 | Differentiated kind weights at retrieval |
| No merging / compression | 🟡 | Can iterate later |
| Duplicated recall logic | 🟠 | Unify in the shared layer |

northseadl (Collaborator) commented

A clarification: the review comments above were derived with the help of a Claude-assisted code assessment; if anything is inaccurate, happy to keep discussing.

A memory system is unquestionably a necessary and extremely valuable capability going forward. The points raised above (forgetting mechanism, token budget, value tiering, etc.) are not meant to reject the direction of this PR, but to make sure the long-running risks are thought through before merging, so the memory system starts out on the right path.

XD319 added 2 commits March 22, 2026 10:28
# Conflicts:
#	packages/mcp-bridge/src/client.js
#	packages/mcp-bridge/src/index.js
#	packages/mcp-bridge/src/tools/communication.js
#	packages/mcp-bridge/src/tools/interaction.js
#	packages/mcp-bridge/test/smoke.test.js
#	packages/town-cli/src/lib/act.js
#	server/src/engine/world-engine.js
#	shared/town-client/formatters.js
#	shared/town-client/index.js
XD319 (Contributor, Author) commented Mar 22, 2026

Thanks @northseadl for the careful feedback. These concerns are valid, especially around long-term context bloat, the overly wide default retrieval scope, and low-value memories being injected repeatedly.

In this round I did not expand the system's capabilities; instead I tightened the existing implementation to make its behavior more conservative and more bounded. On the server side I added several constraints: recall now applies a default time window and a smaller default result count even when the caller passes no parameters; recently retrieved memories get a short cooldown, so the same content is not returned repeatedly within a short span; and there is now a fixed per-agent capacity cap, with older memories evicted past the cap to prevent unbounded long-term growth. I also added a lightweight value tier to the sort order: when relevance is close, actively produced memories rank ahead of passively observed ones.

On the MCP / CLI side I also narrowed the scope of automatic injection rather than keeping broad default injection. map no longer auto-injects memories; look now injects only under fairly narrow conditions; interact still injects, but with explicit bounded controls. I also consolidated recall parameter construction and the injection decision into the shared layer as much as possible, reducing the duplicated logic previously maintained separately in MCP and the CLI.

Overall, this round directly addresses the "no forgetting, no default time window, easy to re-inject duplicates" issues, making the memory mechanism more restrained and controllable first. Larger optimizations, such as finer-grained memory value modeling and further compression/summarization strategies, I would rather handle in follow-up iterations to avoid introducing too many changes in one PR.
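
The short cooldown described above could look like the sketch below, using the existing last_retrieved_at column. The function name and the 5-minute window are assumptions for illustration:

```javascript
// Hypothetical cooldown filter: skip memories retrieved within the
// last few minutes so repeated recalls don't return the same content.
const COOLDOWN_MS = 5 * 60 * 1000; // assumed 5-minute cooldown

function filterCooldown(memories, now = Date.now()) {
  return memories.filter(
    (m) => m.last_retrieved_at == null || now - m.last_retrieved_at > COOLDOWN_MS
  );
}
```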

…ent-memory

# Conflicts:
#	packages/mcp-bridge/test/smoke.test.js
#	server/src/engine/world-engine.js
#	server/src/routes.js
XD319 (Contributor, Author) commented Mar 26, 2026

@ceresOPA Can this PR be merged into the develop branch as it stands? If not, I'll close it for now.

ceresOPA (Owner) commented

@XD319 Sorry for the delay, let me evaluate it again.

