feat: add per-user memory with LLM extraction, dedup, and decay #67
Conversation
Koopa0
commented
Feb 17, 2026
- PostgreSQL + pgvector native memory store (two-threshold dedup, time-decay)
- LLM-powered fact extraction with nonce-delimited prompt injection defense
- LLM conflict arbitration for near-duplicate memories
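As context for the dedup design, here is a minimal, illustrative sketch of a two-threshold decision. The threshold names, values, and package layout are assumptions for illustration, not the code in this PR.

package memory

// Illustrative thresholds on the cosine similarity between a newly extracted
// fact and its nearest existing memory. Names and values are hypothetical.
const (
	duplicateThreshold = 0.95 // at or above: exact duplicate, skip the insert
	conflictThreshold  = 0.80 // in between: near-duplicate, send to LLM arbitration
)

// DedupAction is what the store does with an incoming fact.
type DedupAction int

const (
	ActionInsert    DedupAction = iota // unrelated to existing memories: store it
	ActionArbitrate                    // near-duplicate: let the LLM decide keep/replace/merge
	ActionSkip                         // duplicate: drop the new fact
)

// classify applies the two-threshold rule to the best similarity score found
// among the user's active memories.
func classify(bestSimilarity float64) DedupAction {
	switch {
	case bestSimilarity >= duplicateThreshold:
		return ActionSkip
	case bestSimilarity >= conflictThreshold:
		return ActionArbitrate
	default:
		return ActionInsert
	}
}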
DROP TABLE IF EXISTS memories;
DROP TABLE IF EXISTS messages;

-- ============================================================================
-- Drop Sessions Table (including indexes)
-- ============================================================================

DROP INDEX IF EXISTS idx_sessions_owner_id;
DROP INDEX IF EXISTS idx_sessions_updated_at;
DROP TABLE IF EXISTS sessions;

-- ============================================================================
-- Drop Documents Table (including indexes)
-- ============================================================================

DROP INDEX IF EXISTS idx_documents_owner;
DROP INDEX IF EXISTS idx_documents_metadata_gin;
DROP INDEX IF EXISTS idx_documents_source_type;
DROP INDEX IF EXISTS idx_documents_embedding;
DROP TABLE IF EXISTS documents;

-- ============================================================================
-- Drop Extensions
-- Note: Only drop if no other schemas depend on it
-- ============================================================================

DROP EXTENSION IF EXISTS vector;
Indexes are dropped after their associated tables (e.g., indexes on 'sessions' and 'documents' are dropped after the tables themselves). This can cause errors if the table no longer exists when attempting to drop the index. Recommendation: Drop all indexes before dropping their associated tables to avoid dependency errors.
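If the ordering is tightened as suggested, one possible layout for the down migration, reusing the object names from the hunk above (sketch only):

-- Drop all indexes first, then the tables they belong to.
DROP INDEX IF EXISTS idx_sessions_owner_id;
DROP INDEX IF EXISTS idx_sessions_updated_at;
DROP INDEX IF EXISTS idx_documents_owner;
DROP INDEX IF EXISTS idx_documents_metadata_gin;
DROP INDEX IF EXISTS idx_documents_source_type;
DROP INDEX IF EXISTS idx_documents_embedding;

DROP TABLE IF EXISTS memories;
DROP TABLE IF EXISTS messages;
DROP TABLE IF EXISTS sessions;
DROP TABLE IF EXISTS documents;

DROP EXTENSION IF EXISTS vector;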
The migration script does not wrap the DROP statements in a transaction. If an error occurs partway through, the database could be left in an inconsistent state. Recommendation: Wrap all statements in a transaction block (e.g., BEGIN; ... COMMIT;) to ensure atomicity.
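A minimal sketch of the transactional form, assuming the migration runner does not already wrap each script in its own transaction (all of these DROP statements are transactional in PostgreSQL):

BEGIN;

DROP INDEX IF EXISTS idx_sessions_owner_id;
DROP INDEX IF EXISTS idx_sessions_updated_at;
-- ... remaining DROP INDEX statements ...

DROP TABLE IF EXISTS memories;
DROP TABLE IF EXISTS messages;
DROP TABLE IF EXISTS sessions;
DROP TABLE IF EXISTS documents;

DROP EXTENSION IF EXISTS vector;

COMMIT;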
CREATE TABLE IF NOT EXISTS sessions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    title TEXT,
    owner_id TEXT NOT NULL DEFAULT '',
Potential Data Integrity and Security Issue:
The owner_id column in the sessions table is defined as TEXT NOT NULL DEFAULT ''. This allows sessions to be created with an empty string as the owner, which may lead to orphaned sessions and complicate access control. It is recommended to:
- Remove the default value and require explicit assignment of a valid owner.
- Alternatively, use NULL to indicate no owner and enforce ownership at the application level.
Example:
owner_id TEXT NOT NULL
Or, if nullable:
owner_id TEXT

CREATE UNIQUE INDEX idx_memories_owner_content_unique
    ON memories(owner_id, md5(content)) WHERE active = true;
Deduplication Vulnerability via MD5 Hash Collisions:
The unique index on md5(content) for active memories relies on MD5, for which collisions can be deliberately crafted; two distinct memories whose contents collide would be treated as the same entry and the newer insert rejected. Additionally, deduplication does not apply to inactive memories, which could result in inconsistent data. Consider using a stronger hash function (e.g., SHA-256) or a direct unique constraint on the content column if performance and storage allow.
Example (if using SHA-256; digest() requires the pgcrypto extension):
CREATE UNIQUE INDEX idx_memories_owner_content_unique
    ON memories(owner_id, encode(digest(content, 'sha256'), 'hex')) WHERE active = true;
 //
 // Shutdown order:
 // 1. Cancel context (signals background tasks to stop)
-// 2. Close DB pool
-// 3. Flush OTel spans
+// 2. Wait for background goroutines (scheduler) to exit
+// 3. Close DB pool
+// 4. Flush OTel spans
 func (a *App) Close() error {
 	a.closeOnce.Do(func() {
 		slog.Info("shutting down application")
Resource cleanup error handling is missing in Close().
The Close method does not capture or propagate errors from dbCleanup or otelCleanup. If these cleanup functions fail, the error will be silently ignored, potentially masking critical shutdown failures. Consider aggregating errors from each cleanup step and returning a combined error:
var err error
if a.dbCleanup != nil {
	if cerr := a.dbCleanup(); cerr != nil {
		err = errors.Join(err, cerr)
	}
}
// ... same for otelCleanup
return err

func FuzzContainsSecrets(f *testing.F) {
	f.Add("hello world")
	f.Add("sk-1234567890abcdefghijklmnop")
	f.Add("")
	f.Add("password=secret123456")
	f.Fuzz(func(_ *testing.T, input string) {
		ContainsSecrets(input) // must not panic
	})
}
Fuzz test does not validate correctness of output
The fuzz test for ContainsSecrets (lines 103-111) only checks that the function does not panic; it does not assert the correctness of its output. This could allow silent failures if the function starts returning incorrect results without panicking. To improve robustness, consider adding assertions for expected output on known inputs, or at least logging unexpected results for further analysis.
Example improvement:
f.Fuzz(func(t *testing.T, input string) {
	got := ContainsSecrets(input)
	// Assert or log output for known cases, e.g.:
	t.Logf("ContainsSecrets(%q) = %v", input, got)
})

if n, err := s.store.UpdateDecayScores(ctx); err != nil {
	s.logger.Warn("decay update failed", "error", err)
} else if n > 0 {
	s.logger.Debug("decay scores updated", "count", n)
}
Error Handling Limitation:
If UpdateDecayScores fails, the error is only logged and not propagated or retried. This could result in persistent data inconsistencies if failures are not transient. Consider implementing a retry mechanism or escalating the error to ensure critical failures are not silently ignored.
Example:
for retries := 0; retries < maxRetries; retries++ {
	n, err := s.store.UpdateDecayScores(ctx)
	if err == nil {
		if n > 0 {
			s.logger.Debug("decay scores updated", "count", n)
		}
		break
	}
	s.logger.Warn("decay update failed", "error", err)
	time.Sleep(retryDelay)
}

if n, err := s.store.DeleteStale(ctx); err != nil {
	s.logger.Warn("stale expiry failed", "error", err)
} else if n > 0 {
	s.logger.Info("expired stale memories", "count", n)
}
Error Handling Limitation:
Similar to the previous block, if DeleteStale fails, the error is only logged. This could lead to stale data persisting indefinitely. Consider adding a retry mechanism or alerting to ensure failures are addressed promptly.
Example:
for retries := 0; retries < maxRetries; retries++ {
	n, err := s.store.DeleteStale(ctx)
	if err == nil {
		if n > 0 {
			s.logger.Info("expired stale memories", "count", n)
		}
		break
	}
	s.logger.Warn("stale expiry failed", "error", err)
	time.Sleep(retryDelay)
}

defer func() {
	if rbErr := tx.Rollback(ctx); rbErr != nil && !errors.Is(rbErr, pgx.ErrTxClosed) {
		s.logger.Debug("transaction rollback", "error", rbErr)
	}
}()
Deferred Rollback Always Executes After Commit
The deferred rollback in the Add method will always execute, even after a successful commit. This can result in unnecessary debug logs and may mask real rollback errors. To improve reliability and clarity, track whether the transaction has been committed and only attempt rollback if it has not:
committed := false
defer func() {
	if !committed {
		if rbErr := tx.Rollback(ctx); rbErr != nil && !errors.Is(rbErr, pgx.ErrTxClosed) {
			s.logger.Debug("transaction rollback", "error", rbErr)
		}
	}
}()
// ...
if err := tx.Commit(ctx); err != nil {
	return fmt.Errorf("committing memory transaction: %w", err)
}
committed = true

func (s *Store) evictIfNeeded(ctx context.Context, ownerID string) error {
	var count int
	if err := s.pool.QueryRow(ctx,
		`SELECT COUNT(*) FROM memories WHERE owner_id = $1 AND active = true`,
		ownerID,
	).Scan(&count); err != nil {
		return fmt.Errorf("counting memories: %w", err)
	}

	if count <= MaxPerUser {
		return nil
	}

	excess := count - MaxPerUser

	// First try to evict inactive memories.
	tag, err := s.pool.Exec(ctx,
		`DELETE FROM memories
		 WHERE id IN (
			SELECT id FROM memories
			WHERE owner_id = $1 AND active = false
			ORDER BY updated_at ASC, id ASC
			LIMIT $2
		 )`,
		ownerID, excess,
	)
	if err != nil {
		return fmt.Errorf("evicting inactive: %w", err)
	}

	remaining := excess - int(tag.RowsAffected())
	if remaining <= 0 {
		return nil
	}

	// Evict oldest active by created_at.
	_, err = s.pool.Exec(ctx,
		`DELETE FROM memories
		 WHERE id IN (
			SELECT id FROM memories
			WHERE owner_id = $1 AND active = true
			ORDER BY created_at ASC, id ASC
			LIMIT $2
		 )`,
		ownerID, remaining,
	)
	if err != nil {
		return fmt.Errorf("evicting oldest active: %w", err)
	}

	return nil
}
Eviction Failure Handling May Allow Resource Exhaustion
In evictIfNeeded, eviction failures are only logged as warnings and not propagated. If eviction is critical for enforcing memory limits, this could allow users to exceed MaxPerUser, potentially leading to resource exhaustion. Consider whether eviction failures should be handled more strictly, such as retrying or propagating the error to the caller, depending on the application's requirements.