feat(memory): add SQLite-backed persistent session store#719

Closed
is-Xiaoen wants to merge 4 commits into sipeed:main from
is-Xiaoen:feat/sqlite-memory-store

Conversation

@is-Xiaoen
Contributor

@is-Xiaoen is-Xiaoen commented Feb 24, 2026

📝 Description

Add a new pkg/memory/ package that implements a SQLite-backed session store using modernc.org/sqlite (pure Go, zero CGo, cross-compile friendly).

Problem: The current pkg/session/manager.go uses an in-memory map with full JSON serialization on every Save() call. This leads to:

Solution: A new Store interface whose methods are each an atomic SQLite transaction — no separate Save() needed. This eliminates the storage-level write conflicts entirely.
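
To illustrate the "each method is atomic, no separate Save()" idea, here is a minimal sketch. It is not the PR's actual interface: the reduced three-method `Store`, the `Message` type, and the in-memory `memStore` are all assumptions made for illustration, with a mutex standing in for the SQLite transaction.

```go
package main

import (
	"fmt"
	"sync"
)

// Message is a simplified stand-in for the project's message type (assumption).
type Message struct {
	Role    string
	Content string
}

// Store is a hypothetical, reduced version of the interface described above.
// Every method is a complete operation, so callers never call Save().
type Store interface {
	AddMessage(sessionKey, role, content string) error
	GetHistory(sessionKey string) ([]Message, error)
	TruncateHistory(sessionKey string, keepLast int) error
}

// memStore is an in-memory illustration only; the mutex plays the role
// that a transaction would play in a SQLite-backed implementation.
type memStore struct {
	mu       sync.Mutex
	sessions map[string][]Message
}

func newMemStore() *memStore {
	return &memStore{sessions: make(map[string][]Message)}
}

func (s *memStore) AddMessage(key, role, content string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.sessions[key] = append(s.sessions[key], Message{Role: role, Content: content})
	return nil
}

func (s *memStore) GetHistory(key string) ([]Message, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	// Return a copy so callers cannot mutate the stored slice.
	out := make([]Message, len(s.sessions[key]))
	copy(out, s.sessions[key])
	return out, nil
}

func (s *memStore) TruncateHistory(key string, keepLast int) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	msgs := s.sessions[key]
	if keepLast <= 0 {
		s.sessions[key] = nil
		return nil
	}
	if len(msgs) > keepLast {
		s.sessions[key] = append([]Message(nil), msgs[len(msgs)-keepLast:]...)
	}
	return nil
}

func main() {
	var st Store = newMemStore()
	_ = st.AddMessage("telegram:123", "user", "hello")
	_ = st.AddMessage("telegram:123", "assistant", "hi")
	_ = st.TruncateHistory("telegram:123", 1)
	h, _ := st.GetHistory("telegram:123")
	fmt.Println(len(h), h[0].Role)
}
```

Because callers only see the interface, a different backend (SQLite, JSONL, or this in-memory one) could be swapped in without touching the call sites.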

This PR is purely additive — it does not modify any existing code. Integration with pkg/agent/loop.go will be a follow-up PR.

Key design decisions

  • modernc.org/sqlite: pure Go driver, zero CGo dependency, ideal for cross-compilation to embedded targets
  • Single connection (SetMaxOpenConns(1)): serializes all writes at the DB level, preventing corrupt interleaving
  • WAL journal mode: readers never block writers, good concurrency on read-heavy workloads
  • ToolCalls as JSON column: always read/written together with their message, no need for a separate table
  • Tuned for embedded: 512KB cache, synchronous=NORMAL, 5s busy timeout
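
The tuning above could be expressed as a connection string roughly as follows. This is a sketch, not the PR's code: it assumes the `_pragma` DSN parameter of modernc.org/sqlite, and the `buildDSN` helper and the `sessions.db` file name are hypothetical. Only the standard library is used, so the `sql.Open` call appears as a comment.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildDSN assembles a DSN whose _pragma parameters mirror the tuning listed
// above. The helper and the file name are placeholders for this sketch.
func buildDSN(path string) string {
	q := url.Values{}
	for _, p := range []string{
		"journal_mode(WAL)",   // WAL journal mode
		"synchronous(NORMAL)", // synchronous=NORMAL
		"busy_timeout(5000)",  // 5s busy timeout, in milliseconds
		"cache_size(-512)",    // a negative value is in KiB, so about 512KB
		"foreign_keys(1)",     // required for cascade deletes
	} {
		q.Add("_pragma", p)
	}
	return "file:" + path + "?" + q.Encode()
}

func main() {
	dsn := buildDSN("sessions.db")
	fmt.Println(dsn)
	// With the driver imported (import _ "modernc.org/sqlite"), a store
	// would then do roughly:
	//   db, err := sql.Open("sqlite", dsn)
	//   db.SetMaxOpenConns(1) // one connection, so all access is serialized
}
```

Note that with a single pooled connection, reads issued through `database/sql` also queue behind writes inside the process.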

Migration

MigrateFromJSON() imports existing sessions/*.json files into SQLite:

  • Reads session key from JSON content (not filename), correctly handling sanitizeFilename() colon → underscore mapping
  • Renames migrated files to .json.migrated as backup (non-destructive)
  • Idempotent: safe to re-run after partial failures via INSERT OR IGNORE
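
The migration steps above might look like the sketch below. It is an illustration under assumptions, not the PR's `MigrateFromJSON()`: the `"key"` JSON field name, the `migrateDir` helper, and the `importFn` callback (standing in for the SQLite insert) are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// legacySession mirrors only the field this sketch needs. The "key" field
// name is an assumption about the existing file layout.
type legacySession struct {
	Key string `json:"key"`
}

// migrateDir reads every *.json file in dir, takes the session key from the
// file content (not the file name), hands it to importFn, and renames the
// file to .json.migrated on success. Invalid files are skipped.
func migrateDir(dir string, importFn func(key string, raw []byte) error) (int, error) {
	paths, err := filepath.Glob(filepath.Join(dir, "*.json"))
	if err != nil {
		return 0, err
	}
	migrated := 0
	for _, p := range paths {
		raw, err := os.ReadFile(p)
		if err != nil {
			continue
		}
		var s legacySession
		if err := json.Unmarshal(raw, &s); err != nil || s.Key == "" {
			continue // skip invalid JSON without aborting the rest
		}
		if err := importFn(s.Key, raw); err != nil {
			return migrated, err
		}
		// Keep the original as a backup; already-renamed files no longer
		// match *.json, so a re-run does not pick them up again.
		if err := os.Rename(p, p+".migrated"); err != nil {
			return migrated, err
		}
		migrated++
	}
	return migrated, nil
}

// demo migrates one file whose name has "_" where the key has ":".
func demo() (int, string, bool, error) {
	dir, err := os.MkdirTemp("", "sessions")
	if err != nil {
		return 0, "", false, err
	}
	defer os.RemoveAll(dir)
	src := filepath.Join(dir, "telegram_123.json")
	if err := os.WriteFile(src, []byte(`{"key":"telegram:123","messages":[]}`), 0o644); err != nil {
		return 0, "", false, err
	}
	var seen string
	n, err := migrateDir(dir, func(key string, _ []byte) error {
		seen = key
		return nil
	})
	if err != nil {
		return 0, "", false, err
	}
	_, statErr := os.Stat(src + ".migrated")
	return n, seen, statErr == nil, nil
}

func main() {
	n, key, renamed, err := demo()
	fmt.Println(n, key, renamed, err)
}
```

In a real store the callback would perform the insert; if the import succeeds but the rename fails, `INSERT OR IGNORE` is what makes a second run harmless.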

Files

pkg/memory/
├── store.go            // Store interface (8 methods, maps 1:1 to current SessionManager)
├── sqlite.go           // SQLiteStore implementation
├── sqlite_test.go      // 14 unit + 2 concurrency + 3 benchmark tests
├── migration.go        // JSON → SQLite migration
└── migration_test.go   // 7 migration tests

🗣️ Type of Change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 📖 Documentation update
  • ⚡ Code refactoring (no functional changes, no api changes)

🤖 AI Code Generation

  • 🤖 Fully AI-generated (100% AI, 0% Human)
  • 🛠️ Mostly AI-generated (AI draft, Human verified/modified)
  • 👨‍💻 Mostly Human-written (Human lead, AI assisted or none)

🔗 Related Issue

Ref #704 — eliminates the storage-level race condition (concurrent Save() calls overwriting each other). Full application-level fix (summarize goroutine vs main loop) to follow in integration PR.

📚 Technical Context (Skip for Docs)

  • Reference URL: 26M2W3 community meeting notes (Agent track: "introduce SQLite")
  • Reasoning: The current JSON file approach fundamentally cannot support concurrent writes safely. SQLite with WAL provides ACID transactions with minimal overhead, and modernc.org/sqlite avoids CGo for embedded cross-compilation.

🧪 Test Environment

  • Hardware: Intel i7-12650H (development), targeting Raspberry Pi / MaixCAM
  • OS: Windows 11 (development), tests are platform-independent
  • Model/Provider: N/A (storage layer, no LLM interaction)
  • Channels: N/A

📸 Evidence (Optional)

Test results and benchmarks:
=== RUN   TestOpen_CreatesDatabase          --- PASS
=== RUN   TestOpen_ExistingDatabase         --- PASS
=== RUN   TestAddMessage_BasicRoundtrip     --- PASS
=== RUN   TestAddMessage_AutoCreatesSession --- PASS
=== RUN   TestAddFullMessage_WithToolCalls  --- PASS
=== RUN   TestAddFullMessage_ToolCallID     --- PASS
=== RUN   TestGetHistory_EmptySession       --- PASS
=== RUN   TestGetHistory_Ordering           --- PASS
=== RUN   TestSetSummary_GetSummary         --- PASS
=== RUN   TestTruncateHistory_KeepLast      --- PASS
=== RUN   TestTruncateHistory_KeepZero      --- PASS
=== RUN   TestSetHistory_ReplacesAll        --- PASS
=== RUN   TestConcurrent_AddAndRead         --- PASS
=== RUN   TestConcurrent_SummarizeRace      --- PASS
=== RUN   TestMigrateFromJSON_Basic         --- PASS
=== RUN   TestMigrateFromJSON_WithToolCalls --- PASS
=== RUN   TestMigrateFromJSON_MultipleFiles --- PASS
=== RUN   TestMigrateFromJSON_InvalidJSON   --- PASS
=== RUN   TestMigrateFromJSON_RenamesFiles  --- PASS
=== RUN   TestMigrateFromJSON_Idempotent    --- PASS
=== RUN   TestMigrateFromJSON_ColonInKey    --- PASS
ok      github.com/sipeed/picoclaw/pkg/memory   0.990s

BenchmarkAddMessage-16          12854     91923 ns/op    2880 B/op     81 allocs/op
BenchmarkGetHistory_100-16      12516     98754 ns/op   43640 B/op   1226 allocs/op
BenchmarkGetHistory_1000-16      1375    852323 ns/op  390874 B/op  12029 allocs/op

go vet ./pkg/memory/... — clean, no issues.

☑️ Checklist

  • My code/docs follow the style of this project.
  • I have performed a self-review of my own changes.
  • I have updated the documentation accordingly.

Introduce pkg/memory with a Store interface that maps 1:1 to the
current SessionManager API. Each method is an atomic operation,
eliminating the need for a separate Save() call.

This lays the groundwork for replacing JSON file-based session
storage with a transactional backend.

Add SQLiteStore using modernc.org/sqlite (pure Go, zero CGo) with:

- WAL journal mode for concurrent read/write on embedded devices
- Single-connection serialization to prevent write conflicts
- Transactional message insert with auto-incrementing sequence
- ToolCalls stored as JSON column (always read/written with message)
- Auto-creation of session rows on first message

Connection tuned for embedded use: 512KB cache, NORMAL synchronous
mode, 5s busy timeout, foreign keys enabled for cascade deletes.

Cover all Store methods with 14 unit tests including:
- Basic CRUD roundtrips for messages and summaries
- Auto-creation of sessions on first write
- ToolCall JSON serialization/deserialization fidelity
- Message ordering guarantees
- TruncateHistory edge cases (keep-last, keep-zero)
- SetHistory full replacement

Add 2 concurrency tests:
- 10 goroutines writing + 10 reading simultaneously
- Simulated summarize-vs-main-loop race (ref sipeed#704)

Add 3 benchmarks for write throughput and read performance
at 100/1000 message scale.

Add MigrateFromJSON() to import existing sessions/*.json files into
SQLiteStore for seamless upgrade from the current file-based backend.

Key design decisions:
- Read session key from JSON content, not filename (handles
  sanitizeFilename colon-to-underscore mapping correctly)
- Rename migrated files to .json.migrated as backup (non-destructive)
- INSERT OR IGNORE for idempotent partial-migration recovery
- Skip invalid JSON files without aborting other migrations

Includes 7 tests covering basic migration, tool call preservation,
batch migration, error resilience, rename verification, idempotency,
and the colon-in-key edge case.
@is-Xiaoen is-Xiaoen force-pushed the feat/sqlite-memory-store branch from 61263d6 to 450d37f Compare February 24, 2026 10:22
@yinwm
Collaborator

yinwm commented Feb 24, 2026

@is-Xiaoen

Overall Assessment

Thank you for this high-quality PR! The code is well-written with comprehensive test coverage and clear documentation. However, I have some questions about the technical direction that I'd like to discuss.


Core Question: Do we really need SQLite?

The current JSON storage issue stems from Save() using RLock instead of Lock, which causes race conditions during concurrent writes. This problem can be solved with much simpler approaches.
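
As a rough illustration of the simpler fix, the sketch below holds the exclusive lock for the whole snapshot-and-write step, so two saves cannot interleave. The `manager` type, its fields, and the temp-file-plus-rename write are assumptions for this example, not the code in pkg/session/manager.go.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// manager is a hypothetical, simplified session manager.
type manager struct {
	mu       sync.RWMutex
	dir      string
	sessions map[string][]string // session key -> messages (simplified)
}

func (m *manager) add(key, msg string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.sessions[key] = append(m.sessions[key], msg)
}

// save takes Lock rather than RLock, so serialization and the file write
// happen as one critical section and an older snapshot cannot overwrite a
// newer one.
func (m *manager) save(key string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	data, err := json.Marshal(m.sessions[key])
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(m.dir, "session-*.tmp")
	if err != nil {
		return err
	}
	name := tmp.Name()
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		os.Remove(name)
		return err
	}
	if err := tmp.Close(); err != nil {
		os.Remove(name)
		return err
	}
	// Rename replaces the old file in one step.
	return os.Rename(name, filepath.Join(m.dir, key+".json"))
}

// demo runs 20 concurrent add+save pairs and counts messages on disk.
func demo() (int, error) {
	dir, err := os.MkdirTemp("", "sessions")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	m := &manager{dir: dir, sessions: make(map[string][]string)}
	var wg sync.WaitGroup
	errs := make(chan error, 20)
	for i := 0; i < 20; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			m.add("s", fmt.Sprintf("msg-%d", i))
			errs <- m.save("s")
		}(i)
	}
	wg.Wait()
	close(errs)
	for err := range errs {
		if err != nil {
			return 0, err
		}
	}
	raw, err := os.ReadFile(filepath.Join(dir, "s.json"))
	if err != nil {
		return 0, err
	}
	var got []string
	if err := json.Unmarshal(raw, &got); err != nil {
		return 0, err
	}
	return len(got), nil
}

func main() {
	n, err := demo()
	fmt.Println(n, err)
}
```

This only covers goroutines inside one process; protecting against a second process would still need an OS-level file lock, which the sketch does not attempt.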


Why do mainstream tools use plain text storage?

| Tool | Storage Format |
|---|---|
| Git | Plain text files |
| VS Code | JSON |
| Docker | JSON/overlayfs |
| Claude Code | JSONL |
| Most CLI tools | YAML/JSON/TOML |

Claude Code's storage approach:

~/.claude/
├── history.jsonl          # ← 6904+ lines, one JSON object per line
├── file-history/          # ← File version history
├── config.json            # ← Configuration
└── ...

Even a complex AI tool like Claude Code uses JSONL instead of SQLite for storing history.

Advantages of plain text:

  • ✅ Human-readable, debuggable
  • ✅ Direct manipulation with grep/jq/vim
  • ✅ No additional dependencies
  • ✅ Easy to migrate

Alternative: JSONL

Instead of SQLite, I'd suggest considering JSONL (JSON Lines) format:

sessions/
├── telegram_123456.jsonl      # One message per line
└── telegram_123456.meta.json  # summary, updated_at

Comparison:

| Aspect | JSONL | SQLite |
|---|---|---|
| Readability | tail/grep/jq | ❌ Requires specialized tools |
| Dependencies | ✅ None | ❌ 1 direct + 6 indirect deps |
| Incremental writes | ✅ Append | ✅ INSERT |
| Concurrency safety | ⚠️ Needs file lock | ✅ Transactions |
| Debug-friendly | ✅ Direct inspection | ❌ Requires SQL |
| Mainstream adoption | ✅ Used by Claude Code | ❌ Niche |

JSONL key advantages:

  • Append each message to end of file, no need to rewrite entire file
  • Retains JSON's readability advantages
  • No additional dependencies
  • Validated by mature tools like Claude Code
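
For illustration, appending one message per line could look like the sketch below. The `message` type, the helper names, and the file name are hypothetical, and the file locking mentioned in the comparison is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// message is a simplified stand-in for a session message (assumption).
type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// appendMessage writes one JSON object plus a newline to the end of the
// file. Existing lines are never read or rewritten.
func appendMessage(path string, m message) error {
	line, err := json.Marshal(m)
	if err != nil {
		return err
	}
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	if _, err := f.Write(append(line, '\n')); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// readMessages decodes the file as a stream of JSON values, one per line.
func readMessages(path string) ([]message, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var out []message
	dec := json.NewDecoder(f)
	for {
		var m message
		if err := dec.Decode(&m); err == io.EOF {
			return out, nil
		} else if err != nil {
			return nil, err
		}
		out = append(out, m)
	}
}

// demo appends three messages and reads them back.
func demo() (int, string, error) {
	dir, err := os.MkdirTemp("", "sessions")
	if err != nil {
		return 0, "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "telegram_123456.jsonl")
	for _, c := range []string{"one", "two", "three"} {
		if err := appendMessage(path, message{Role: "user", Content: c}); err != nil {
			return 0, "", err
		}
	}
	msgs, err := readMessages(path)
	if err != nil {
		return 0, "", err
	}
	return len(msgs), msgs[len(msgs)-1].Content, nil
}

func main() {
	n, last, err := demo()
	fmt.Println(n, last, err)
}
```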

Recommendation

I'd prefer trying simpler approaches first:

  1. Short-term: Fix Save() lock issue (RLock → Lock) + file locking
  2. Medium-term: If incremental writes are needed, migrate to JSONL
  3. Long-term: Only consider SQLite when truly needing multi-process sharing or complex queries

This project currently runs as a single process. SQLite's transaction benefits may not be utilized, while adding complexity and dependencies.


Summary

The PR code quality is excellent, but I have reservations about the technical direction. I'd suggest discussing:

  1. Do we really need SQLite's complexity?
  2. Would JSONL + file locking better align with KISS principles?
  3. Why do similar tools like Claude Code choose JSONL?

Looking forward to the discussion!

@is-Xiaoen
Contributor Author

Thanks for the thorough review! The dependency concern is totally fair: 7 added dependencies (1 direct + 6 indirect) for session storage is a real cost for a "pico" tool.

One technical nuance I want to flag: JSONL's append-only advantage breaks down for TruncateHistory, which runs every ~20 messages during summarization. Truncating a JSONL file means reading, filtering, and rewriting the entire file, which is the same full-serialization pattern we're trying to move away from. SetHistory (used by forceCompression) has the same issue. Claude Code's history.jsonl works because it's a pure append-only audit log; PicoClaw's sessions need truncation and replacement on a regular basis, which is a fundamentally different access pattern.
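
To make the cost concrete, a keep-last-N truncation over a JSONL file could be sketched as below. The `truncateJSONL` helper and the file layout are hypothetical; the point is that the whole file is read and written again even when only a few lines survive.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
)

// truncateJSONL keeps only the last keepLast lines of a JSONL file.
// There is no way to drop leading lines in place, so the file is read in
// full, filtered, and rewritten via a temp file and rename.
func truncateJSONL(path string, keepLast int) error {
	raw, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	var lines [][]byte
	for _, l := range bytes.Split(raw, []byte("\n")) {
		if len(bytes.TrimSpace(l)) > 0 {
			lines = append(lines, l)
		}
	}
	if keepLast < 0 {
		keepLast = 0
	}
	if len(lines) > keepLast {
		lines = lines[len(lines)-keepLast:]
	}
	var buf bytes.Buffer
	for _, l := range lines {
		buf.Write(l)
		buf.WriteByte('\n')
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, buf.Bytes(), 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// demo writes five lines, keeps the last two, and returns the file content.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "sessions")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "session.jsonl")
	var buf bytes.Buffer
	for i := 1; i <= 5; i++ {
		fmt.Fprintf(&buf, "{\"n\":%d}\n", i)
	}
	if err := os.WriteFile(path, buf.Bytes(), 0o644); err != nil {
		return "", err
	}
	if err := truncateJSONL(path, 2); err != nil {
		return "", err
	}
	out, err := os.ReadFile(path)
	return string(out), err
}

func main() {
	content, err := demo()
	fmt.Print(content)
	fmt.Println(err)
}
```

A SQL backend would express the same operation as a single DELETE inside a transaction, which is the asymmetry being pointed out here.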

That said, the Store interface is deliberately backend-agnostic. Adding a JSONL-backed implementation behind the same interface would be straightforward. The real value of this PR is arguably the interface design (atomic operations, no separate Save()) rather than the specific backend.

The 26M2W3 meeting notes mentioned "introduce SQLite" for the Agent track, which is why I went this direction. But I understand that may be a premature jump for the current single-process setup.

Happy to implement a JSONL backend alongside or instead of SQLite — what direction would you prefer?

@is-Xiaoen
Contributor Author

Superseded by #732 (JSONL approach per @yinwm's feedback). Closing this one to keep things clean.

@is-Xiaoen is-Xiaoen closed this Feb 25, 2026


2 participants