Skip to content

feat(media): integrate TTL cleanup into FileMediaStore#728

Open
ex-takashima wants to merge 5 commits intosipeed:refactor/channel-systemfrom
ex-takashima:feat/media-cleanup-ttl
Open

feat(media): integrate TTL cleanup into FileMediaStore#728
ex-takashima wants to merge 5 commits intosipeed:refactor/channel-systemfrom
ex-takashima:feat/media-cleanup-ttl

Conversation

@ex-takashima
Copy link
Contributor

Summary

  • Integrate TTL-based background cleanup (L2 safety net) directly into FileMediaStore, so file deletion and in-memory ref removal happen atomically under the same mutex — no dangling references
  • Add MediaCleanupConfig (enabled, max_age_minutes, interval_minutes) to tools.media_cleanup config section with sensible defaults (enabled, 30min max age, 5min interval)
  • Wire up NewFileMediaStoreWithCleanup + Start()/Stop() in cmd_gateway.go

Design decisions

  • L1/L2 hybrid: ReleaseAll(scope) remains the primary (L1) cleanup path for scope-based immediate deletion. CleanExpired() acts as L2 safety net for leaked entries
  • Atomic cleanup: Both file removal and in-memory ref/scope cleanup happen under a single sync.RWMutex lock, preventing races between cleanup and resolve
  • refToScope reverse map: Enables CleanExpired() to find and clean scope mappings without scanning all scopes
  • nowFunc injection: Allows deterministic time control in tests without time.Sleep

Files changed

File Change
pkg/media/store.go storedAt, refToScope, MediaCleanerConfig, CleanExpired(), Start()/Stop()
pkg/media/store_test.go 8 new tests (expiry, mixed ages, empty scopes, lifecycle, concurrency, consistency)
pkg/config/config.go MediaCleanupConfig struct + ToolsConfig.MediaCleanup field
pkg/config/defaults.go Default: {Enabled: true, MaxAge: 30, Interval: 5}
cmd/picoclaw/cmd_gateway.go Wiring: NewFileMediaStoreWithCleanup + Start()/Stop()

Test plan

  • go test ./pkg/media/ -v — 15/15 PASS (7 existing + 8 new)
  • go test ./pkg/config/ -v — all PASS
  • go test ./pkg/media/ -race -v — needs cgo-enabled environment (CI)
  • go build ./cmd/picoclaw/ — pre-existing cmd_onboard.go embed error on this branch

🤖 Generated with Claude Code

Add background TTL-based cleanup (L2 safety net) directly into
FileMediaStore so file deletion and in-memory ref removal happen
atomically under the same mutex, preventing dangling references.

- Add storedAt timestamp and refToScope reverse map to mediaEntry
- Add CleanExpired() for atomic TTL-based expiration
- Add Start()/Stop() for background goroutine lifecycle
- Add MediaCleanupConfig (enabled, max_age, interval) to config
- Wire up in cmd_gateway.go with config-driven defaults
- Add 8 new tests including concurrent cleanup safety

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
}

// Start begins the background cleanup goroutine if cleanup is enabled.
func (s *FileMediaStore) Start() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Start function need match the Stop function also using s.once.Do to dispatch goroutine.

mediaStore.Start()

channelManager, err := channels.NewManager(cfg, msgBus, mediaStore)
if err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there err path not close the goroutine?

removed := 0

for ref, entry := range s.refs {
if entry.storedAt.Before(cutoff) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and should cutoff >= 0 ?

ex-takashima and others added 3 commits February 26, 2026 16:22
- CleanExpired: split into two phases — collect expired entries under
  lock, then delete files after releasing the lock to minimize contention
- CleanExpired: guard against zero MaxAge (no-op if unconfigured)
- CleanExpired: log file removal errors instead of silently ignoring
- Start: protect with startOnce to prevent multiple goroutines
- Stop: rename once -> stopOnce for clarity
- cmd_gateway: call mediaStore.Stop() on error path after Start()
- Add TestCleanExpiredZeroMaxAge and double-Start test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address Codex (GPT-5.2) review feedback:
- Start: guard against Interval<=0 or MaxAge<=0 to prevent
  time.NewTicker panic on misconfiguration
- ReleaseAll: split into two phases (collect under lock, delete
  after unlock) matching CleanExpired pattern
- ReleaseAll: log file removal errors
- Add TestStartZeroIntervalNoPanic and TestStartZeroMaxAgeNoPanic

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement TypingCapable interface for LINE channel using the
loading animation API (1:1 chats only, no group support).

- Add StartTyping() with 50s periodic refresh and context-based stop
- Integrate PlaceholderRecorder.RecordTypingStop in processEvent
- Skip RecordPlaceholder (LINE has no message edit API)
- Change sendLoading to accept context and return error
- Relax callAPI status check from 200 to 2xx range

Design consulted with Codex (GPT-5.2).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ex-takashima
Copy link
Contributor Author

Thanks for the review, @alexhoshina @Zhaoyikaiii! All feedback has been addressed:

Commit 9a16f3b (review fixes):

  • CleanExpired: two-phase lock — collect under lock, delete files after unlock
  • CleanExpired: log file removal errors
  • CleanExpired: guard against zero MaxAge
  • Start: protect with startOnce to prevent multiple goroutines
  • cmd_gateway: call mediaStore.Stop() on error path

Commit 8e20644 (additional hardening):

  • Start: guard against Interval <= 0 to prevent time.NewTicker panic
  • ReleaseAll: also converted to two-phase lock pattern
  • Added TestStartZeroIntervalNoPanic and TestStartZeroMaxAgeNoPanic

Commit ad736d7 (LINE channel improvement per @alexhoshina's request):

  • Added StartTyping() using LINE's loading animation API (1:1 only)
  • Integrated PlaceholderRecorder.RecordTypingStop in processEvent
  • Relaxed callAPI status check to accept 2xx (loading API returns 202)

All tests pass (18/18 media, CI green). Ready for re-review.

@alexhoshina
Copy link
Collaborator

hi @ex-takashima, regarding the commit about line, it might be part of another PR. Our current PR should focus on mediastore.

@ex-takashima
Copy link
Contributor Author

@alexhoshina Got it — LINE commit has been reverted from this PR and moved to a separate PR: #811. This PR now focuses solely on MediaStore TTL cleanup.

Copy link
Collaborator

@alexhoshina alexhoshina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

@nikolasdehor nikolasdehor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean implementation. The L1/L2 hybrid design, two-phase lock/unlock pattern, and nowFunc injection are all well-thought-out. Some feedback:

1. ReleaseAll and CleanExpired can race on the same entry.
Thread A calls ReleaseAll("scope1"), grabs the lock, collects paths, deletes from maps, releases the lock. Thread B calls CleanExpired, grabs the lock, finds the entry already gone (no problem). But consider: Thread A releases the lock and starts deleting files. Thread B (or another ReleaseAll) calls os.Remove on the same path. The second os.Remove gets ENOENT, which is handled by !os.IsNotExist(err). So this is safe. Good.

2. NewFileMediaStore() (without cleanup) initializes refToScope but not stop.
If someone calls Stop() on a store created with NewFileMediaStore(), the s.stop == nil check protects against a panic. Good. But Start() also checks s.stop == nil and returns early. This means Start() is a no-op on the basic constructor, which is the right behavior.

3. The log.Printf calls use stdlib log instead of the project logger.
The rest of picoclaw uses pkg/logger (e.g. logger.InfoCF, logger.WarnCF). Using log.Printf here is inconsistent and would bypass any log-level or structured-logging configuration. Consider using logger.WarnCF("media", ...) for consistency.

4. MaxAge and Interval are stored as int (minutes) in config but converted to time.Duration at the call site.
This is fine, but consider validating in cmd_gateway.go that MaxAge > 0 and Interval > 0 before constructing the config. Currently if someone sets max_age_minutes: 0 in config, CleanExpired() returns immediately (the guard on line if s.cleanerCfg.MaxAge <= 0), but Start() will also bail out early. The user gets no feedback that their config is effectively disabled.

5. Start() guard for zero interval prevents time.NewTicker panic -- good catch.
The check s.cleanerCfg.Interval <= 0 || s.cleanerCfg.MaxAge <= 0 prevents the panic. The dedicated test for this is appreciated.

6. Minor: the PR description mentions a race condition flag -race not being tested locally.
This is fine for the PR, but the concurrent test (TestConcurrentCleanupSafety) should ideally be run with -race in CI to catch any data races in the two-phase lock pattern.

Well-structured PR with good test coverage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants