
Conversation

@asungur asungur (Owner) commented Dec 30, 2025

  • Create retention.go with RetentionPolicy
  • Implement SetRetention() configuration
  • Implement background cleanup goroutine
  • Handle index cleanup when events expire
  • Test retention policy enforcement


claude bot commented Dec 30, 2025

Pull Request Review: feat: Add retention

Summary

This PR adds a retention policy feature to the Squid event database, allowing automatic cleanup of old events. The implementation includes a configurable retention policy, background cleanup goroutine, and proper index cleanup. Overall, the implementation is solid with excellent test coverage.

Code Quality: ✅ Good

Strengths

  • Well-structured code: Clear separation of concerns with dedicated retention.go
  • Excellent test coverage: Comprehensive tests (355 lines) covering multiple scenarios
  • Good documentation: Clear comments explaining behavior and defaults
  • Proper cleanup: Correctly removes indices when events are deleted
  • Thread-safe: Proper use of mutexes for concurrent access

Issues & Recommendations

🔴 Critical Issues

1. Race Condition in Close() - squid.go:48

if db.retention != nil && db.retention.running {
    db.retention.cancel()
    <-db.retention.done
}

Problem: The running field is read without holding db.retention.mu. This creates a race condition between Close() and the goroutine that sets running = false.

Fix: Either:

  • Read running while holding the retention mutex, OR
  • Simply check db.retention != nil and always call cancel (which is safe to call multiple times)

// Recommended fix:
if db.retention != nil {
    db.retention.cancel()
    <-db.retention.done
}
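
If you prefer the first option instead, the read has to happen under the retention mutex (a sketch; the mu and running fields are assumed to exist as described above):

if db.retention != nil {
    db.retention.mu.Lock()
    running := db.retention.running
    db.retention.mu.Unlock()
    if running {
        db.retention.cancel()
        <-db.retention.done
    }
}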

2. Error Handling Silently Ignores Deletion Failures - retention.go:120-123

for _, entry := range toDelete {
    if err := db.deleteEventAndIndices(txn, entry); err != nil {
        continue  // ⚠️ Silently ignores errors!
    }
    deleted++
}

Problem: Individual deletion errors are silently ignored, making debugging difficult and potentially leaving orphaned indices.

Fix: Log errors or collect them:

var errs []error
for _, entry := range toDelete {
    if err := db.deleteEventAndIndices(txn, entry); err != nil {
        errs = append(errs, fmt.Errorf("failed to delete event %s: %w", entry.id, err))
        continue
    }
    deleted++
}
// Consider logging errs if len(errs) > 0
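
To act on that last comment, errors.Join (Go 1.20+) collapses the slice into a single loggable error (a sketch using the standard log package):

if len(errs) > 0 {
    log.Printf("retention cleanup: %d of %d deletions failed: %v", len(errs), len(toDelete), errors.Join(errs...))
}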

🟡 Important Issues

3. Potential Memory Issue with Large Deletions - retention.go:139-175

func (db *DB) findExpiredEvents(txn *badger.Txn, before time.Time) ([]deleteEntry, error) {
    var toDelete []deleteEntry
    // ... accumulates ALL expired events in memory

Problem: For databases with millions of expired events, this loads everything into memory at once.

Impact: Could cause OOM on large databases with long retention periods.

Fix: Consider batch processing with a configurable limit:

const maxBatchSize = 10000 // configurable

func (db *DB) findExpiredEvents(txn *badger.Txn, before time.Time) ([]deleteEntry, error) {
    var toDelete []deleteEntry
    // ...
    for it.Seek(prefix); it.ValidForPrefix(prefix) && len(toDelete) < maxBatchSize; it.Next() {
        // ...
    }
    return toDelete, nil
}
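
The exported entry point would then loop until a batch comes back short of the cap (a sketch; deleteBatch is a hypothetical wrapper around the find-and-delete transaction above):

var total int64
for {
    deleted, err := db.deleteBatch(before, maxBatchSize)
    if err != nil {
        return total, err
    }
    total += deleted
    if deleted < maxBatchSize {
        return total, nil // short batch: nothing older than the cutoff remains
    }
}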

4. Missing Immediate Cleanup Synchronization - retention.go:82-84

// Run cleanup immediately on start
cutoff := time.Now().Add(-state.policy.MaxAge)
db.deleteBefore(cutoff)

Problem: The immediate cleanup runs asynchronously, so SetRetention() returns before cleanup completes. This could be surprising for users expecting synchronous behavior.

Consideration: Document this behavior clearly, or provide a synchronous option for initial cleanup.
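
If synchronous behavior is preferred, SetRetention could run the first pass inline before spawning the goroutine (a sketch; runCleanupLoop is a hypothetical name for the existing background goroutine, and this assumes SetRetention returns an error):

// Inside SetRetention, before starting the background goroutine:
if _, err := db.DeleteBefore(time.Now().Add(-policy.MaxAge)); err != nil {
    return err
}
go db.runCleanupLoop(state)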

🟢 Minor Issues

5. Inconsistent Error Handling in DeleteBefore() - retention.go:99-107

func (db *DB) DeleteBefore(before time.Time) (int64, error) {
    db.mu.RLock()
    if db.closed {
        db.mu.RUnlock()
        return 0, ErrClosed
    }
    db.mu.RUnlock()

    return db.deleteBefore(before)
}

Suggestion: The internal deleteBefore() doesn't check for closed state, relying on Badger to return an error. For consistency with other methods, consider checking in the internal method too.
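
The internal method could mirror the exported guard (a sketch):

func (db *DB) deleteBefore(before time.Time) (int64, error) {
    db.mu.RLock()
    closed := db.closed
    db.mu.RUnlock()
    if closed {
        return 0, ErrClosed
    }
    // ... existing deletion logic
}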

6. JSON Unmarshal Error Handling - retention.go:159-163

err := item.Value(func(val []byte) error {
    return json.Unmarshal(val, &event)
})
if err != nil {
    continue  // Silently skips corrupted events
}

Suggestion: Consider logging corrupted events for debugging purposes.
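
For example, with the standard log package (a sketch; item is the Badger iterator item from the surrounding loop):

if err != nil {
    log.Printf("retention: skipping corrupted event at key %q: %v", item.Key(), err)
    continue
}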

Performance Considerations

✅ Good Choices

  • Early break optimization (retention.go:171): Stops iteration when events are no longer expired - excellent!
  • Single transaction: Batches deletions efficiently
  • Configurable cleanup interval: Allows tuning for different use cases

⚠️ Potential Concerns

  1. Large-scale deletions: As mentioned, could cause memory issues with millions of events
  2. Transaction size: Very large deletions might exceed Badger transaction limits
  3. Lock contention: The cleanup holds a write transaction, potentially blocking other operations

Recommendation: Consider adding metrics/logging to monitor cleanup performance:

  • Number of events deleted per run
  • Time taken for cleanup
  • Memory used during cleanup
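
A minimal version of that instrumentation, wrapped around the existing cleanup call (a sketch using only the standard library; memory tracking would need runtime.ReadMemStats and is omitted here):

start := time.Now()
deleted, err := db.deleteBefore(cutoff)
log.Printf("retention cleanup: deleted=%d duration=%s err=%v", deleted, time.Since(start), err)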

Security Concerns

✅ No Critical Security Issues Found

  • Proper synchronization prevents race conditions (except the one noted)
  • No SQL injection risk (using KV store)
  • No user input validation issues
  • Context cancellation properly implemented

📝 Best Practice

Consider adding rate limiting or backpressure so that an aggressive cleanup pass cannot starve the database during large deletions.
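
One way to do that is pacing batches with golang.org/x/time/rate (a sketch building on the hypothetical deleteBatch above; the limit of 5 batches per second is illustrative):

limiter := rate.NewLimiter(rate.Limit(5), 1) // at most 5 delete batches per second

var total int64
for {
    if err := limiter.Wait(ctx); err != nil {
        return total, err // context cancelled, e.g. during shutdown
    }
    deleted, err := db.deleteBatch(before, maxBatchSize)
    if err != nil {
        return total, err
    }
    total += deleted
    if deleted < maxBatchSize {
        return total, nil
    }
}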

Test Coverage: ✅ Excellent

Covered Scenarios

  • ✅ Basic deletion with timestamp cutoffs
  • ✅ Index cleanup verification
  • ✅ Automatic background cleanup
  • ✅ Enable/disable retention
  • ✅ Policy updates (restart)
  • ✅ Graceful shutdown
  • ✅ Default and minimum intervals
  • ✅ Edge cases (zero MaxAge, short intervals)

Suggested Additional Tests

  1. Concurrent operations: Test retention cleanup during heavy write load
  2. Large dataset: Test with ~100k events to verify memory behavior
  3. Transaction failures: Test behavior when Badger transaction fails mid-cleanup
  4. Edge case: What happens if SetRetention is called multiple times rapidly? (sketched below)
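
A sketch of test 4, calling SetRetention from several goroutines at once to shake out restart races (run with -race; newTestDB is a hypothetical helper mirroring the existing test setup):

func TestSetRetentionRapid(t *testing.T) {
    db := newTestDB(t)
    defer db.Close()

    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            db.SetRetention(RetentionPolicy{
                MaxAge:          time.Hour,
                CleanupInterval: time.Minute,
            })
        }()
    }
    wg.Wait()
}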

Additional Recommendations

Documentation

  1. Add godoc examples showing typical usage:

// Example:
db.SetRetention(RetentionPolicy{
    MaxAge:          7 * 24 * time.Hour, // keep 7 days
    CleanupInterval: time.Hour,          // clean every hour
})

  2. Document the async behavior of SetRetention()

API Design

Consider returning cleanup metrics from SetRetention or providing a way to query them:

type RetentionStats struct {
    LastCleanup    time.Time
    EventsDeleted  int64
    LastError      error
}

func (db *DB) RetentionStats() RetentionStats
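
Callers could then surface cleanup health without extra plumbing (a sketch against the proposed API):

stats := db.RetentionStats()
if stats.LastError != nil {
    log.Printf("retention: cleanup at %s failed: %v", stats.LastCleanup, stats.LastError)
}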

Summary

This is a well-implemented feature with strong test coverage and good architectural decisions. The main concerns are:

  1. Must fix: Race condition in Close()
  2. Should fix: Silent error handling in deletion loop
  3. Consider: Memory usage for large-scale deletions

Once the race condition is addressed, this PR is ready to merge. The retention feature will be a valuable addition to Squid!

Overall Rating: 8/10 - Excellent work! 🎉


Review generated with assistance from Claude Code

Repository owner deleted a comment from claude bot Dec 30, 2025
@asungur asungur merged commit 54d314c into main Dec 30, 2025
2 checks passed
@asungur asungur deleted the add-db-retention branch December 30, 2025 18:24