
Conversation

@asungur asungur (Owner) commented Dec 30, 2025

  • Create retention.go with RetentionPolicy
  • Implement SetRetention() configuration
  • Implement background cleanup goroutine
  • Handle index cleanup when events expire
  • Test retention policy enforcement


claude bot commented Dec 30, 2025

Pull Request Review: feat: Add retention

Summary

This PR adds a retention policy feature to the Squid event database, allowing automatic cleanup of old events. The implementation includes a configurable retention policy, background cleanup goroutine, and proper index cleanup. Overall, the implementation is solid with excellent test coverage.

Code Quality: ✅ Good

Strengths

  • Well-structured code: Clear separation of concerns with dedicated retention.go
  • Excellent test coverage: Comprehensive tests (355 lines) covering multiple scenarios
  • Good documentation: Clear comments explaining behavior and defaults
  • Proper cleanup: Correctly removes indices when events are deleted
  • Thread-safe: Proper use of mutexes for concurrent access

Issues & Recommendations

🔴 Critical Issues

1. Race Condition in Close() - squid.go:48

if db.retention != nil && db.retention.running {
    db.retention.cancel()
    <-db.retention.done
}

Problem: The running field is read without holding db.retention.mu. This creates a race condition between Close() and the goroutine that sets running = false.

Fix: Either:

  • Read running while holding the retention mutex, OR
  • Simply check db.retention != nil and always call cancel (which is safe to call multiple times)

// Recommended fix:
if db.retention != nil {
    db.retention.cancel()
    <-db.retention.done
}
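
If you prefer the first option instead, the read has to happen under the retention mutex (a sketch; the mu and running fields are assumed to exist as described above):

if db.retention != nil {
    db.retention.mu.Lock()
    running := db.retention.running
    db.retention.mu.Unlock()
    if running {
        db.retention.cancel()
        <-db.retention.done
    }
}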

2. Error Handling Silently Ignores Deletion Failures - retention.go:120-123

for _, entry := range toDelete {
    if err := db.deleteEventAndIndices(txn, entry); err != nil {
        continue  // ⚠️ Silently ignores errors!
    }
    deleted++
}

Problem: Individual deletion errors are silently ignored, making debugging difficult and potentially leaving orphaned indices.

Fix: Log errors or collect them:

var errs []error
for _, entry := range toDelete {
    if err := db.deleteEventAndIndices(txn, entry); err != nil {
        errs = append(errs, fmt.Errorf("failed to delete event %s: %w", entry.id, err))
        continue
    }
    deleted++
}
// Consider logging errs if len(errs) > 0
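
To act on that last comment, errors.Join (Go 1.20+) collapses the slice into a single loggable error (a sketch using the standard log package):

if len(errs) > 0 {
    log.Printf("retention cleanup: %d of %d deletions failed: %v", len(errs), len(toDelete), errors.Join(errs...))
}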

🟡 Important Issues

3. Potential Memory Issue with Large Deletions - retention.go:139-175

func (db *DB) findExpiredEvents(txn *badger.Txn, before time.Time) ([]deleteEntry, error) {
    var toDelete []deleteEntry
    // ... accumulates ALL expired events in memory

Problem: For databases with millions of expired events, this loads everything into memory at once.

Impact: Could cause OOM on large databases with long retention periods.

Fix: Consider batch processing with a configurable limit:

const maxBatchSize = 10000 // configurable

func (db *DB) findExpiredEvents(txn *badger.Txn, before time.Time) ([]deleteEntry, error) {
    var toDelete []deleteEntry
    // ...
    for it.Seek(prefix); it.ValidForPrefix(prefix) && len(toDelete) < maxBatchSize; it.Next() {
        // ...
    }
    return toDelete, nil
}
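
The exported entry point would then loop until a batch comes back short of the cap (a sketch; deleteBatch is a hypothetical wrapper around the find-and-delete transaction above):

var total int64
for {
    deleted, err := db.deleteBatch(before, maxBatchSize)
    if err != nil {
        return total, err
    }
    total += deleted
    if deleted < maxBatchSize {
        return total, nil // short batch: nothing older than the cutoff remains
    }
}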

4. Missing Immediate Cleanup Synchronization - retention.go:82-84

// Run cleanup immediately on start
cutoff := time.Now().Add(-state.policy.MaxAge)
db.deleteBefore(cutoff)

Problem: The immediate cleanup runs asynchronously, so SetRetention() returns before cleanup completes. This could be surprising for users expecting synchronous behavior.

Consideration: Document this behavior clearly, or provide a synchronous option for initial cleanup.
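
If synchronous behavior is preferred, SetRetention could run the first pass inline before spawning the goroutine (a sketch; runCleanupLoop is a hypothetical name for the existing background goroutine, and this assumes SetRetention returns an error):

// Inside SetRetention, before starting the background goroutine:
if _, err := db.DeleteBefore(time.Now().Add(-policy.MaxAge)); err != nil {
    return err
}
go db.runCleanupLoop(state)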

🟢 Minor Issues

5. Inconsistent Error Handling in DeleteBefore() - retention.go:99-107

func (db *DB) DeleteBefore(before time.Time) (int64, error) {
    db.mu.RLock()
    if db.closed {
        db.mu.RUnlock()
        return 0, ErrClosed
    }
    db.mu.RUnlock()

    return db.deleteBefore(before)
}

Suggestion: The internal deleteBefore() doesn't check for closed state, relying on Badger to return an error. For consistency with other methods, consider checking in the internal method too.
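
The internal method could mirror the exported guard (a sketch):

func (db *DB) deleteBefore(before time.Time) (int64, error) {
    db.mu.RLock()
    closed := db.closed
    db.mu.RUnlock()
    if closed {
        return 0, ErrClosed
    }
    // ... existing deletion logic
}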

6. JSON Unmarshal Error Handling - retention.go:159-163

err := item.Value(func(val []byte) error {
    return json.Unmarshal(val, &event)
})
if err != nil {
    continue  // Silently skips corrupted events
}

Suggestion: Consider logging corrupted events for debugging purposes.
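
For example, with the standard log package (a sketch; item is the Badger iterator item from the surrounding loop):

if err != nil {
    log.Printf("retention: skipping corrupted event at key %q: %v", item.Key(), err)
    continue
}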

Performance Considerations

✅ Good Choices

  • Early break optimization (retention.go:171): Stops iteration when events are no longer expired - excellent!
  • Single transaction: Batches deletions efficiently
  • Configurable cleanup interval: Allows tuning for different use cases

⚠️ Potential Concerns

  1. Large-scale deletions: As mentioned, could cause memory issues with millions of events
  2. Transaction size: Very large deletions might exceed Badger transaction limits
  3. Lock contention: The cleanup holds a write transaction, potentially blocking other operations

Recommendation: Consider adding metrics/logging to monitor cleanup performance:

  • Number of events deleted per run
  • Time taken for cleanup
  • Memory used during cleanup
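
A minimal version of that instrumentation, wrapped around the existing cleanup call (a sketch using only the standard library; memory tracking would need runtime.ReadMemStats and is omitted here):

start := time.Now()
deleted, err := db.deleteBefore(cutoff)
log.Printf("retention cleanup: deleted=%d duration=%s err=%v", deleted, time.Since(start), err)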

Security Concerns

✅ No Critical Security Issues Found

  • Proper synchronization prevents race conditions (except the one noted)
  • No SQL injection risk (using KV store)
  • No user input validation issues
  • Context cancellation properly implemented

📝 Best Practice

Consider adding rate limiting or backpressure so that an aggressive cleanup pass cannot starve the database during large deletions.
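
One way to do that is pacing batches with golang.org/x/time/rate (a sketch building on the hypothetical deleteBatch above; the limit of 5 batches per second is illustrative):

limiter := rate.NewLimiter(rate.Limit(5), 1) // at most 5 delete batches per second

var total int64
for {
    if err := limiter.Wait(ctx); err != nil {
        return total, err // context cancelled, e.g. during shutdown
    }
    deleted, err := db.deleteBatch(before, maxBatchSize)
    if err != nil {
        return total, err
    }
    total += deleted
    if deleted < maxBatchSize {
        return total, nil
    }
}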

Test Coverage: ✅ Excellent

Covered Scenarios

  • ✅ Basic deletion with timestamp cutoffs
  • ✅ Index cleanup verification
  • ✅ Automatic background cleanup
  • ✅ Enable/disable retention
  • ✅ Policy updates (restart)
  • ✅ Graceful shutdown
  • ✅ Default and minimum intervals
  • ✅ Edge cases (zero MaxAge, short intervals)

Suggested Additional Tests

  1. Concurrent operations: Test retention cleanup during heavy write load
  2. Large dataset: Test with ~100k events to verify memory behavior
  3. Transaction failures: Test behavior when Badger transaction fails mid-cleanup
  4. Edge case: What happens if SetRetention is called multiple times rapidly? (sketched below)
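
A sketch of test 4, calling SetRetention from several goroutines at once to shake out restart races (run with -race; newTestDB is a hypothetical helper mirroring the existing test setup):

func TestSetRetentionRapid(t *testing.T) {
    db := newTestDB(t)
    defer db.Close()

    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            db.SetRetention(RetentionPolicy{
                MaxAge:          time.Hour,
                CleanupInterval: time.Minute,
            })
        }()
    }
    wg.Wait()
}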

Additional Recommendations

Documentation

  1. Add godoc examples showing typical usage:

// Example:
db.SetRetention(RetentionPolicy{
    MaxAge:          7 * 24 * time.Hour, // keep 7 days
    CleanupInterval: time.Hour,          // clean every hour
})

  2. Document the async behavior of SetRetention()

API Design

Consider returning cleanup metrics from SetRetention or providing a way to query them:

type RetentionStats struct {
    LastCleanup    time.Time
    EventsDeleted  int64
    LastError      error
}

func (db *DB) RetentionStats() RetentionStats
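
Callers could then surface cleanup health without extra plumbing (a sketch against the proposed API):

stats := db.RetentionStats()
if stats.LastError != nil {
    log.Printf("retention: cleanup at %s failed: %v", stats.LastCleanup, stats.LastError)
}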

Summary

This is a well-implemented feature with strong test coverage and good architectural decisions. The main concerns are:

  1. Must fix: Race condition in Close()
  2. Should fix: Silent error handling in deletion loop
  3. Consider: Memory usage for large-scale deletions

Once the race condition is addressed, this PR is ready to merge. The retention feature will be a valuable addition to Squid!

Overall Rating: 8/10 - Excellent work! 🎉


Review generated with assistance from Claude Code

Repository owner deleted a comment from claude bot Dec 30, 2025
@asungur asungur merged commit 54d314c into main Dec 30, 2025
2 checks passed
@asungur asungur deleted the add-db-retention branch December 30, 2025 18:24