Implement Efficient Redis Stream Cleanup with XTRIM MINID #238
+115
−2
Closes: #237
Summary
This PR introduces a robust, multi-layered cleanup strategy for Redis streams to prevent memory bloat from expired messages and ensure long-term stability. It adds probabilistic, time-based trimming for active streams using XTRIM MINID and aligns the existing EXPIRE mechanism with client-side TTLs.
Details:
This change implements a two-part strategy where each mechanism has a distinct role:
XTRIM MINID for Active Streams (The Housekeeper)
On every send, there is now a 10% chance of triggering an XTRIM MINID <now - 30_minutes> command.
This efficiently removes any messages older than the maximum message TTL (30 minutes) from within an active stream.
EXPIRE for Inactive Streams (The Garbage Collector)
The existing EXPIRE command is retained. Its purpose is to remove streams entirely for clients who have been disconnected for more than 30 minutes.
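The send path described in the two items above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: execute is a hypothetical callable that sends a single Redis command, the payload field name and all helper names are invented for the example, and XADD and EXPIRE are issued one after another here instead of through a real pipeline.

```python
import random

MAX_MESSAGE_TTL_MS = 30 * 60 * 1000  # 30 minutes, the maximum message TTL
TRIM_PROBABILITY = 0.10              # 10% chance of trimming on each send


def min_id_threshold(now_ms: int, ttl_ms: int = MAX_MESSAGE_TTL_MS) -> str:
    """Stream ID below which every entry is older than the TTL."""
    return f"{now_ms - ttl_ms}-0"


def xtrim_command(stream_key: str, now_ms: int) -> list[str]:
    """Arguments for XTRIM <key> MINID <now - 30 minutes>."""
    return ["XTRIM", stream_key, "MINID", min_id_threshold(now_ms)]


def should_trim(rng, probability: float = TRIM_PROBABILITY) -> bool:
    """Return True for roughly `probability` of all calls."""
    return rng.random() < probability


def send(execute, stream_key: str, payload: str, rng, now_ms: int) -> None:
    """Hypothetical send path: deliver first, then maybe trim."""
    # Critical path: append the message and refresh the stream's expiry.
    execute("XADD", stream_key, "*", "payload", payload)
    execute("EXPIRE", stream_key, str(MAX_MESSAGE_TTL_MS // 1000))

    # Housekeeping: sampled and best-effort, so a failure is swallowed
    # and never turns a successful delivery into a failed send.
    if should_trim(rng):
        try:
            execute(*xtrim_command(stream_key, now_ms))
        except Exception:
            pass
```

Passing the random source and the current time in as arguments keeps the sketch deterministic under test; production code would more likely read both from the environment.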
Together, XTRIM and EXPIRE cover both cases, so active and inactive streams are each cleaned up efficiently.
Key Design Decisions & Rationale
The main design choices and the reasoning behind them:
Why is XTRIM MINID so Efficient?
This approach is efficient because an auto-generated Redis Stream ID contains the message's creation timestamp (<timestamp_ms>-<seq>). XTRIM MINID uses this embedded timestamp to delete old messages natively within Redis, without the application needing to read, deserialize, and inspect message envelopes. This avoids significant CPU and network overhead.
Why Probabilistic 10% Sampling?
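One way to quantify the 10% sampling described under Details is shown below. It is a small arithmetic sketch, assuming each send draws independently with probability 0.10; the function names are illustrative and not taken from the PR.

```python
def expected_trims(sends: int, probability: float = 0.10) -> float:
    """Average number of XTRIM calls triggered by `sends` messages."""
    return sends * probability


def prob_no_trim(sends: int, probability: float = 0.10) -> float:
    """Chance that `sends` consecutive messages trigger no XTRIM at all."""
    return (1.0 - probability) ** sends
```

Under these assumptions a stream that receives 10 messages still has about a 35% chance of not being trimmed at all (0.9 to the power of 10), while for 50 messages that chance falls to about 0.5%. Busy streams are therefore trimmed often, and quiet streams rely more on EXPIRE.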
Why is XTRIM a Separate, Best-Effort Command?
The XTRIM command is intentionally sent after the main XADD/EXPIRE pipeline. This is a critical safety measure. If XTRIM were in the main pipeline, a rare failure would cause the entire send operation to fail. By separating it, we ensure the non-critical cleanup task can never interfere with the critical path of message delivery.
How the probabilistic approach was tested:
The probabilistic approach was validated with two targeted tests:
Statistical Correctness (Law of Large Numbers):
A test script was run to simulate the 10% probability check over a small sample (100 runs) and a large sample (100,000 runs).
Result: The large sample produced an average hit rate of 9.89%, consistent with the cleanup rate converging on the configured 10% over the high volume of messages a relay handles.
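A simulation along these lines could look like the sketch below. It is not the test script used for this PR: the function name, the seed, and the use of Python's random module are assumptions made for illustration.

```python
import random


def hit_rate(runs: int, probability: float = 0.10, seed: int = 0) -> float:
    """Fraction of `runs` independent draws that would trigger a trim."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    hits = sum(1 for _ in range(runs) if rng.random() < probability)
    return hits / runs


if __name__ == "__main__":
    for runs in (100, 100_000):
        print(f"{runs} runs: hit rate {hit_rate(runs):.2%}")
```

The small sample can land noticeably away from 10%, while the large sample should sit within a fraction of a percentage point of it.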
Activity-Based Scaling:
A second test simulated a mix of high-activity streams (70 total messages) and low-activity streams (30 total messages).
Result: The high-activity group, responsible for 70% of the messages, received ~72% of the XTRIM cleanup operations.
Conclusion: The tests indicate that the probabilistic model directs more cleanup effort to the busiest streams, which accumulate the most messages, without any state tracking.
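The activity-based scaling check can be reproduced with a sketch like the following. The group names, the seed, and the function name are hypothetical, and the message counts in the usage note are scaled up from the 70/30 split so that the shares are less noisy.

```python
import random


def trim_share(messages_by_group: dict[str, int],
               probability: float = 0.10,
               seed: int = 0) -> dict[str, float]:
    """Share of all simulated XTRIM calls that each group receives."""
    rng = random.Random(seed)
    trims = {
        group: sum(1 for _ in range(count) if rng.random() < probability)
        for group, count in messages_by_group.items()
    }
    total = sum(trims.values())
    if total == 0:
        # No trim was triggered at all, so every share is zero.
        return {group: 0.0 for group in trims}
    return {group: count / total for group, count in trims.items()}
```

Calling trim_share({"high": 70_000, "low": 30_000}) should give the high-activity group a share close to 0.7, because each message carries the same independent chance of triggering a trim.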
What's changed: