Streaming response delivery with deferred cache write (TeeBody) #248
Description
Problem
When a cacheable response is received from upstream on a cache miss, the current flow blocks the client until the entire response body is buffered, stored in cache, and then reconstructed:
```
upstream responds (Passthrough body)
  → response predicates (headers/status checked, body untouched)
  → into_cached(): body.collect().await  ← BLOCKS until full body received
  → backend.set()                        ← serializes + compresses + writes
  → from_cached()                        ← reconstructs response from cached bytes
  → client receives response             ← LAST in line
```
For large responses (file downloads, big API payloads), the client waits for the entire body to be buffered and cached before receiving the first byte. This defeats HTTP chunked transfer and adds unnecessary latency.
Proposed Solution
Introduce a TeeBody mechanism that streams the response to the client immediately while writing fixed-size indexed chunks to cache in the background.
New flow
```
upstream responds (Passthrough body)
  → response predicates (headers/status checked, body untouched)
  → wrap body in TeeBody
  → return response to client immediately  ← CLIENT FIRST
  → as client consumes frames, TeeBody copies each into fixed-size chunks
  → each completed chunk is written to cache with its index
  → on body completion, finalize the chunk sequence
```
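The copy-into-fixed-size-chunks step can be sketched independently of the HTTP body machinery. This is an illustrative sketch only: `Chunker`, `push_frame`, and `finalize` are hypothetical names, and a real `TeeBody` would perform this work inside its `poll_frame()` implementation.

```rust
/// Accumulates arbitrarily sized body frames into fixed-size chunks.
struct Chunker {
    chunk_size: usize,
    current: Vec<u8>,
    completed: Vec<Vec<u8>>, // chunk i would be stored under key:i
}

impl Chunker {
    fn new(chunk_size: usize) -> Self {
        Self { chunk_size, current: Vec::with_capacity(chunk_size), completed: Vec::new() }
    }

    /// Copy one data frame; every chunk that fills up is moved to `completed`.
    fn push_frame(&mut self, mut frame: &[u8]) {
        while !frame.is_empty() {
            let room = self.chunk_size - self.current.len();
            let take = room.min(frame.len());
            self.current.extend_from_slice(&frame[..take]);
            frame = &frame[take..];
            if self.current.len() == self.chunk_size {
                self.completed.push(std::mem::take(&mut self.current));
            }
        }
    }

    /// On body completion, flush the trailing partial chunk.
    fn finalize(mut self) -> Vec<Vec<u8>> {
        if !self.current.is_empty() {
            self.completed.push(self.current);
        }
        self.completed
    }
}

fn main() {
    let mut chunker = Chunker::new(4);
    // Frames arrive with arbitrary sizes, as from an upstream body.
    let frames: [&[u8]; 3] = [b"hel", b"lo wor", b"ld"];
    for frame in frames {
        chunker.push_frame(frame);
    }
    let chunks = chunker.finalize();
    // 11 bytes with chunk_size 4 -> chunks of 4, 4, 3 bytes.
    assert_eq!(chunks.len(), 3);
    assert_eq!(chunks[0].as_slice(), b"hell");
    assert_eq!(chunks[2].as_slice(), b"rld");
}
```

Note that chunk boundaries are independent of frame boundaries, which is what makes the later byte-offset arithmetic deterministic.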
Chunked indexed storage
Instead of buffering the full body and writing it as a single blob, the body is split into fixed-size indexed chunks and stored separately from response metadata:
- Metadata (headers, status) is serialized and stored as a single cache entry
- Body chunks are stored as raw bytes with sequential indices (e.g. `key:meta`, `key:0`, `key:1`, …)
- On cache hit, metadata is deserialized and body chunks are read back in order and streamed to the client
This approach has several advantages:
- Streaming without backend streaming support — even if the backend (e.g. Moka, Redis) only supports simple get/set, indexed chunks simulate streaming writes and reads. No changes to the `Backend` trait are required.
- HTTP Range request support — since the body is stored as indexed fixed-size chunks, serving byte-range requests from cache becomes straightforward: the chunk index maps directly to byte offsets.
- Pre-allocated chunk pool — a pool of reusable fixed-size buffers can be used to reduce allocation overhead during streaming. Chunks are borrowed from the pool, filled, written to cache, and returned to the pool.
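To make the Range point concrete: mapping an inclusive byte range onto chunk indices is plain integer arithmetic. A sketch, where `chunk_span` is a hypothetical helper, not part of hitbox:

```rust
/// For an inclusive byte range, returns (first chunk index, last chunk index,
/// byte offset to skip within the first chunk). Assumes a fixed chunk size.
fn chunk_span(start: u64, end: u64, chunk_size: u64) -> (u64, u64, u64) {
    (start / chunk_size, end / chunk_size, start % chunk_size)
}

fn main() {
    const CHUNK: u64 = 64 * 1024; // 64 KiB
    // Range: bytes=100000-200000
    let (first, last, offset) = chunk_span(100_000, 200_000, CHUNK);
    assert_eq!((first, last), (1, 3)); // only chunks 1..=3 need to be fetched
    assert_eq!(offset, 100_000 - 65_536); // skip 34464 bytes of chunk 1
}
```

Only the chunks that overlap the requested range are fetched from the backend; the rest of the body is never read.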
Custom serialization: meta and body separated
Currently into_cached() serializes the entire response (headers + body) together. The new approach splits this:
- Meta serialization: only headers and status are serialized (e.g. via the existing `CachedValue` mechanism)
- Body storage: raw bytes, split into fixed-size chunks — no serialization needed. Chunks can optionally be compressed (gzip), though the compression strategy needs further consideration.
- Deserialization: read meta entry, then stream body chunks back in index order
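The split layout can be illustrated with a plain get/set map standing in for the backend. The `store`/`load` helpers and the key scheme here are hypothetical; real code would go through the `Backend` trait:

```rust
use std::collections::HashMap;

// In-memory stand-in for a get/set-only backend (e.g. Moka, Redis).
type Backend = HashMap<String, Vec<u8>>;

/// Writes the meta entry plus one entry per fixed-size body chunk.
fn store(backend: &mut Backend, base: &str, meta: &[u8], body: &[u8], chunk_size: usize) {
    backend.insert(format!("{base}:meta"), meta.to_vec());
    for (i, chunk) in body.chunks(chunk_size).enumerate() {
        backend.insert(format!("{base}:{i}"), chunk.to_vec());
    }
}

/// Reads the meta entry, then body chunks in index order until one is missing.
fn load(backend: &Backend, base: &str) -> Option<(Vec<u8>, Vec<u8>)> {
    let meta = backend.get(&format!("{base}:meta"))?.clone();
    let mut body = Vec::new();
    let mut i = 0;
    while let Some(chunk) = backend.get(&format!("{base}:{i}")) {
        body.extend_from_slice(chunk);
        i += 1;
    }
    Some((meta, body))
}

fn main() {
    let mut backend = Backend::new();
    store(&mut backend, "GET:/file", b"serialized meta", b"0123456789", 4);
    // Keys written: "GET:/file:meta", "GET:/file:0", "GET:/file:1", "GET:/file:2"
    assert!(backend.contains_key("GET:/file:2"));
    let (meta, body) = load(&backend, "GET:/file").unwrap();
    assert_eq!(meta, b"serialized meta");
    assert_eq!(body, b"0123456789");
}
```

One caveat this sketch glosses over: "read until a key is missing" cannot distinguish a complete body from a truncated one, so the finalized meta entry would need to record the chunk count (or total length).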
Touch points
1. New `TeeBody` type (`hitbox-http/src/body.rs`)
- Wraps a `BufferedBody<B>`, forwards `poll_frame()` to the client
- Copies each data frame into fixed-size chunks internally
- On each chunk completion, signals it for background cache write
- On body completion, signals the final chunk
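The "signal completed chunks for background write" part could be structured around a channel, with a writer on the other end standing in for the offloaded cache-write task. All names here (`ChunkMsg`, the thread-based writer) are hypothetical; the real implementation would use the async offload infrastructure rather than a thread:

```rust
use std::sync::mpsc;
use std::thread;

/// Messages from the TeeBody side to the background cache writer.
enum ChunkMsg {
    Chunk(usize, Vec<u8>), // (index, data) -> backend.set(key:index, data)
    Done(usize),           // total chunk count: finalize the cache entry
}

fn main() {
    let (tx, rx) = mpsc::channel();

    // Background writer: stands in for the offloaded cache-write task.
    let writer = thread::spawn(move || {
        let mut written = 0usize;
        for msg in rx {
            match msg {
                // A real writer would call backend.set() here.
                ChunkMsg::Chunk(_index, _data) => written += 1,
                ChunkMsg::Done(total) => {
                    assert_eq!(written, total); // all chunks arrived: finalize
                    break;
                }
            }
        }
        written
    });

    // TeeBody side: each completed chunk is signalled while the client
    // keeps receiving frames without waiting on cache writes.
    for (i, chunk) in [b"aaaa".to_vec(), b"bb".to_vec()].into_iter().enumerate() {
        tx.send(ChunkMsg::Chunk(i, chunk)).unwrap();
    }
    tx.send(ChunkMsg::Done(2)).unwrap();
    assert_eq!(writer.join().unwrap(), 2);
}
```

The key property is that sending on the channel never blocks the client-facing stream; backpressure policy (if the writer falls behind) is a design decision left open here.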
2. Modified `into_cached()` path (`hitbox-http/src/response.rs:410-454`)
- Currently calls `body.collect().await`, which blocks until the full body is received
- Instead: return early with a `TeeBody` that streams to the client, plus a handle for background chunk writes
3. Modified FSM transition (`hitbox/src/fsm/states.rs:456-504`)
- Currently: `backend.set()` → `from_cached()` → return response
- New: store metadata immediately, return the response with a `TeeBody`, and spawn a background task (via the existing offload infrastructure) that writes chunks as they complete
4. Modified from_cached() path
- Currently: deserializes entire response from a single cached blob
- New: deserializes metadata, then streams body chunks from cache in index order
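The cache-hit read can stream chunks lazily in index order instead of assembling the whole body first. A sketch over an in-memory map (`ChunkReader` is a hypothetical name; the real path would poll the backend asynchronously):

```rust
use std::collections::HashMap;

/// Yields body chunks in index order, stopping at the first missing key,
/// without ever buffering the full body.
struct ChunkReader<'a> {
    backend: &'a HashMap<String, Vec<u8>>,
    base: &'a str,
    next: usize,
}

impl<'a> Iterator for ChunkReader<'a> {
    type Item = &'a [u8];
    fn next(&mut self) -> Option<Self::Item> {
        let chunk = self.backend.get(&format!("{}:{}", self.base, self.next))?;
        self.next += 1;
        Some(chunk.as_slice())
    }
}

fn main() {
    let mut backend = HashMap::new();
    backend.insert("k:meta".to_string(), b"200 OK".to_vec());
    backend.insert("k:0".to_string(), b"hello ".to_vec());
    backend.insert("k:1".to_string(), b"world".to_vec());

    // Each yielded chunk would become one response body frame.
    let body: Vec<u8> = ChunkReader { backend: &backend, base: "k", next: 0 }
        .flat_map(|c| c.iter().copied())
        .collect();
    assert_eq!(body, b"hello world");
}
```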
Existing infrastructure to reuse
The offload mechanism used by stale-while-revalidate (`OffloadRevalidate` in `hitbox/src/fsm/future.rs:286-321`) already implements a "serve now, update cache later" pattern. The chunk writing can use the same `offload.register()` infrastructure.
Edge Cases
| Scenario | Behavior |
|---|---|
| Body error mid-stream | Client gets a truncated response (unavoidable). Partial chunks are discarded — cache entry is not finalized. |
| Backend write failure | Client already received the full response. Silent cache miss — log the error, next request re-fetches. |
| Concurrent duplicate misses | Both requests tee independently. Second write overwrites first (idempotent, acceptable). |
| Memory | With a pre-allocated chunk pool, memory usage is bounded by pool size. Without a pool, peak memory is comparable to today but spread across smaller allocations. |
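The bounded-memory claim in the table follows directly from a fixed pool: peak usage is `pool_size * chunk_size` regardless of response size. A minimal sketch (`ChunkPool` and its methods are hypothetical names):

```rust
/// Pre-allocated pool of fixed-capacity buffers. Buffers are borrowed,
/// filled, written to cache, and returned for reuse.
struct ChunkPool {
    free: Vec<Vec<u8>>,
    chunk_size: usize,
}

impl ChunkPool {
    fn new(pool_size: usize, chunk_size: usize) -> Self {
        Self {
            free: (0..pool_size).map(|_| Vec::with_capacity(chunk_size)).collect(),
            chunk_size,
        }
    }

    /// None when the pool is exhausted: the caller must either wait
    /// (backpressure) or give up on caching this response.
    fn borrow_buf(&mut self) -> Option<Vec<u8>> {
        self.free.pop()
    }

    fn give_back(&mut self, mut buf: Vec<u8>) {
        buf.clear(); // keeps capacity, so reuse never reallocates
        self.free.push(buf);
    }
}

fn main() {
    let mut pool = ChunkPool::new(2, 4);
    assert_eq!(pool.chunk_size, 4);
    let a = pool.borrow_buf().unwrap();
    let _b = pool.borrow_buf().unwrap();
    assert!(pool.borrow_buf().is_none()); // memory bound reached
    pool.give_back(a);
    assert!(pool.borrow_buf().is_some()); // reuse after return
}
```

What to do on exhaustion (block, drop the cache write, or grow) ties into the "chunk pool sizing" open question below.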
Open Questions
- Chunk compression: should individual chunks be compressed, or should compression be left to the backend? Per-chunk gzip adds CPU overhead but reduces storage. Needs benchmarking.
- Chunk size: what's the optimal fixed chunk size? Likely configurable, with a sensible default (e.g. 64 KiB or 128 KiB).
- Chunk pool sizing: how many pre-allocated chunks to keep in the pool? Should be tunable based on expected concurrency.
Non-goals (follow-up work)
- Unbounded stream detection (SSE, WebSocket) — these should be handled by response predicates marking them as non-cacheable, so the body stays `Passthrough` and streams through without the tee. Separate concern.
Affected crates
- `hitbox-http` (`TeeBody` type, `into_cached()`/`from_cached()` modifications, chunk pool)
- `hitbox` (FSM state transition, offload integration)
- `hitbox-backend` (potentially, if the `Backend` trait needs helper methods for indexed chunk operations)