
Streaming response delivery with deferred cache write (TeeBody) #248

@AndreyErmilov

Description


Problem

When a cacheable response is received from upstream on a cache miss, the current flow blocks the client until the entire response body is buffered, stored in cache, and then reconstructed:

upstream responds (Passthrough body)
  → response predicates (headers/status checked, body untouched)
  → into_cached(): body.collect().await  ← BLOCKS until full body received
  → backend.set()                        ← serializes + compresses + writes
  → from_cached()                        ← reconstructs response from cached bytes
  → client receives response             ← LAST in line

For large responses (file downloads, big API payloads), the client waits for the entire body to be buffered and cached before receiving the first byte. This defeats HTTP chunked transfer encoding and adds unnecessary time-to-first-byte latency.

Proposed Solution

Introduce a TeeBody mechanism that streams the response to the client immediately while writing fixed-size indexed chunks to cache in the background.

New flow

upstream responds (Passthrough body)
  → response predicates (headers/status checked, body untouched)
  → wrap body in TeeBody
  → return response to client immediately       ← CLIENT FIRST
  → as client consumes frames, TeeBody copies each into fixed-size chunks
  → each completed chunk is written to cache with its index
  → on body completion, finalize the chunk sequence
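The tee step in the flow above can be modeled synchronously as a sketch (std-only, illustrative: the real TeeBody would implement the async body trait and forward frames as they are polled; `tee_frames` and `CHUNK_SIZE` are names invented here):

```rust
/// Illustrative, synchronous model of the tee: forward each incoming
/// frame to the "client" untouched while copying its bytes into
/// fixed-size chunks destined for a background cache write.
const CHUNK_SIZE: usize = 4; // tiny for demonstration; e.g. 64 KiB in practice

fn tee_frames(frames: &[&[u8]]) -> (Vec<Vec<u8>>, Vec<Vec<u8>>) {
    let mut forwarded = Vec::new(); // what the client sees (frames unchanged)
    let mut chunks = Vec::new();    // fixed-size chunks for the cache
    let mut current: Vec<u8> = Vec::with_capacity(CHUNK_SIZE);
    for frame in frames {
        forwarded.push(frame.to_vec());
        for &byte in frame.iter() {
            current.push(byte);
            if current.len() == CHUNK_SIZE {
                // chunk complete: hand it off for a background write
                chunks.push(std::mem::replace(&mut current, Vec::with_capacity(CHUNK_SIZE)));
            }
        }
    }
    if !current.is_empty() {
        chunks.push(current); // final, possibly short, chunk on body completion
    }
    (forwarded, chunks)
}

fn main() {
    let frames: [&[u8]; 2] = [b"hello", b" world"];
    let (forwarded, chunks) = tee_frames(&frames);
    // Client frames pass through untouched; the cache sees fixed-size chunks.
    assert_eq!(forwarded, vec![b"hello".to_vec(), b" world".to_vec()]);
    assert_eq!(chunks, vec![b"hell".to_vec(), b"o wo".to_vec(), b"rld".to_vec()]);
}
```

Note that frame boundaries and chunk boundaries are independent: the client receives upstream frames as-is, while the cache always sees fixed-size chunks (except the last).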

Chunked indexed storage

Instead of buffering the full body and writing it as a single blob, the body is split into fixed-size indexed chunks and stored separately from response metadata:

  • Metadata (headers, status) is serialized and stored as a single cache entry
  • Body chunks are stored as raw bytes with sequential indices (e.g. key:meta, key:0, key:1, …)
  • On cache hit, metadata is deserialized and body chunks are read back in order and streamed to the client
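The key layout above might look like the following sketch (`meta_key` / `chunk_key` are illustrative names, not existing APIs):

```rust
/// Illustrative key layout for chunked storage: one metadata entry plus
/// sequentially indexed body chunks under the same base key.
fn meta_key(base: &str) -> String {
    format!("{base}:meta")
}

fn chunk_key(base: &str, index: usize) -> String {
    format!("{base}:{index}")
}

fn main() {
    assert_eq!(meta_key("key"), "key:meta");
    assert_eq!(chunk_key("key", 0), "key:0");
    assert_eq!(chunk_key("key", 17), "key:17");
}
```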

This approach has several advantages:

  1. Streaming without backend streaming support — even if the backend (e.g. Moka, Redis) only supports simple get/set, indexed chunks simulate streaming writes and reads. No changes to the Backend trait are required.
  2. HTTP Range request support — since the body is stored as indexed fixed-size chunks, serving byte-range requests from cache becomes straightforward. The chunk index maps directly to byte offsets.
  3. Pre-allocated chunk pool — a pool of reusable fixed-size buffers can be used to reduce allocation overhead during streaming. Chunks are borrowed from the pool, filled, written to cache, and returned to the pool.
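Advantage 2 can be made concrete: because chunks have a fixed size, an inclusive HTTP byte range maps to chunk indices with plain integer arithmetic. A sketch (the function name is invented for illustration):

```rust
/// Map an inclusive HTTP byte range to the chunk indices and intra-chunk
/// offsets needed to serve it, given a fixed chunk size.
/// Returns (first_chunk, offset_in_first, last_chunk, end_in_last).
fn range_to_chunks(start: u64, end: u64, chunk_size: u64) -> (u64, u64, u64, u64) {
    (
        start / chunk_size, // first chunk holding the range
        start % chunk_size, // byte offset inside that chunk
        end / chunk_size,   // last chunk holding the range
        end % chunk_size,   // final byte's offset inside it
    )
}

fn main() {
    // Range: bytes=100000-250000 with 64 KiB (65536-byte) chunks.
    let (c0, o0, c1, o1) = range_to_chunks(100_000, 250_000, 65_536);
    assert_eq!((c0, o0), (1, 34_464)); // 100000 = 1 * 65536 + 34464
    assert_eq!((c1, o1), (3, 53_392)); // 250000 = 3 * 65536 + 53392
}
```

Only chunks 1 through 3 need to be fetched from the backend to serve this range; the full body is never materialized.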

Custom serialization: meta and body separated

Currently into_cached() serializes the entire response (headers + body) together. The new approach splits this:

  • Meta serialization: only headers and status are serialized (e.g. via the existing CachedValue mechanism)
  • Body storage: raw bytes, split into fixed-size chunks — no serialization needed. Chunks can optionally be compressed (gzip), though the compression strategy needs further consideration.
  • Deserialization: read meta entry, then stream body chunks back in index order

Touch points

1. New TeeBody type (hitbox-http/src/body.rs)

  • Wraps a BufferedBody<B>, forwards poll_frame() to the client
  • Copies each data frame into fixed-size chunks internally
  • On each chunk completion, signals it for background cache write
  • On body completion, signals the final chunk

2. Modified into_cached() path (hitbox-http/src/response.rs:410-454)

  • Currently calls body.collect().await which blocks until the full body is received
  • Instead: return early with a TeeBody that streams to client, plus a handle for background chunk writes

3. Modified FSM transition (hitbox/src/fsm/states.rs:456-504)

  • Currently: backend.set() → from_cached() → return response
  • New: store metadata immediately, return response with TeeBody, spawn background task (via existing offload infrastructure) that writes chunks as they complete

4. Modified from_cached() path

  • Currently: deserializes entire response from a single cached blob
  • New: deserializes metadata, then streams body chunks from cache in index order
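The read path can be sketched against a simple get/set backend, modeled here as a HashMap (illustrative only; a real implementation would emit each chunk as a body frame rather than collecting, and would use the actual Backend trait):

```rust
use std::collections::HashMap;

/// Illustrative read path: fetch body chunks in index order until a key
/// is missing, concatenating them back into the original body.
fn read_body(store: &HashMap<String, Vec<u8>>, base: &str) -> Vec<u8> {
    let mut body = Vec::new();
    let mut index = 0usize;
    while let Some(chunk) = store.get(&format!("{base}:{index}")) {
        body.extend_from_slice(chunk);
        index += 1;
    }
    body
}

fn main() {
    let mut store = HashMap::new();
    store.insert("key:0".to_string(), b"hell".to_vec());
    store.insert("key:1".to_string(), b"o wo".to_vec());
    store.insert("key:2".to_string(), b"rld".to_vec());
    assert_eq!(read_body(&store, "key"), b"hello world".to_vec());
}
```

Probing until a key is missing cannot distinguish a complete body from a truncated one, which suggests the metadata entry should record the expected chunk count so the reader can detect an unfinalized sequence.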

Existing infrastructure to reuse

The offload mechanism used by stale-while-revalidate (OffloadRevalidate in hitbox/src/fsm/future.rs:286-321) already implements a "serve now, update cache later" pattern. The chunk writing can use the same offload.register() infrastructure.
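The "serve now, write later" shape can be sketched with a plain thread and channel (a stand-in for the offload.register() infrastructure, which this sketch does not use):

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    // Completed (index, chunk) pairs flow to a background writer while
    // the foreground keeps streaming the response to the client.
    let (tx, rx) = mpsc::channel::<(usize, Vec<u8>)>();

    let writer = thread::spawn(move || {
        let mut written = Vec::new();
        for (index, chunk) in rx {
            written.push((index, chunk)); // backend.set(chunk_key, chunk) would go here
        }
        written // channel closed: the chunk sequence is finalized
    });

    // Foreground: hand each finished chunk off without waiting for the write.
    for (index, chunk) in [b"hell".to_vec(), b"o wo".to_vec()].into_iter().enumerate() {
        tx.send((index, chunk)).unwrap();
    }
    drop(tx); // body complete: closing the channel signals finalization

    let written = writer.join().unwrap();
    assert_eq!(written.len(), 2);
    assert_eq!(written[0], (0, b"hell".to_vec()));
}
```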

Edge Cases

| Scenario | Behavior |
| --- | --- |
| Body error mid-stream | Client gets a truncated response (unavoidable). Partial chunks are discarded; the cache entry is not finalized. |
| Backend write failure | Client has already received the full response. Silent cache miss: log the error, the next request re-fetches. |
| Concurrent duplicate misses | Both requests tee independently. The second write overwrites the first (idempotent, acceptable). |
| Memory | With a pre-allocated chunk pool, memory usage is bounded by pool size. Without a pool, peak memory is comparable to today but spread across smaller allocations. |

Open Questions

  • Chunk compression: should individual chunks be compressed, or should compression be left to the backend? Per-chunk gzip adds CPU overhead but reduces storage. Needs benchmarking.
  • Chunk size: what's the optimal fixed chunk size? Likely configurable, with a sensible default (e.g. 64 KiB or 128 KiB).
  • Chunk pool sizing: how many pre-allocated chunks to keep in the pool? Should be tunable based on expected concurrency.

Non-goals (follow-up work)

  • Unbounded stream detection (SSE, WebSocket) — these should be handled by response predicates marking them as non-cacheable, so the body stays Passthrough and streams through without tee. Separate concern.

Affected crates

  • hitbox-http (TeeBody type, into_cached() / from_cached() modifications, chunk pool)
  • hitbox (FSM state transition, offload integration)
  • hitbox-backend (potentially, if the Backend trait needs helper methods for indexed chunk operations)

Labels: performance, research