[STILL IN DRAFT] feat: add /kv-cache route with interactive KV cache explainer #77

Open
sengopal wants to merge 10 commits into poloclub:main from sengopal:kv-cache-explainer

Conversation

Summary

Closes #63

Adds a new /kv-cache route (poloclub.github.io/transformer-explainer/kv-cache) that visualizes the KV cache mechanism interactively, leaving the original root page completely untouched.

  • Prefill phase: shows the existing N×N attention matrix for the selected prompt (same visualization as root page)
  • Decode phase: step-through controls (← Prev / Next →) reveal:
    • A growing KV cache table showing each token's cached Key vector (red) and Value vector (green)
    • A 1×N attention strip showing the current decode token attending to all cached context
  • Head selector and example selector from the Topbar remain active throughout
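
To make the decode-phase description concrete, here is a minimal single-head sketch in plain Python. It is not the PR's code: the `KVCache` class, `decode_step` function, and the 2-dimensional vectors are hypothetical stand-ins, chosen only to show the cache growing by one entry per step and the new token producing a 1×N row of attention weights over everything cached so far.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class KVCache:
    """Toy single-head cache: one Key and one Value vector per token (hypothetical)."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_step(cache, q, k, v):
    """Cache the new token's K/V, then attend from q to every cached token."""
    cache.append(k, v)
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, key)) / scale for key in cache.keys]
    weights = softmax(scores)  # the 1xN attention strip
    out = [
        sum(w * val[d] for w, val in zip(weights, cache.values))
        for d in range(len(v))
    ]
    return weights, out


cache = KVCache()
# "Prefill": cache K/V for a 3-token prompt (made-up vectors).
for k, v in [([1.0, 0.0], [1.0, 2.0]), ([0.0, 1.0], [3.0, 4.0]), ([1.0, 1.0], [5.0, 6.0])]:
    cache.append(k, v)

# One decode step: the cache grows to 4 entries and the strip has 4 weights.
weights, out = decode_step(cache, q=[1.0, 0.0], k=[0.5, 0.5], v=[7.0, 8.0])
print(len(cache.keys), [round(w, 3) for w in weights])
```

Only the new token's Query is needed at each step; the Keys and Values of earlier tokens are read back from the cache rather than recomputed, which is the behavior the growing table is meant to visualize.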

Implementation

| File | Change |
| --- | --- |
| src/routes/kv-cache/+page.svelte | New route; same layout as root, adds decode controls |
| src/routes/kv-cache/+page.ts | export const prerender = true |
| src/store/kvcache.ts | New store: decodeStep, kvCache, isDecoding, currentDecodeData, promptTokenCount |
| src/components/KVCacheTable.svelte | Growing table of cached token K/V vectors using VectorCanvas |
| src/components/AttentionMatrix.svelte | When isDecoding=true: shows 1×N strip + KVCacheTable; root page unaffected |
| src/constants/examples/kv/ex{0-4}.js | Pre-computed KV cache + attention data for 5 prompts × 5 decode steps |
| scripts/generate_kv_examples.py | Offline script using HuggingFace GPT-2 to regenerate example data |
| svelte.config.js | Added /kv-cache to prerender entries |
| vite.config.ts | Fixed SCSS additionalData absolute path for Vite 6 compatibility |
| package.json | Upgraded vite 5→6 to match @sveltejs/vite-plugin-svelte@6 peer dep |

Test plan

  • Navigate to /kv-cache — prefill animation plays for default example
  • Click Next → — KV cache table grows by one row, attention strip updates
  • Click ← Prev — reverts to previous step; at step 0, returns to prefill view
  • Change example in Topbar — new prefill runs, decode controls reset
  • Change attention head — KV vectors update to selected head
  • Root page (/) behavior unchanged — no KV cache UI shown

🤖 Generated with Claude Code

sengopal and others added 4 commits March 27, 2026 11:41
Design for /kv-cache route implementing interactive KV cache visualization
per issue poloclub#63. Covers routing, components, data model, and state management.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Step-by-step plan for /kv-cache route: data generation script,
store, KVCacheTable component, AttentionMatrix decode mode, and
kv-cache SvelteKit route.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…amples

Offline Python script (scripts/generate_kv_examples.py) that runs GPT-2
on 5 prompts and extracts KV cache snapshots + attention scores per decode
step. Outputs 5 JS modules to src/constants/examples/kv/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New route at /kv-cache that visualizes the KV cache mechanism in GPT-2.

- Prefill phase: shows existing N×N attention matrix (unchanged flow)
- Decode phase: step-through controls (← Prev / Next →) reveal a growing
  KV cache table (K=red, V=green via VectorCanvas) per token + 1×N
  attention strip for the current decode token
- Decode data driven by pre-computed examples (scripts/generate_kv_examples.py)
  with 5 prompts × 5 decode steps each; logits stripped to keep files ~1.3 MB
- New store (src/store/kvcache.ts): decodeStep, kvCache, isDecoding,
  currentDecodeData, promptTokenCount
- AttentionMatrix.svelte: when isDecoding=true shows 1×N strip + KVCacheTable
  instead of N×N; root page unaffected (isDecoding defaults false)
- vite upgraded from 5 → 6 to match @sveltejs/vite-plugin-svelte@6 peer dep;
  vite.config.ts SCSS additionalData updated to use absolute path for Vite 6
- Build verified: vite build succeeds, build/kv-cache.html generated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
sengopal changed the title from "feat: add /kv-cache route with interactive KV cache explainer" to "[STILL IN DRAFT] feat: add /kv-cache route with interactive KV cache explainer" on Mar 27, 2026
sengopal and others added 6 commits March 27, 2026 13:35
- Attention strip (Issue 1): normalize color scale to [0, max_score] so
  circles are visible even with high-entropy, near-uniform distributions
- QKV/MLP showed all tokens (Issues 2 & 3): set $tokens to [inputToken]
  during decode so Embedding/QKV/MLP columns show only the new token
- LinearSoftmax not updating (Issue 4): set predictedToken to the next
  decode step's inputToken so the prediction panel updates each step
- Restore full prompt tokens when navigating back to prefill (step 0)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
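
The color-scale fix for the attention strip can be illustrated with a small sketch. This is an assumption about the general approach, not the component's actual code; `normalize_to_max` is a hypothetical helper name. Dividing by the largest weight maps the strongest entry to full intensity, so a strip whose weights all sit near 0.25 still renders visibly.

```python
def normalize_to_max(scores):
    """Map non-negative scores to [0, 1] by dividing by the largest score."""
    peak = max(scores)
    if peak <= 0:
        # Nothing to show; avoid dividing by zero.
        return [0.0 for _ in scores]
    return [s / peak for s in scores]


# A nearly uniform 1x4 attention strip: raw weights are all small.
strip = [0.24, 0.26, 0.25, 0.25]
intensities = normalize_to_max(strip)
print([round(x, 3) for x in intensities])
```

The trade-off is that intensities are no longer comparable across steps, since each strip is scaled by its own maximum.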
- Fix contenteditable not updating from store: add afterUpdate in InputForm
  to sync inputRef.innerText when not focused (safe for root page)
- Add topLogits (top-50 [tokenId, logit]) per decode step in Python script
  and regenerate examples; example files now ~1.35MB each (vs 1.3MB before)
- Update probabilities panel per decode step using reconstructed sparse logits
  with user's temperature/sampling applied; greedy decode token is highlighted
- Fix temperature/sampling subscribers in decode mode to re-run distribution
  from current step's topLogits instead of stale prefill logits
- Accumulate decoded tokens in text box (prefillText + decoded so far) with
  ignoreInputTextChange guard to prevent re-triggering prefill
- Hide MLP token labels during decode mode via .decode-mode CSS class
- Override predictedToken after prefill to show greedy first decode token

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
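
A rough sketch of how a probability panel could be rebuilt from sparse top-k logits with a user-chosen temperature follows. It assumes tokens outside the stored top-k contribute zero probability, which the commit message does not spell out; the function name and the sample ids and logits are made up for illustration.

```python
import math


def distribution_from_top_logits(top_logits, temperature=1.0):
    """Softmax over only the stored (token_id, logit) pairs after temperature scaling.

    Assumption: tokens missing from top_logits are treated as probability zero.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [(tok, logit / temperature) for tok, logit in top_logits]
    peak = max(value for _, value in scaled)
    exps = [(tok, math.exp(value - peak)) for tok, value in scaled]
    total = sum(e for _, e in exps)
    return {tok: e / total for tok, e in exps}


# Made-up token ids and logits standing in for one decode step's topLogits.
top_logits = [(464, 5.0), (317, 4.0), (1212, 2.5)]
probs = distribution_from_top_logits(top_logits, temperature=0.8)

# The greedy token is the one with the largest raw logit, independent of temperature.
greedy = max(top_logits, key=lambda pair: pair[1])[0]
print(greedy, {tok: round(p, 3) for tok, p in probs.items()})
```

Re-running this function whenever the temperature changes, using the current step's logits, corresponds to the subscriber fix described above.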
- KVCacheTable: transpose from rows to columns — tokens as columns with
  rotated labels, K/V row headers on left; eliminates vertical scroll
- AttentionMatrix: add click-to-expand for decode mode using separate
  decodeExpanded state (independent of prefill's isAttentionExpanded/
  expandAttention to avoid animating missing DOM elements)
- Decode expanded view shows Softmax 1×N circles + full KV cache table;
  outside-click closes via decodeExpandableEl bound ref
- Attention.svelte: elevate .head-title z-index when expanded so head
  nav buttons are clickable above dim overlay
- Fix attentionOutputs empty in decode: use attn_implementation='eager'
  in generate script; the default implementation silently returns an empty
  tuple when used with past_key_values; regenerate all 5 examples
  (~1.5MB each)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Derive dot-product and scaling·mask stages from softmax via log-space
  inversion (no data regeneration needed)
- Expanded decode view shows Dot Product → Scaling·Mask → Softmax panels
  horizontally (nowrap, overflow visible) mirroring prefill layout
- "Out" button opens inline modal with real attention×value=out computation
  using kvCache values for the current head
- min-height on decode container matches prefill headContentHeight

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
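
The log-space inversion can be sketched as follows. Because softmax gives p_i = exp(s_i) / Z, taking log(p_i) recovers the scaled score s_i only up to the additive constant log Z; this sketch simply drops that constant, which is an assumption since the commit does not say how the offset is chosen. Multiplying by sqrt(head_dim) undoes the scaling (head_dim is 64 for GPT-2 small). The function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def invert_softmax(probs, head_dim=64, eps=1e-12):
    """Recover (dot products, scaled scores) from softmax output, up to a constant.

    eps guards against log(0) for fully masked or underflowed entries.
    """
    scaled = [math.log(max(p, eps)) for p in probs]
    dots = [s * math.sqrt(head_dim) for s in scaled]
    return dots, scaled


probs = [0.7, 0.2, 0.1]
dots, scaled = invert_softmax(probs)

# Applying softmax to the recovered scaled scores reproduces the original weights.
recovered = softmax(scaled)
print([round(p, 3) for p in recovered])
```

The recovered values are consistent with the displayed softmax row but are not guaranteed to equal the model's original dot products, since any constant shift yields the same softmax.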
…label, add animations

- Normalize value vectors globally (shared min/max) so token differences
  are visible; normalize Out vector with its own min/max range
- Fix black strip in Out vector by clamping height to 64px (data length)
- Increase value strip height 12px → 32px for more visible dimension patterns
- Restyle Out trigger to match prefill column label (purple, vertical, tooltip)
- Add fly/fade transitions on expand/collapse and Out modal open/close

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
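
The difference between the two normalization choices above can be shown with a short sketch. The helper names are hypothetical and the vectors are made up; the point is that a shared min/max preserves magnitude differences between tokens, while a per-vector range stretches each vector to fill [0, 1] on its own.

```python
def min_max(values, lo, hi):
    """Scale values into [0, 1] given an explicit range."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def normalize_globally(vectors):
    """Use one min/max shared across all vectors, so tokens stay comparable."""
    flat = [x for vec in vectors for x in vec]
    lo, hi = min(flat), max(flat)
    return [min_max(vec, lo, hi) for vec in vectors]


def normalize_own_range(vector):
    """Use the vector's own min/max, as described for the Out vector."""
    return min_max(vector, min(vector), max(vector))


value_vectors = [[0.0, 1.0], [2.0, 4.0]]  # made-up 2-d Value vectors for two tokens
shared = normalize_globally(value_vectors)
own = normalize_own_range(value_vectors[1])
print(shared, own)
```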
Add 6 inference-oriented textbook pages to the /kv-cache route covering
What is Inference, Prefill Phase, The KV Cache, Decode Phase, Attention
During Decode, and Autoregressive Loop. Each page includes an illustrative
image and proper citation from Lages (Medium), Verma & Vaidya (NVIDIA),
and Not Lain (Hugging Face).

Wire up Textbook component on kv-cache page, set initial page to
kv-inference synchronously, and move <Textbook> outside .main-section
so the floating button is not hidden by the opacity fade-in.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Kv cache append explainer