[STILL IN DRAFT] feat: add /kv-cache route with interactive KV cache explainer #77
Open
sengopal wants to merge 10 commits into poloclub:main from
Conversation
Design for /kv-cache route implementing interactive KV cache visualization per issue poloclub#63. Covers routing, components, data model, and state management.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Step-by-step plan for /kv-cache route: data generation script, store, KVCacheTable component, AttentionMatrix decode mode, and kv-cache SvelteKit route.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…amples

Offline Python script (scripts/generate_kv_examples.py) that runs GPT-2 on 5 prompts and extracts KV cache snapshots + attention scores per decode step. Outputs 5 JS modules to src/constants/examples/kv/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
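The per-decode-step records this script emits might look roughly like the sketch below. The field names and nesting here are illustrative assumptions, not the script's actual output schema; the constants (12 layers, 12 heads, 64-dim heads) are GPT-2 small's real dimensions.

```python
def snapshot(step, tokens, num_layers=12, num_heads=12, head_dim=64):
    """One hypothetical decode-step record: K/V shapes per layer plus
    the length of the newest token's 1 x N attention row."""
    seq_len = len(tokens)
    return {
        "step": step,
        "tokens": tokens,
        # one K and one V block per layer, each (num_heads, seq_len, head_dim)
        "kv": [
            {
                "layer": layer,
                "k_shape": (num_heads, seq_len, head_dim),
                "v_shape": (num_heads, seq_len, head_dim),
            }
            for layer in range(num_layers)
        ],
        # the new token attends over every cached position
        "attention_row_len": seq_len,
    }

snap = snapshot(step=2, tokens=["The", "cat", "sat", "on", "a"])
```

The key property the visualization relies on is that the `seq_len` axis of every K/V block grows by one per decode step while all other dimensions stay fixed.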
New route at /kv-cache that visualizes the KV cache mechanism in GPT-2.

- Prefill phase: shows the existing N×N attention matrix (unchanged flow)
- Decode phase: step-through controls (← Prev / Next →) reveal a growing KV cache table (K=red, V=green via VectorCanvas) per token, plus a 1×N attention strip for the current decode token
- Decode data driven by pre-computed examples (scripts/generate_kv_examples.py) with 5 prompts × 5 decode steps each; logits stripped to keep files ~1.3 MB
- New store (src/store/kvcache.ts): decodeStep, kvCache, isDecoding, currentDecodeData, promptTokenCount
- AttentionMatrix.svelte: when isDecoding=true, shows the 1×N strip + KVCacheTable instead of the N×N matrix; root page unaffected (isDecoding defaults to false)
- vite upgraded from 5 to 6 to match the @sveltejs/vite-plugin-svelte@6 peer dep; vite.config.ts SCSS additionalData updated to use an absolute path for Vite 6
- Build verified: vite build succeeds, build/kv-cache.html generated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
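The decode-phase mechanics described above (a cache that grows by one K/V pair per token, and a 1×N attention strip scoring the newest token against every cached key) can be sketched as a toy in plain Python. This is an illustrative model, not the component's code:

```python
import math

def attention_row(q, cached_keys):
    """Score the new token's query against every cached key,
    producing the 1 x N softmax strip shown during decode."""
    d_k = len(q)
    scores = [
        sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
        for k in cached_keys
    ]
    m = max(scores)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Each decode step appends one key (and value) to the cache,
# then the new token attends over the whole cache.
cache = [[1.0, 0.0], [0.0, 1.0]]          # keys cached during prefill
cache.append([1.0, 1.0])                  # new token's key joins the cache
row = attention_row([1.0, 1.0], cache)    # 1 x 3 strip for the new token
```

Note the strip length grows with the cache: step N over an M-token prompt yields a 1×(M+N) row, which is why the table and strip widen together in the UI.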
- Attention strip (Issue 1): normalize the color scale to [0, max_score] so circles are visible even with low-entropy, near-uniform distributions
- QKV/MLP show all tokens (Issues 2 & 3): set $tokens to [inputToken] during decode so the Embedding/QKV/MLP columns show only the new token
- LinearSoftmax not updating (Issue 4): set predictedToken to the next decode step's inputToken so the prediction panel updates each step
- Restore the full prompt tokens when navigating back to prefill (step 0)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
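A minimal sketch of the first fix, assuming circle opacity is simply score divided by the row maximum (the function name is hypothetical). Scaling by the row max rather than an absolute scale keeps near-uniform rows visible:

```python
def strip_opacity(scores):
    """Map attention scores to [0, 1] by dividing by the row max,
    so low-entropy, near-uniform rows still render visible circles."""
    m = max(scores)
    if m == 0:
        return [0.0 for _ in scores]
    return [s / m for s in scores]

# A near-uniform softmax row: absolute values are all ~0.25,
# which would render nearly invisible on a [0, 1] scale.
row = strip_opacity([0.24, 0.26, 0.25, 0.25])
```

With this normalization the brightest circle in every row is always at full opacity, so relative differences stay readable regardless of the distribution's entropy.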
- Fix contenteditable not updating from the store: add afterUpdate in InputForm to sync inputRef.innerText when not focused (safe for the root page)
- Add topLogits (top-50 [tokenId, logit] pairs) per decode step in the Python script and regenerate examples; decode steps are now ~1.35 MB each (vs ~1.3 MB before)
- Update the probabilities panel per decode step using reconstructed sparse logits with the user's temperature/sampling applied; the greedy decode token is highlighted
- Fix temperature/sampling subscribers in decode mode to re-run the distribution from the current step's topLogits instead of stale prefill logits
- Accumulate decoded tokens in the text box (prefillText + decoded so far) with an ignoreInputTextChange guard to prevent re-triggering prefill
- Hide MLP token labels during decode mode via a .decode-mode CSS class
- Override predictedToken after prefill to show the greedy first decode token

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
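Rebuilding the probabilities panel from sparse top-k logits can be sketched as below. The function name is an assumption; the [tokenId, logit] pair format and temperature application follow the commit's description, and the token IDs are made up for illustration:

```python
import math

def probs_from_top_logits(top_logits, temperature=1.0):
    """Rebuild a sampling distribution from sparse [tokenId, logit]
    pairs, applying the user's temperature before softmax."""
    scaled = {tid: logit / temperature for tid, logit in top_logits}
    m = max(scaled.values())              # subtract max for stability
    exps = {tid: math.exp(v - m) for tid, v in scaled.items()}
    total = sum(exps.values())
    return {tid: e / total for tid, e in exps.items()}

# Hypothetical top-3 slice of a step's topLogits.
dist = probs_from_top_logits([(464, 5.1), (1169, 4.2), (257, 2.0)],
                             temperature=0.7)
greedy = max(dist, key=dist.get)          # the highlighted greedy token
```

Because only the top-50 logits survive, the renormalized probabilities are exact within that set but slightly overstate each token's true probability mass; for a display panel that trade-off keeps file sizes small.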
- KVCacheTable: transpose from rows to columns (tokens as columns with rotated labels, K/V row headers on the left); eliminates vertical scroll
- AttentionMatrix: add click-to-expand for decode mode using a separate decodeExpanded state (independent of prefill's isAttentionExpanded/expandAttention, to avoid animating missing DOM elements)
- Expanded decode view shows the Softmax 1×N circles + full KV cache table; outside-click closes via the decodeExpandableEl bound ref
- Attention.svelte: elevate the .head-title z-index when expanded so head nav buttons are clickable above the dim overlay
- Fix empty attentionOutputs in decode: use attn_implementation='eager' in the generate script; the default implementation silently returns an empty tuple alongside past_key_values; regenerate all 5 examples (~1.5 MB each)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Derive the dot-product and scaling·mask stages from softmax via log-space inversion (no data regeneration needed)
- Expanded decode view shows Dot Product → Scaling·Mask → Softmax panels horizontally (nowrap, overflow visible), mirroring the prefill layout
- "Out" button opens an inline modal with the real attention×value=out computation, using kvCache values for the current head
- min-height on the decode container matches the prefill headContentHeight

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
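The log-space inversion works because softmax is invariant to adding a constant to every input: taking the log of the softmax outputs recovers the scaled, masked scores up to a shared additive constant, and multiplying by √d_k then approximates the raw dot products (up to that same constant). A minimal numeric sketch, assuming d_k = 64 as in GPT-2; the function names are illustrative:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    t = sum(es)
    return [e / t for e in es]

def invert_softmax(probs, d_k=64):
    """Recover scaled scores (up to a shared additive constant) via log,
    then undo the 1/sqrt(d_k) scaling to approximate raw dot products."""
    scaled = [math.log(p) for p in probs]
    dots = [s * math.sqrt(d_k) for s in scaled]
    return scaled, dots

p = softmax([2.0, 1.0, 0.5])      # what the stored data provides
scaled, dots = invert_softmax(p)  # what the panels display
```

The recovered values differ from the true scores by a constant offset, but since the panels visualize relative magnitudes within a row, that offset is harmless, which is why no data regeneration was needed.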
…label, add animations

- Normalize value vectors globally (shared min/max) so token differences are visible; normalize the Out vector with its own min/max range
- Fix the black strip in the Out vector by clamping height to 64px (the data length)
- Increase value strip height from 12px to 32px for more visible dimension patterns
- Restyle the Out trigger to match the prefill column label (purple, vertical, tooltip)
- Add fly/fade transitions on expand/collapse and Out modal open/close

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add 6 inference-oriented textbook pages to the /kv-cache route covering What is Inference, Prefill Phase, The KV Cache, Decode Phase, Attention During Decode, and Autoregressive Loop. Each page includes an illustrative image and proper citation from Lages (Medium), Verma & Vaidya (NVIDIA), and Not Lain (Hugging Face). Wire up Textbook component on kv-cache page, set initial page to kv-inference synchronously, and move <Textbook> outside .main-section so the floating button is not hidden by the opacity fade-in. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Closes #63
Adds a new /kv-cache route (poloclub.github.io/transformer-explainer/kv-cache) that visualizes the KV cache mechanism interactively, leaving the original root page completely untouched.

Implementation

- src/routes/kv-cache/+page.svelte
- src/routes/kv-cache/+page.ts: export const prerender = true
- src/store/kvcache.ts: decodeStep, kvCache, isDecoding, currentDecodeData, promptTokenCount
- src/components/KVCacheTable.svelte
- src/components/AttentionMatrix.svelte: when isDecoding=true, shows the 1×N strip + KVCacheTable; root page unaffected
- src/constants/examples/kv/ex{0-4}.js: pre-computed decode examples
- scripts/generate_kv_examples.py: offline generation script
- svelte.config.js: add /kv-cache to prerender entries
- vite.config.ts: additionalData absolute path for Vite 6 compatibility
- package.json: vite 5 → 6 to match the @sveltejs/vite-plugin-svelte@6 peer dep

Test plan
- /kv-cache: prefill animation plays for the default example
- Root page (/): behavior unchanged; no KV cache UI shown

🤖 Generated with Claude Code