xycoord/Language-Modelling

Language Modelling

Transformer Implementation [code]

  • KV Cache [code]
    Stateless transformer design with external cache management
  • Rotary Positional Embeddings [blog] [code]
    Both interleaved and half-flipped rotations; direct application and factory patterns for different use cases
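
The two rotation layouts named in the RoPE entry can be sketched in plain Python as below. This is a minimal illustration, not the repository's code: the function names, the `base=10000.0` default, and the reading of "half-flipped" as the rotate-half layout (pairing dimension `i` with `i + d/2`) are all assumptions.

```python
import math

def rope_angles(position, head_dim, base=10000.0):
    """One rotation angle per dimension pair (hypothetical helper)."""
    return [position * base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def rope_interleaved(x, position, base=10000.0):
    """Rotate adjacent pairs: (x[0], x[1]), (x[2], x[3]), ..."""
    out = list(x)
    for i, theta in enumerate(rope_angles(position, len(x), base)):
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * math.cos(theta) - b * math.sin(theta)
        out[2 * i + 1] = a * math.sin(theta) + b * math.cos(theta)
    return out

def rope_half_flipped(x, position, base=10000.0):
    """Rotate pairs split across the two halves: (x[i], x[i + d/2]).

    Assumed meaning of "half-flipped"; the repository may define it differently.
    """
    half = len(x) // 2
    out = list(x)
    for i, theta in enumerate(rope_angles(position, len(x), base)):
        a, b = x[i], x[i + half]
        out[i] = a * math.cos(theta) - b * math.sin(theta)
        out[i + half] = a * math.sin(theta) + b * math.cos(theta)
    return out
```

Under these assumptions the two layouts differ only by a fixed permutation of dimensions, and in both the query-key dot product depends only on the relative offset between positions. The practical catch is that weights trained with one layout need that same permutation applied before they can be used with the other.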

Mech Interp [code]

  • Toy Models of Superposition
    Reproduces the 5→2→5 experiments (five features compressed into two hidden dimensions, then reconstructed), plotting feature directions in the compressed activation space
  • Sparse Autoencoders
    ReLU, TopK and BatchTopK implementations. Trained to recover features from toy models
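
The three activation rules named in the Sparse Autoencoders entry differ only in which encoder pre-activations they keep. The sketch below shows that difference on nested lists. It is not the repository's implementation: the function names are hypothetical, and BatchTopK is assumed to mean keeping the `k * batch_size` largest values across the whole batch.

```python
def relu(pre_acts):
    """Keep every positive pre-activation."""
    return [[max(0.0, v) for v in row] for row in pre_acts]

def topk(pre_acts, k):
    """Keep the k largest values in each row and zero the rest."""
    out = []
    for row in pre_acts:
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        keep = set(order[:k])
        out.append([v if j in keep else 0.0 for j, v in enumerate(row)])
    return out

def batch_topk(pre_acts, k):
    """Keep the k * batch_size largest values across the whole batch.

    Assumption: rows may end up with more or fewer than k active units,
    as long as the batch average is k.
    """
    flat = [(v, i, j) for i, row in enumerate(pre_acts) for j, v in enumerate(row)]
    flat.sort(key=lambda t: t[0], reverse=True)
    keep = {(i, j) for _, i, j in flat[: k * len(pre_acts)]}
    return [
        [v if (i, j) in keep else 0.0 for j, v in enumerate(row)]
        for i, row in enumerate(pre_acts)
    ]

pre = [[0.9, 0.7, 0.8, 0.0],
       [0.2, 0.3, 0.05, 0.1]]
print(topk(pre, 2))        # [[0.9, 0.0, 0.8, 0.0], [0.2, 0.3, 0.0, 0.0]]
print(batch_topk(pre, 2))  # [[0.9, 0.7, 0.8, 0.0], [0.0, 0.3, 0.0, 0.0]]
```

In the example, TopK forces exactly two active units per row, while BatchTopK lets the first row keep three and the second keep one. Some implementations also apply a ReLU before or after the selection; that step is left out here.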

Engineering

  • BPE Tokeniser [blog] [code]
    Six training optimisations that cut training time from 8 hours to 13 seconds.
  • Test Suite [code]
    Comprehensive testing for transformer and tokeniser implementations
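
For context on the BPE Tokeniser entry, a naive byte-level BPE training loop might look like the sketch below. It is an unoptimised baseline written for illustration, with hypothetical function names; it does not show any of the six optimisations, which are described in the linked blog post.

```python
from collections import Counter

def merge(ids, pair, new_id):
    """Replace each non-overlapping occurrence of `pair` with `new_id`, left to right."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Repeatedly merge the most frequent adjacent pair of token ids."""
    ids = list(text.encode("utf-8"))  # start from raw bytes (ids 0..255)
    merges = {}
    next_id = 256
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing left that repeats
        merges[pair] = next_id
        ids = merge(ids, pair, next_id)
        next_id += 1
    return ids, merges

ids, merges = train_bpe("aaabdaaabac", 1)
print(ids)     # [256, 97, 98, 100, 256, 97, 98, 97, 99]
print(merges)  # {(97, 97): 256}
```

Each merge here recounts every pair and rescans the whole sequence, so the cost grows with both corpus size and vocabulary size.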

About

Implementations and Experiments: Transformers, RoPE, KV cache, SAEs, Tokenisers
