This project provides a minimal, modular, and extensible framework for training and generating text with a small, transformer-based GPT-like language model, optimized with KV caching for fast inference.
- Batched Attention: Optimized multi-head attention that projects and computes all heads in parallel, doubling raw throughput.
- KV Cache: Accelerates autoregressive generation by caching and reusing Keys and Values from previous tokens, providing up to 4.5x speedup on larger model configurations.
- Detailed Documentation: Comprehensive guides on architecture and KV cache.
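To make the "batched attention" idea concrete, here is a minimal NumPy sketch (not this repo's actual code; the function name and the fused `(C, 3C)` QKV weight layout are illustrative assumptions): instead of looping over heads, it projects Q, K, and V for all heads with one matmul and computes every head's attention as a single batched matrix product.

```python
import numpy as np

def batched_mha(x, w_qkv, n_head):
    """Causal multi-head self-attention with all heads computed in one batch.
    x: (T, C) token embeddings; w_qkv: (C, 3*C) fused QKV projection.
    Hypothetical sketch, not this repository's API."""
    T, C = x.shape
    hd = C // n_head                        # per-head dimension
    qkv = x @ w_qkv                         # one matmul projects Q, K, V together
    q, k, v = np.split(qkv, 3, axis=-1)     # each (T, C)
    # reshape so heads become a leading batch dimension: (n_head, T, hd)
    q, k, v = (a.reshape(T, n_head, hd).transpose(1, 0, 2) for a in (q, k, v))
    att = q @ k.transpose(0, 2, 1) / np.sqrt(hd)      # (n_head, T, T), all heads at once
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # causal mask: no peeking ahead
    att = np.where(mask, -np.inf, att)
    att = np.exp(att - att.max(-1, keepdims=True))    # numerically stable softmax
    att /= att.sum(-1, keepdims=True)
    out = att @ v                                     # (n_head, T, hd)
    return out.transpose(1, 0, 2).reshape(T, C)       # concatenate heads back to (T, C)
```

Because every head lives in a leading batch dimension, one batched matmul replaces a Python loop over heads, which is where the throughput gain comes from.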
Clone the repository:

```bash
git clone https://github.com/yourusername/smallLM.git
cd smallLM
```

Install dependencies:

```bash
pip install -r requirements.txt
```
Train the model:

```bash
python main.py train
```

The best model checkpoint will be saved to `checkpoints/best_model.pt`.
Generate text:

```bash
python main.py generate --query "Once upon a time" --max_new_tokens 100 --use_kv_cache
```

- `--use_kv_cache`: Enables the key-value cache for faster inference.
- `--max_new_tokens`: Number of tokens to generate.
- `--temperature`: Sampling temperature.
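The speedup behind `--use_kv_cache` comes from a simple idea: during autoregressive generation, the keys and values of already-generated tokens never change, so they can be cached instead of recomputed every step. A minimal NumPy sketch of one cached decode step (hypothetical names and cache layout, not this repo's API):

```python
import numpy as np

def attend_with_cache(x_new, w_qkv, cache, n_head):
    """One autoregressive decode step using a KV cache.
    x_new: (1, C) embedding of the newest token only.
    cache: dict with 'k' and 'v' arrays of shape (n_head, T_past, hd).
    Hypothetical sketch, not this repository's API."""
    C = x_new.shape[-1]
    hd = C // n_head
    # project Q, K, V for the new token only -- O(1) work per step
    q, k, v = np.split(x_new @ w_qkv, 3, axis=-1)            # each (1, C)
    q, k, v = (a.reshape(1, n_head, hd).transpose(1, 0, 2) for a in (q, k, v))
    # append this step's K/V to the cache instead of recomputing history
    cache['k'] = np.concatenate([cache['k'], k], axis=1)
    cache['v'] = np.concatenate([cache['v'], v], axis=1)
    # the new token attends over all cached positions; no causal mask needed,
    # because the cache contains only past and current tokens
    att = q @ cache['k'].transpose(0, 2, 1) / np.sqrt(hd)    # (n_head, 1, T)
    att = np.exp(att - att.max(-1, keepdims=True))
    att /= att.sum(-1, keepdims=True)
    out = att @ cache['v']                                   # (n_head, 1, hd)
    return out.transpose(1, 0, 2).reshape(1, C)
```

Each step now does attention work proportional to the sequence length for one query token, rather than recomputing the full T×T attention, which is why the gap widens as more tokens are generated.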
Compare performance with and without KV cache:
```bash
# Benchmark default model
python benchmark_kv_cache.py --max_new_tokens 200

# Benchmark a larger model to see scaling benefits
python benchmark_kv_cache.py --max_new_tokens 500 --n_embd 768 --n_layer 12 --n_head 12
```

Example results (GPU: Nvidia GTX 1650):

| Metric           | Without Cache | With Cache |
|------------------|---------------|------------|
| Tokens generated | 500           | 500        |
| Time (seconds)   | 28.2134       | 6.1797     |
| Tokens/sec       | 17.72         | 80.91      |

Speedup: 4.57x
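The numbers above can be reproduced with a simple wall-clock harness along these lines (a sketch; `generate_fn` is a hypothetical stand-in for the model's generation call, not this repo's exact benchmark script):

```python
import time

def benchmark(generate_fn, n_tokens):
    """Time a generation call and report (elapsed seconds, tokens/sec).
    generate_fn(n_tokens) is a stand-in for model generation (hypothetical)."""
    start = time.perf_counter()
    generate_fn(n_tokens)
    elapsed = time.perf_counter() - start
    return elapsed, n_tokens / elapsed
```

Speedup is then just the ratio of elapsed times: 28.2134 / 6.1797 ≈ 4.57x for the run shown in the table.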
MIT License
Inspired by the GPT and nanoGPT projects.

