Build and train a 124M-parameter language model in one day: a complete tutorial from architecture to text generation.
Learn to build and train a language model (the same size as GPT-2 Small) in a single day on a free GPU.
- 124M-parameter transformer model (GPT-2 Small size; config sketched below)
- Complete architecture: RoPE, Flash Attention, 12 transformer layers
- Full training pipeline on the WikiText-2 dataset
- Text generation system
- Total cost: $0 (runs on Google Colab free tier)
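For reference, a configuration with GPT-2 Small's dimensions looks roughly like the sketch below; the class and field names are illustrative, and the tutorial's own config may differ:

```python
# Illustrative config with GPT-2 Small's dimensions (names are assumptions,
# not the tutorial's exact class).
from dataclasses import dataclass

@dataclass
class ModelConfig:
    vocab_size: int = 50257   # GPT-2 BPE vocabulary
    n_layer: int = 12         # transformer blocks
    n_head: int = 12          # attention heads per block
    n_embd: int = 768         # hidden size (64 dims per head)
    block_size: int = 1024    # maximum context length

# Back-of-envelope parameter count: token embeddings + per-layer attention and MLP weights.
cfg = ModelConfig()
params = cfg.vocab_size * cfg.n_embd + cfg.n_layer * (
    4 * cfg.n_embd**2      # attention: q, k, v, and output projections
    + 8 * cfg.n_embd**2    # MLP: 4x expansion, up and down projections
)
print(f"~{params / 1e6:.0f}M parameters")  # ≈ 124M
```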
You don't just copy-paste code - you BUILD it.
- Fill-in-the-blanks approach using Claude/ChatGPT
- You write the code, you own the IP
- Understand every line, not just run it
- Portfolio-ready project you can explain in interviews
- How transformers actually work internally
- Modern techniques: RoPE (used in LLaMA) and Flash Attention (see the RoPE sketch after this list)
- Complete training loop: loss, gradients, optimization
- Why ChatGPT works the way it does
- Path to scale from 100M → 7B → 175B parameters
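As a taste of what you'll write, here is a minimal RoPE sketch using the half-split rotation common in LLaMA-style implementations; the tutorial's version may precompute the cos/sin tables differently:

```python
# Minimal RoPE sketch (assumes an even head_dim; shapes are illustrative).
import torch

def rotate_half(x):
    # Split the last dimension in two and rotate: (x1, x2) -> (-x2, x1).
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(x, positions, head_dim, base=10000.0):
    # x: (batch, heads, seq, head_dim); positions: (seq,) token positions.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = positions.float()[:, None] * inv_freq[None, :]   # (seq, head_dim/2)
    cos = torch.cat((angles.cos(), angles.cos()), dim=-1)     # (seq, head_dim)
    sin = torch.cat((angles.sin(), angles.sin()), dim=-1)
    return x * cos + rotate_half(x) * sin                     # rotate each query/key vector

q = torch.randn(1, 12, 8, 64)                # toy query tensor: batch 1, 12 heads, 8 tokens
q_rot = apply_rope(q, torch.arange(8), 64)   # positions 0..7 encoded as rotations
```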
✅ Engineers who want to understand LLMs deeply (not just use APIs)
✅ Students learning AI/ML fundamentals
✅ Developers building AI products who need internal knowledge
✅ Anyone who wants to differentiate themselves in the AI field
❌ Not for: Casual users who just want to use ChatGPT
- Open the tutorial in Google Colab
- Upload `llm_tutorial_complete.py`
- Runtime → Change runtime type → GPU (T4)
- Follow the tutorial step-by-step
- Have Claude or ChatGPT ready to help fill in code
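Once the runtime type is switched, a quick check like this (not part of the tutorial file itself) confirms the GPU is active:

```python
import torch

if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # expect something like "Tesla T4"
else:
    print("No GPU found - recheck Runtime → Change runtime type → GPU")
```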
Time required: 5-7 hours (building + training)
After training (~1.5-2 hours), your model will:
✅ Generate grammatically correct English
✅ Show that it has learned language patterns
✅ Demonstrate proper sentence structure
- Complete architecture tutorial (RoPE, Attention, Transformer blocks)
- Training pipeline (data loading, optimization, checkpointing)
- Text generation (sampling, temperature, top-p; sketched after this list)
- Detailed explanations at every step
- No hidden "magic" - everything explained
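For example, the sampling step you'll implement looks roughly like this sketch of temperature plus top-p (nucleus) sampling; names and defaults here are illustrative, not the tutorial's exact code:

```python
import torch

def sample_next_token(logits, temperature=0.8, top_p=0.9):
    # logits: (vocab_size,) scores for the next token at the last position.
    logits = logits / temperature                     # <1 sharpens, >1 flattens the distribution
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Drop tokens whose cumulative probability (excluding themselves) already exceeds top_p,
    # i.e. keep the smallest "nucleus" of high-probability tokens.
    sorted_probs[cumulative - sorted_probs > top_p] = 0.0
    sorted_probs = sorted_probs / sorted_probs.sum()  # renormalize over the nucleus
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx[choice]                         # map back to the original vocab index
```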
After completing this tutorial, you'll have:
- A working 124M-parameter model
- Deep understanding of transformer internals
- Code you wrote and can explain
- Portfolio project with real substance
- Foundation to build larger models (1B, 7B+)
Part 1: Architecture (2-3 hours)
├── Model configuration
├── Rotary positional embeddings (RoPE)
├── Flash attention mechanism
├── Transformer blocks
└── Complete model assembly
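As a taste of Part 1, the attention core can lean on PyTorch's built-in `torch.nn.functional.scaled_dot_product_attention`, which dispatches to a Flash-Attention-style fused kernel on supported GPUs; the shapes and wiring below are illustrative, not the tutorial's exact code:

```python
import torch
import torch.nn.functional as F

def causal_attention(q, k, v):
    # q, k, v: (batch, heads, seq, head_dim), already RoPE-rotated.
    # is_causal=True applies the autoregressive mask inside the fused kernel.
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

q = k = v = torch.randn(2, 12, 128, 64)
out = causal_attention(q, k, v)   # (2, 12, 128, 64), one context vector per head and position
```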
Part 2: Training (3-4 hours)
├── WikiText-2 dataset loading
├── Tokenization (GPT-2 tokenizer)
├── Training loop implementation
├── Loss optimization
└── Checkpoint management
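The heart of Part 2 is a loop shaped like the sketch below; `model` and `get_batch` are placeholders for what the tutorial builds, and the real loop also adds learning-rate scheduling, gradient clipping, and validation:

```python
import torch
import torch.nn.functional as F

def train(model, get_batch, max_steps=2000, lr=3e-4):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.1)
    for step in range(max_steps):
        x, y = get_batch("train")          # (batch, seq) input ids and next-token targets
        logits = model(x)                  # (batch, seq, vocab_size)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))

        optimizer.zero_grad(set_to_none=True)
        loss.backward()                    # backpropagate through the whole model
        optimizer.step()

        if step % 500 == 0:
            # Periodic checkpoint so a Colab disconnect doesn't lose progress.
            torch.save({"model": model.state_dict(), "step": step}, "checkpoint.pt")
            print(f"step {step}: loss {loss.item():.3f}")
```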
Part 3: Generation
├── Text generation implementation
├── Sampling strategies
└── Testing your trained model
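And Part 3 reduces to a loop like this sketch: feed the prompt, sample one token, append it, repeat. `model` and `tokenizer` are placeholders, and the simple temperature sampling here can be swapped for the top-p function sketched earlier:

```python
import torch

@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=50, temperature=0.8):
    ids = torch.tensor([tokenizer.encode(prompt)])          # (1, prompt_len) token ids
    for _ in range(max_new_tokens):
        logits = model(ids)[:, -1, :]                       # logits for the newest position
        probs = torch.softmax(logits / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)   # sample rather than argmax
        ids = torch.cat([ids, next_id], dim=1)              # append and feed back in
    return tokenizer.decode(ids[0].tolist())
```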
- Download `llm_tutorial_complete.py`
- Open it in Google Colab
- Enable T4 GPU (free tier)
- Follow step-by-step instructions
- Use Claude/ChatGPT to help implement each section
Feel free to reach out: rahuldass1901@gmail.com
Found an improvement? Issues and pull requests welcome!
If this helped you learn, give it a star! ⭐
Share your success - let others know they can learn this too.
MIT License - use freely, learn freely, build freely.
Remember: This is about LEARNING, not building a production ChatGPT. You're learning the fundamentals that took OpenAI millions of dollars to figure out. That's the value. 💪
Built for education. Learn by doing. Own your knowledge.