learn-llm-from-scratch

Build and train a 124M parameter language model in one day. Complete tutorial from architecture to text generation.

Build a 124M LLM from Scratch in One Day 🚀

Learn to build and train a language model (same size as GPT-2 Small) in a single day on a free GPU.

🎯 What You'll Build

  • 124M parameter transformer model (GPT-2 Small size; see the parameter-count sketch after this list)
  • Complete architecture: RoPE, Flash Attention, 12 transformer layers
  • Full training pipeline on WikiText-2 dataset
  • Text generation system
  • Total cost: $0 (runs on Google Colab free tier)
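
Where does the 124M figure come from? Here's a back-of-the-envelope sketch using GPT-2 Small-scale hyperparameters (the `ModelConfig` name and exact values are illustrative - the tutorial's configuration may differ slightly):

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    """Illustrative GPT-2 Small-scale hyperparameters."""
    vocab_size: int = 50257  # GPT-2 BPE vocabulary
    n_layers: int = 12
    n_heads: int = 12
    d_model: int = 768
    d_ff: int = 3072         # 4 * d_model

def approx_param_count(cfg: ModelConfig) -> int:
    # Token embeddings (often tied with the output head)
    embed = cfg.vocab_size * cfg.d_model
    # Per layer: attention (Q, K, V, output projections) + two feed-forward matrices;
    # biases and LayerNorm parameters are ignored here - they barely move the total
    attn = 4 * cfg.d_model * cfg.d_model
    ff = 2 * cfg.d_model * cfg.d_ff
    return embed + cfg.n_layers * (attn + ff)

print(f"{approx_param_count(ModelConfig()) / 1e6:.0f}M parameters")  # ~124M
```

Note that RoPE adds no learned position-embedding parameters, so the count is dominated by the token embeddings (~39M) and the 12 transformer blocks (~85M).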

✨ What Makes This Different

You don't just copy-paste code - you BUILD it.

  • Fill-in-the-blanks approach using Claude/ChatGPT
  • You write the code, you own the IP
  • Understand every line, not just run it
  • Portfolio-ready project you can explain in interviews

📚 What You'll Learn

  • How transformers actually work internally
  • Modern techniques: RoPE (used in LLaMA; sketched after this list), Flash Attention
  • Complete training loop: loss, gradients, optimization
  • Why ChatGPT works the way it does
  • Path to scale from 100M → 7B → 175B parameters
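
To give a taste of the RoPE item: rotary embeddings encode position by rotating each pair of query/key channels by a position-dependent angle, so attention scores end up depending on relative offsets rather than absolute positions. A minimal sketch (the interleaved channel pairing here is one common convention - the tutorial's implementation may organize channels differently):

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate pairs of channels by a position-dependent angle.
    x: (batch, seq_len, n_heads, head_dim) with even head_dim."""
    b, t, h, d = x.shape
    # One frequency per channel pair, falling geometrically (as in the RoPE paper)
    freqs = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)
    angles = torch.arange(t, dtype=torch.float32)[:, None] * freqs[None, :]  # (t, d/2)
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin  # 2-D rotation of each channel pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```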

🎓 Who Is This For?

✅ Engineers who want to understand LLMs deeply (not just use APIs)
✅ Students learning AI/ML fundamentals
✅ Developers building AI products who need internal knowledge
✅ Anyone who wants to differentiate themselves in the AI field

❌ Not for: Casual users who just want to use ChatGPT

⚡ Quick Start

  1. Open the tutorial in Google Colab
  2. Upload llm_tutorial_complete.py
  3. Runtime → Change runtime type → GPU (T4) - verify with the check below
  4. Follow the tutorial step-by-step
  5. Have Claude or ChatGPT ready to help fill in code

Time required: 5-7 hours (building + training)

📊 Expected Results

After training (~1.5-2 hours), your model will:

✅ Generate grammatically correct English
✅ Show it learned language patterns
✅ Demonstrate proper sentence structure

⚠️ Educational Model: Won't be ChatGPT-quality (needs 100x more data, 1000x more training). But you'll understand how to get there.

🛠️ What's Included

  • Complete architecture tutorial (RoPE, Attention, Transformer blocks)
  • Training pipeline (data loading, optimization, checkpointing)
  • Text generation (sampling, temperature, top-p; sampler sketched after this list)
  • Detailed explanations at every step
  • No hidden "magic" - everything explained
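
To sketch how the generation knobs fit together: temperature rescales the logits, and top-p (nucleus) sampling keeps only the smallest set of tokens whose probability mass exceeds the threshold. The function name and default values here are illustrative:

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 0.8,
                      top_p: float = 0.9) -> int:
    """Sample one token id from 1-D logits for a single position."""
    probs = torch.softmax(logits / temperature, dim=-1)
    # Nucleus (top-p): keep the smallest set of tokens whose mass exceeds top_p
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    keep = cumulative - sorted_probs < top_p  # always keeps at least the top token
    sorted_probs[~keep] = 0.0
    sorted_probs /= sorted_probs.sum()        # renormalize the surviving mass
    idx = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_ids[idx].item()
```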

💡 What You Get

After completing this tutorial:

  • Working 124M parameter model
  • Deep understanding of transformer internals
  • Code you wrote and can explain
  • Portfolio project with real substance
  • Foundation to build larger models (1B, 7B+)

📖 Tutorial Structure

Part 1: Architecture (2-3 hours)
├── Model configuration
├── Rotary positional embeddings (RoPE)
├── Flash attention mechanism
├── Transformer blocks
└── Complete model assembly
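
As a preview of how Part 1's pieces assemble, here is a minimal pre-norm transformer block built on PyTorch's `scaled_dot_product_attention`, which dispatches to a Flash Attention kernel on supported GPUs. The layout is one common convention - the tutorial's block may differ in detail:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Block(nn.Module):
    """One pre-norm transformer block, sized like GPT-2 Small."""
    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.n_heads = n_heads
        self.norm1 = nn.LayerNorm(d_model)
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(self.norm1(x)).chunk(3, dim=-1)
        # (b, t, d) -> (b, n_heads, t, head_dim); RoPE would be applied to q and k here
        q, k, v = (z.view(b, t, self.n_heads, -1).transpose(1, 2) for z in (q, k, v))
        # PyTorch dispatches this to a Flash Attention kernel on supported GPUs
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.proj(out.transpose(1, 2).reshape(b, t, d))
        return x + self.ff(self.norm2(x))
```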

Part 2: Training (3-4 hours)
├── WikiText-2 dataset loading
├── Tokenization (GPT-2 tokenizer)
├── Training loop implementation
├── Loss optimization
└── Checkpoint management
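
Part 2 boils down to a surprisingly small loop. A sketch under stated assumptions: it uses Hugging Face `datasets` and `transformers` for data and tokenization (the tutorial may use other libraries such as tiktoken), and assumes the model from Part 1 returns raw logits of shape (batch, seq, vocab) and already lives on the GPU. Hyperparameters are illustrative:

```python
import torch
import torch.nn.functional as F
from datasets import load_dataset           # Hugging Face `datasets`
from transformers import GPT2TokenizerFast  # the tutorial may use tiktoken instead

tok = GPT2TokenizerFast.from_pretrained("gpt2")
text = "\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="train")["text"])
ids = torch.tensor(tok(text).input_ids)

def get_batch(block_size: int = 256, batch_size: int = 16):
    # Random contiguous windows; the target is the input shifted one token left
    starts = torch.randint(len(ids) - block_size - 1, (batch_size,))
    x = torch.stack([ids[s : s + block_size] for s in starts])
    y = torch.stack([ids[s + 1 : s + block_size + 1] for s in starts])
    return x.cuda(), y.cuda()

def train(model: torch.nn.Module, steps: int = 5000, lr: float = 3e-4):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for step in range(steps):
        x, y = get_batch()
        logits = model(x)  # (batch, seq, vocab)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
        opt.zero_grad(); loss.backward(); opt.step()
        if step % 500 == 0:
            print(f"step {step}: loss {loss.item():.3f}")
            torch.save(model.state_dict(), "checkpoint.pt")  # simple checkpointing
```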

Part 3: Generation
├── Text generation implementation
├── Sampling strategies
└── Testing your trained model
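
And Part 3 in miniature - an autoregressive loop that feeds the model its own output, reusing the `sample_next_token` sketch from above (assumes the model and tokenizer from Parts 1-2 and a GPU runtime):

```python
import torch

@torch.no_grad()
def generate(model, tok, prompt: str, max_new_tokens: int = 100) -> str:
    ids = torch.tensor([tok(prompt).input_ids], device="cuda")
    for _ in range(max_new_tokens):
        logits = model(ids)[0, -1, :]          # logits for the last position only
        next_id = sample_next_token(logits)    # the sampler sketched earlier
        ids = torch.cat([ids, torch.tensor([[next_id]], device=ids.device)], dim=1)
    return tok.decode(ids[0].tolist())
```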


📧 Questions?

Feel free to reach out: rahuldass1901@gmail.com

🤝 Contributing

Found an improvement? Issues and pull requests welcome!

⭐ Show Your Support

If this helped you learn, give it a star! ⭐

Share your success - let others know they can learn this too.

📜 License

MIT License - use freely, learn freely, build freely.


Remember: This is about LEARNING, not building production ChatGPT. You're learning the fundamentals that took OpenAI millions of dollars to figure out. That's the value. 💪


Built for education. Learn by doing. Own your knowledge.
