- Trains a tiny character-level bigram language model on `input.txt`.
- Learns next-character probabilities that depend only on the current character (no long-range context); see the toy illustration after this list.
- After training, samples text autoregressively.
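
To make the "current character only" point concrete, here is a toy illustration (made-up values and a made-up 5-character vocabulary, not code from the script): a bigram model is just a `vocab_size x vocab_size` table, and prediction reads a single row of it.

```python
import torch

# Toy illustration: the prediction for the next character uses only the row of
# the table indexed by the current character. The table values here are random.
vocab_size = 5
logits_table = torch.randn(vocab_size, vocab_size)        # one row per "current" character
current_id = 2                                            # id of the character we just saw
probs = torch.softmax(logits_table[current_id], dim=-1)   # distribution over next characters
next_id = torch.multinomial(probs, num_samples=1).item()  # sample the next character id
```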
- Data prep: builds a character vocabulary from `input.txt`, defines `encode`/`decode`, splits 90%/10% into train/val.
- Model: `BigramLanguageModel` with a single `nn.Embedding(vocab_size, vocab_size)` producing next-token logits directly.
- Training: random contiguous blocks from the corpus (`block_size`), cross-entropy loss, `AdamW` optimizer.
- Generation: starts from a zero token and repeatedly samples the next character via softmax (all four steps are sketched in the code after this list).
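
The four steps above fit together roughly as in the sketch below. This is a minimal reconstruction for orientation, not the script verbatim; details such as the random seed, the `get_batch` helper, and the number of sampled characters are assumptions.

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

torch.manual_seed(1337)  # assumed; the actual seed (if any) isn't stated here
device = 'cuda' if torch.cuda.is_available() else 'cpu'
batch_size, block_size, learning_rate, max_iters = 32, 8, 1e-2, 3000

# Data prep: character vocabulary, encode/decode, 90%/10% train/val split.
with open('input.txt', 'r', encoding='utf-8') as f:
    text = f.read()
chars = sorted(set(text))
vocab_size = len(chars)
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for i, ch in enumerate(chars)}
encode = lambda s: [stoi[c] for c in s]             # string -> list of token ids
decode = lambda ids: ''.join(itos[i] for i in ids)  # list of token ids -> string

data = torch.tensor(encode(text), dtype=torch.long)
n = int(0.9 * len(data))
train_data, val_data = data[:n], data[n:]

def get_batch(split):
    """Sample a batch of random contiguous blocks and their one-step-shifted targets."""
    d = train_data if split == 'train' else val_data
    ix = torch.randint(len(d) - block_size, (batch_size,))
    x = torch.stack([d[i:i + block_size] for i in ix])
    y = torch.stack([d[i + 1:i + block_size + 1] for i in ix])
    return x.to(device), y.to(device)

# Model: a single embedding table maps each token id directly to next-token logits.
class BigramLanguageModel(nn.Module):
    def __init__(self, vocab_size):
        super().__init__()
        self.token_embedding_table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx, targets=None):
        logits = self.token_embedding_table(idx)  # (B, T, vocab_size)
        loss = None
        if targets is not None:
            B, T, C = logits.shape
            loss = F.cross_entropy(logits.view(B * T, C), targets.view(B * T))
        return logits, loss

    @torch.no_grad()
    def generate(self, idx, max_new_tokens):
        for _ in range(max_new_tokens):
            logits, _ = self(idx)
            probs = F.softmax(logits[:, -1, :], dim=-1)         # only the last position matters
            idx_next = torch.multinomial(probs, num_samples=1)  # sample one next id
            idx = torch.cat((idx, idx_next), dim=1)
        return idx

model = BigramLanguageModel(vocab_size).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

# Training: cross-entropy on random blocks. (The real script also periodically
# prints estimated train/val losses; that is omitted here for brevity.)
for it in range(max_iters):
    xb, yb = get_batch('train')
    _, loss = model(xb, yb)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()

# Generation: start from a single zero token and sample autoregressively.
context = torch.zeros((1, 1), dtype=torch.long, device=device)
print(decode(model.generate(context, max_new_tokens=300)[0].tolist()))
```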
Hyperparameters: `batch_size=32`, `block_size=8`, `learning_rate=1e-2`, `max_iters=3000`.
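
Written out as the Python assignments they correspond to (values taken from the line above; the comments are descriptive glosses, not from the script):

```python
batch_size = 32       # sequences sampled per training step
block_size = 8        # characters per sampled block (the context length)
learning_rate = 1e-2  # AdamW learning rate
max_iters = 3000      # total optimization steps
```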
Run `python bigram.py`. Expected output: periodic train/val losses and a printed sample at the end.
- This model captures unigram/bigram statistics only; it cannot model long-range structure.
- Uses the GPU automatically if available (`device = 'cuda' if torch.cuda.is_available() else 'cpu'`).
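
The usual follow-through for that device string is to move the model and every batch onto it, roughly as below (an assumed pattern, using a stand-in module rather than the script's model):

```python
import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Assumed pattern: parameters and batches must live on the same device.
model = nn.Embedding(65, 65).to(device)           # stand-in for the bigram model
xb = torch.randint(0, 65, (4, 8), device=device)  # a batch created directly on the device
logits = model(xb)                                # forward pass runs on the GPU if available
```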