Transformer from Scratch (on TinyStories dataset)

Overview

This project implements a character-level language model built on a Transformer architecture in PyTorch, trained on the TinyStories dataset (https://arxiv.org/abs/2305.07759). The model (implemented as the BigramLanguageModel class) predicts the next character in a sequence from the context provided by the preceding characters, using multi-head self-attention and feedforward neural networks.

Features

  • Token and position embeddings
  • Multi-head self-attention mechanism
  • Feedforward neural network layers
  • Layer normalization
  • Dropout for regularization
  • Character-level text generation

Requirements

  • Python 3.x
  • PyTorch 1.7 or later
  • CUDA (optional, for GPU acceleration)

Files

  • input.txt: The input text file used for training the language model (from TinyStories: https://arxiv.org/abs/2305.07759).
  • main.py: The main script containing the implementation of the model and training loop.

Hyperparameters

The following hyperparameters can be adjusted to tune the model (a configuration sketch follows the list):

  • batch_size: Number of sequences processed in parallel (default: 2048)
  • block_size: Maximum context length for predictions (default: 128)
  • max_iters: Number of training iterations (default: 1000)
  • eval_interval: Interval for evaluating the model on validation data (default: 100)
  • learning_rate: Learning rate for the optimizer (default: 1e-3)
  • eval_iters: Number of iterations for evaluation (default: 200)
  • n_embd: Dimensionality of the embeddings (default: 128)
  • n_head: Number of attention heads (default: 4)
  • n_layer: Number of Transformer blocks (default: 4)
  • dropout: Dropout rate (default: 0.0)
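
As a rough sketch, these defaults might be declared as module-level constants near the top of the script; the variable names follow the list above, and the device-selection line is an assumption (CUDA is optional):

```python
import torch

# Hyperparameters (defaults from the list above; adjust to tune the model)
batch_size = 2048     # sequences processed in parallel
block_size = 128      # maximum context length for predictions
max_iters = 1000      # total training iterations
eval_interval = 100   # evaluate on validation data every N iterations
learning_rate = 1e-3  # learning rate for the optimizer
eval_iters = 200      # batches averaged per evaluation
n_embd = 128          # embedding dimensionality
n_head = 4            # attention heads per block
n_layer = 4           # Transformer blocks
dropout = 0.0         # dropout rate

# Run on the GPU when CUDA is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
```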

Usage

Preparing the Data

  1. Place your training text in a file named input.txt.
  2. Ensure that input.txt is in the same directory as main.py.

Running the Script

To train the model and generate text, simply run:

python main.py

The script will:

  1. Read the input text from input.txt.
  2. Encode the text into integer sequences.
  3. Split the data into training and validation sets.
  4. Train the model for the specified number of iterations.
  5. Periodically evaluate the model on the validation set and print the losses.
  6. Generate text samples at regular intervals during training.
  7. Print a final text sample after training is complete.

Code Explanation

Data Preparation

  • The input text is read from input.txt.
  • Unique characters are extracted to create a vocabulary.
  • Characters are mapped to integers for model processing.
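
A minimal sketch of this preparation step, assuming a character-level vocabulary and a 90/10 train/validation split (the split ratio is an assumption, not stated above):

```python
import torch

# Read the training corpus
with open('input.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Extract the unique characters to form the vocabulary
chars = sorted(set(text))
vocab_size = len(chars)

# Map characters to integers and back
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for i, ch in enumerate(chars)}
encode = lambda s: [stoi[c] for c in s]             # string -> list of token ids
decode = lambda ids: ''.join(itos[i] for i in ids)  # list of token ids -> string

# Encode the full text and split it into training and validation sets
data = torch.tensor(encode(text), dtype=torch.long)
n = int(0.9 * len(data))                            # assumed 90/10 split
train_data, val_data = data[:n], data[n:]
```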

Batch Generation

  • get_batch(split): Generates batches of input and target sequences for training and validation.
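
A sketch of how get_batch(split) typically works in this kind of setup, reusing the train_data, val_data, and hyperparameter names from the earlier sketches:

```python
import torch

def get_batch(split):
    # Choose the training or validation split
    data = train_data if split == 'train' else val_data
    # Pick random starting offsets, one per sequence in the batch
    ix = torch.randint(len(data) - block_size, (batch_size,))
    # Inputs are block_size-long slices; targets are the same slices shifted right by one
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
    return x.to(device), y.to(device)
```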

Model Components

  • Head: Implements a single head of self-attention (see the attention sketch after this list).
  • MultiHeadAttention: Combines multiple heads of self-attention.
  • FeedFoward: A feedforward neural network layer.
  • Block: A single Transformer block, combining self-attention and feedforward layers.
  • BigramLanguageModel: The main language model combining embedding layers, Transformer blocks, and an output layer.
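
As an illustration of the attention components, here is a condensed sketch of what Head and MultiHeadAttention might look like. It assumes the n_embd, block_size, and dropout values from the hyperparameter sketch and is not necessarily identical to the code in the script:

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

class Head(nn.Module):
    """One head of causal self-attention."""
    def __init__(self, head_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # Lower-triangular mask: each position may attend only to earlier positions
        self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        B, T, C = x.shape
        k = self.key(x)                                       # (B, T, head_size)
        q = self.query(x)                                     # (B, T, head_size)
        # Scaled dot-product attention scores with causal masking
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5   # (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf'))
        wei = self.dropout(F.softmax(wei, dim=-1))
        v = self.value(x)                                     # (B, T, head_size)
        return wei @ v                                        # (B, T, head_size)

class MultiHeadAttention(nn.Module):
    """Several attention heads in parallel, concatenated and projected back to n_embd."""
    def __init__(self, num_heads, head_size):
        super().__init__()
        self.heads = nn.ModuleList([Head(head_size) for _ in range(num_heads)])
        self.proj = nn.Linear(head_size * num_heads, n_embd)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        out = torch.cat([h(x) for h in self.heads], dim=-1)   # (B, T, n_embd)
        return self.dropout(self.proj(out))
```

In this style of implementation, each Block typically wraps a MultiHeadAttention and a FeedFoward layer with the layer normalization mentioned above.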

Training and Evaluation

  • The model is trained using the AdamW optimizer.
  • Loss is computed using cross-entropy.
  • The estimate_loss() function evaluates the model on both training and validation sets.
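
A sketch of the training loop and estimate_loss(), assuming the model's forward pass returns a (logits, loss) pair and reusing names from the earlier sketches (the BigramLanguageModel constructor arguments are omitted and may differ):

```python
import torch

@torch.no_grad()
def estimate_loss(model):
    """Average the loss over eval_iters batches for each split."""
    out = {}
    model.eval()
    for split in ('train', 'val'):
        losses = torch.zeros(eval_iters)
        for k in range(eval_iters):
            xb, yb = get_batch(split)
            _, loss = model(xb, yb)          # forward pass returns (logits, loss)
            losses[k] = loss.item()
        out[split] = losses.mean()
    model.train()
    return out

model = BigramLanguageModel().to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

for it in range(max_iters):
    # Periodically report losses on both splits
    if it % eval_interval == 0:
        losses = estimate_loss(model)
        print(f"step {it}: train loss {losses['train']:.4f}, val loss {losses['val']:.4f}")

    # One optimization step on a freshly sampled batch
    xb, yb = get_batch('train')
    _, loss = model(xb, yb)                  # cross-entropy loss computed inside the model
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
```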

Text Generation

  • The generate() method of BigramLanguageModel generates text by sampling from the learned distribution of next characters.
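
The sampling loop inside generate() usually looks something like the standalone sketch below (shown outside the class for brevity; it assumes that calling the model without targets returns (logits, None) and that block_size, device, and decode are defined as in the earlier sketches):

```python
import torch
from torch.nn import functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens):
    """Autoregressively sample max_new_tokens characters from context idx of shape (B, T)."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]          # crop context to the last block_size tokens
        logits, _ = model(idx_cond)              # (B, T, vocab_size)
        logits = logits[:, -1, :]                # keep only the final time step
        probs = F.softmax(logits, dim=-1)        # distribution over the next character
        idx_next = torch.multinomial(probs, num_samples=1)   # sample one token id per sequence
        idx = torch.cat((idx, idx_next), dim=1)  # append and continue
    return idx

# Example: generate 500 characters starting from a single zero token
# context = torch.zeros((1, 1), dtype=torch.long, device=device)
# print(decode(generate(model, context, max_new_tokens=500)[0].tolist()))
```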

Example Output

During training, the script will periodically print text samples generated by the model. Here is an example of what you might see:

...
step 0: train loss 4.1234, val loss 4.5678
Generated text: "Sample text generated by the model..."
...