This project implements a Bigram Language Model built on a Transformer architecture in PyTorch, trained on the TinyStories dataset (https://arxiv.org/abs/2305.07759). The model predicts the next character in a sequence based on the context provided by the preceding characters, leveraging multi-head self-attention and feedforward neural networks.
- Token and position embeddings
- Multi-head self-attention mechanism
- Feedforward neural network layers
- Layer normalization
- Dropout for regularization
- Character-level text generation
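As a rough illustration of how the token and position embeddings listed above are typically combined in this kind of model, here is a minimal sketch; the tensor names and example values are illustrative assumptions, not code copied from the script:

```python
import torch
import torch.nn as nn

# Illustrative values; the actual defaults are listed under Hyperparameters below.
vocab_size, n_embd, block_size = 65, 128, 128

token_embedding = nn.Embedding(vocab_size, n_embd)        # one vector per character id
position_embedding = nn.Embedding(block_size, n_embd)     # one vector per position in the context

idx = torch.randint(0, vocab_size, (4, block_size))       # (batch, time) integer character ids
tok_emb = token_embedding(idx)                            # (batch, time, n_embd)
pos_emb = position_embedding(torch.arange(block_size))    # (time, n_embd), broadcast over batch
x = tok_emb + pos_emb                                     # input fed to the Transformer blocks
```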
- Python 3.x
- PyTorch 1.7 or later
- CUDA (optional, for GPU acceleration)
- `input.txt`: The input text file used for training the language model (from TinyStories: https://arxiv.org/abs/2305.07759).
- `main.py`: The main script containing the implementation of the model and training loop.
The following hyperparameters can be adjusted to tune the model:
- `batch_size`: Number of sequences processed in parallel (default: 2048)
- `block_size`: Maximum context length for predictions (default: 128)
- `max_iters`: Number of training iterations (default: 1000)
- `eval_interval`: Interval for evaluating the model on validation data (default: 100)
- `learning_rate`: Learning rate for the optimizer (default: 1e-3)
- `eval_iters`: Number of iterations for evaluation (default: 200)
- `n_embd`: Dimensionality of the embeddings (default: 128)
- `n_head`: Number of attention heads (default: 4)
- `n_layer`: Number of Transformer blocks (default: 4)
- `dropout`: Dropout rate (default: 0.0)
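In scripts of this style the hyperparameters above are usually defined as module-level constants near the top of the file. A minimal sketch, using the names and defaults from the list; the `device` selection line is an assumption for optional GPU use:

```python
import torch

batch_size = 2048      # number of sequences processed in parallel
block_size = 128       # maximum context length for predictions
max_iters = 1000       # number of training iterations
eval_interval = 100    # evaluate on validation data every this many steps
learning_rate = 1e-3   # learning rate for the optimizer
eval_iters = 200       # number of batches averaged per evaluation
n_embd = 128           # dimensionality of the embeddings
n_head = 4             # number of attention heads
n_layer = 4            # number of Transformer blocks
dropout = 0.0          # dropout rate

device = 'cuda' if torch.cuda.is_available() else 'cpu'  # CUDA is optional
```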
- Place your training text in a file named `input.txt`.
- Ensure that `input.txt` is in the same directory as `train.py`.
To train the model and generate text, simply run:
`python train.py`
The script will:
- Read the input text from `input.txt`.
- Encode the text into integer sequences.
- Split the data into training and validation sets.
- Train the model for the specified number of iterations.
- Periodically evaluate the model on the validation set and print the losses.
- Generate text samples at regular intervals during training.
- Print a final text sample after training is complete.
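The steps above correspond to a fairly standard character-level training loop. A minimal sketch of what such a loop might look like, assuming the `model`, `get_batch`, `estimate_loss`, and `decode` names described below along with the hyperparameter constants sketched earlier; this is illustrative, not the exact code from the script:

```python
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

for step in range(max_iters):
    # Periodically estimate train/val loss and print a progress line.
    if step % eval_interval == 0:
        losses = estimate_loss()
        print(f"step {step}: train loss {losses['train']:.4f}, val loss {losses['val']:.4f}")

    # Sample a batch, compute the cross-entropy loss, and take an optimizer step.
    xb, yb = get_batch('train')
    logits, loss = model(xb, yb)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()

# Generate a final sample, starting from a single zero token as context.
context = torch.zeros((1, 1), dtype=torch.long, device=device)
print(decode(model.generate(context, max_new_tokens=500)[0].tolist()))
```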
- The input text is read from `input.txt`.
- Unique characters are extracted to create a vocabulary.
- Characters are mapped to integers for model processing.
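A minimal sketch of this character-level preprocessing; the helper names `stoi`, `itos`, `encode`, and `decode`, as well as the 90/10 train/validation split ratio, are conventional assumptions rather than details taken from the script:

```python
import torch

with open('input.txt', 'r', encoding='utf-8') as f:
    text = f.read()

chars = sorted(set(text))            # unique characters form the vocabulary
vocab_size = len(chars)

stoi = {ch: i for i, ch in enumerate(chars)}    # character -> integer
itos = {i: ch for i, ch in enumerate(chars)}    # integer -> character
encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: ''.join(itos[i] for i in ids)

data = torch.tensor(encode(text), dtype=torch.long)
n = int(0.9 * len(data))             # assumed 90/10 split into train and validation
train_data, val_data = data[:n], data[n:]
```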
- `get_batch(split)`: Generates batches of input and target sequences for training and validation.
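A sketch of how such a batching function typically works, assuming the `train_data`/`val_data` tensors and the hyperparameter constants sketched above; the exact implementation in the script may differ:

```python
import torch

def get_batch(split):
    """Sample a random batch of contexts (x) and next-character targets (y)."""
    data = train_data if split == 'train' else val_data
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])          # (batch, block_size)
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])  # targets shifted by one character
    return x.to(device), y.to(device)
```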
- `Head`: Implements a single head of self-attention.
- `MultiHeadAttention`: Combines multiple heads of self-attention.
- `FeedFoward`: A feedforward neural network layer.
- `Block`: A single Transformer block, combining self-attention and feedforward layers.
- `BigramLanguageModel`: The main language model combining embedding layers, Transformer blocks, and an output layer.
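For orientation, here is a minimal sketch of what a single causal self-attention head (`Head`) usually looks like in this style of model, assuming the hyperparameter constants above; the exact implementation in the script may differ:

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

class Head(nn.Module):
    """One head of causal (masked) self-attention."""

    def __init__(self, head_size):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # Lower-triangular mask so each position attends only to earlier positions.
        self.register_buffer('tril', torch.tril(torch.ones(block_size, block_size)))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        B, T, C = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5          # scaled attention scores (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float('-inf'))  # causal mask
        wei = self.dropout(F.softmax(wei, dim=-1))
        return wei @ v                                                # weighted sum of values
```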
- The model is trained using the AdamW optimizer.
- Loss is computed using cross-entropy.
- The `estimate_loss()` function evaluates the model on both training and validation sets.
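A typical shape for such an evaluation helper, averaging the loss over `eval_iters` batches with gradients disabled; a sketch under the assumptions above, not the verbatim function:

```python
import torch

@torch.no_grad()
def estimate_loss():
    """Average the loss over eval_iters batches for both splits."""
    out = {}
    model.eval()                      # disable dropout during evaluation
    for split in ('train', 'val'):
        losses = torch.zeros(eval_iters)
        for k in range(eval_iters):
            X, Y = get_batch(split)
            _, loss = model(X, Y)
            losses[k] = loss.item()
        out[split] = losses.mean()
    model.train()                     # switch back to training mode
    return out
```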
- The `generate()` method of `BigramLanguageModel` generates text by sampling from the learned distribution of next characters.
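The sampling loop typically crops the context to the last `block_size` tokens, takes the logits for the final position, and samples the next character from the resulting distribution. A sketch of such a method, consistent with the description above but not necessarily the exact code:

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

class BigramLanguageModel(nn.Module):
    # ... embedding layers, Transformer blocks, and forward() as described above ...

    @torch.no_grad()
    def generate(self, idx, max_new_tokens):
        """Append max_new_tokens sampled characters to the context idx of shape (B, T)."""
        for _ in range(max_new_tokens):
            idx_cond = idx[:, -block_size:]          # crop to the maximum context length
            logits, _ = self(idx_cond)               # forward pass; the loss output is ignored
            logits = logits[:, -1, :]                # logits for the last time step
            probs = F.softmax(logits, dim=-1)        # distribution over next characters
            idx_next = torch.multinomial(probs, num_samples=1)
            idx = torch.cat((idx, idx_next), dim=1)  # append and continue
        return idx
```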
During training, the script will periodically print text samples generated by the model. Here is an example of what you might see:
...
step 0: train loss 4.1234, val loss 4.5678
Generated text: "Sample text generated by the model..."
...