This project is a from-scratch implementation of a Large Language Model (LLM) based on the GPT architecture, following the methodology described in the textbook Build a Large Language Model From Scratch by Sebastian Raschka. It demonstrates the complete pipeline from raw text processing to the construction of a functional GPT model using PyTorch.
The repository serves as a practical application of core Deep Learning concepts, specifically focused on the Transformer architecture that powers modern LLMs. It includes detailed implementations of:
- Data Preparation: Custom tokenization and sliding-window data loaders (see the data-loading sketch after this list).
- Attention Mechanisms: Scaled dot-product, causal (masked), and multi-head attention layers (see the attention sketch below).
- Model Architecture: Layer normalization, GELU activation functions, feed-forward networks, and transformer blocks with shortcut connections (see the transformer-block sketch below).
- Custom GPT Architecture: Implements the GPT-2 124M parameter configuration with a 50,257-token vocabulary and a 1,024-token context length (an illustrative configuration appears below).
- Causal Masking: Ensures the model can only attend to previous tokens during training, which is essential for generative tasks.
- Residual Connections: Uses shortcut connections to improve gradient flow in deep networks.
- Modular Design: Key components are abstracted into classes (e.g., FeedForward, MultiHeadAttention) for clarity and reuse.
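
As a rough illustration of the data-preparation step, the sketch below pairs the GPT-2 BPE tokenizer from tiktoken with a sliding-window dataset. The class name GPTDatasetV1 and the parameters max_length and stride follow the book's conventions and are assumptions here; the repository's actual code may differ.

```python
import tiktoken
import torch
from torch.utils.data import Dataset, DataLoader

class GPTDatasetV1(Dataset):
    """Slides a fixed-size window over the token stream to build (input, target) pairs."""
    def __init__(self, txt, tokenizer, max_length, stride):
        self.input_ids = []
        self.target_ids = []
        token_ids = tokenizer.encode(txt)
        # Each window of max_length tokens predicts the same window shifted by one token.
        for i in range(0, len(token_ids) - max_length, stride):
            self.input_ids.append(torch.tensor(token_ids[i:i + max_length]))
            self.target_ids.append(torch.tensor(token_ids[i + 1:i + max_length + 1]))

    def __len__(self):
        return len(self.input_ids)

    def __getitem__(self, idx):
        return self.input_ids[idx], self.target_ids[idx]

def create_dataloader(txt, batch_size=8, max_length=256, stride=128, shuffle=True):
    tokenizer = tiktoken.get_encoding("gpt2")  # BPE tokenizer used for GPT-2
    dataset = GPTDatasetV1(txt, tokenizer, max_length, stride)
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle, drop_last=True)
```
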
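The attention layer can be sketched roughly as follows: an upper-triangular causal mask hides future tokens, scaled dot-product attention is computed per head, and the heads are recombined through an output projection. This is a minimal sketch in the spirit of the book's MultiHeadAttention class, not necessarily the exact code in this repository.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Multi-head self-attention with a causal mask so each token only attends to earlier tokens."""
    def __init__(self, d_in, d_out, context_length, dropout, num_heads, qkv_bias=False):
        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        self.d_out = d_out
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.out_proj = nn.Linear(d_out, d_out)
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular mask blocks attention to future positions.
        self.register_buffer(
            "mask", torch.triu(torch.ones(context_length, context_length), diagonal=1)
        )

    def forward(self, x):
        b, num_tokens, _ = x.shape
        # Project and split into heads: (b, num_heads, num_tokens, head_dim)
        queries = self.W_query(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        keys = self.W_key(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        values = self.W_value(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)

        # Scaled dot-product attention scores with causal masking
        attn_scores = queries @ keys.transpose(2, 3)
        attn_scores.masked_fill_(self.mask.bool()[:num_tokens, :num_tokens], float("-inf"))
        attn_weights = torch.softmax(attn_scores / self.head_dim ** 0.5, dim=-1)
        attn_weights = self.dropout(attn_weights)

        # Recombine heads and project back to d_out
        context = (attn_weights @ values).transpose(1, 2).contiguous().view(b, num_tokens, self.d_out)
        return self.out_proj(context)
```
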
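The GPT-2 124M configuration amounts to a handful of hyperparameters. The dictionary below uses the standard values (50,257-token vocabulary, 1,024-token context, 768-dimensional embeddings, 12 heads, 12 layers); the key names are illustrative and may not match the repository exactly.

```python
GPT_CONFIG_124M = {
    "vocab_size": 50257,     # BPE vocabulary size of the GPT-2 tokenizer
    "context_length": 1024,  # maximum number of tokens the model attends over
    "emb_dim": 768,          # embedding / hidden dimension
    "n_heads": 12,           # attention heads per transformer block
    "n_layers": 12,          # number of transformer blocks
    "drop_rate": 0.1,        # dropout rate
    "qkv_bias": False,       # whether query/key/value projections use bias terms
}
```
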
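Putting the pieces together, a transformer block applies layer normalization, attention, and a GELU feed-forward network, each wrapped in a shortcut (residual) connection. The sketch below assumes the MultiHeadAttention class and the configuration keys from the previous sketches; note that, following the book, the repository implements LayerNorm and GELU from scratch, whereas this sketch uses PyTorch's built-ins for brevity.

```python
import torch.nn as nn

class FeedForward(nn.Module):
    """Position-wise feed-forward network with a GELU non-linearity."""
    def __init__(self, cfg):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(cfg["emb_dim"], 4 * cfg["emb_dim"]),
            nn.GELU(),
            nn.Linear(4 * cfg["emb_dim"], cfg["emb_dim"]),
        )

    def forward(self, x):
        return self.layers(x)

class TransformerBlock(nn.Module):
    """Pre-LayerNorm transformer block: attention and feed-forward, each with a shortcut."""
    def __init__(self, cfg):
        super().__init__()
        self.att = MultiHeadAttention(
            d_in=cfg["emb_dim"], d_out=cfg["emb_dim"],
            context_length=cfg["context_length"], dropout=cfg["drop_rate"],
            num_heads=cfg["n_heads"], qkv_bias=cfg["qkv_bias"],
        )
        self.ff = FeedForward(cfg)
        self.norm1 = nn.LayerNorm(cfg["emb_dim"])
        self.norm2 = nn.LayerNorm(cfg["emb_dim"])
        self.drop_shortcut = nn.Dropout(cfg["drop_rate"])

    def forward(self, x):
        # Shortcut (residual) connections improve gradient flow in deep stacks.
        x = x + self.drop_shortcut(self.att(self.norm1(x)))
        x = x + self.drop_shortcut(self.ff(self.norm2(x)))
        return x
```
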
This project requires Python 3.13 and the libraries listed in requirements.txt, including:
- PyTorch: Core deep learning framework.
- tiktoken: OpenAI's BPE tokenizer used for GPT-2/3.
- Matplotlib: Used for visualizing activation functions like GELU and ReLU (a small plotting sketch follows this list).
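
As an example of the kind of visualization Matplotlib is used for here, the illustrative script below plots GELU next to ReLU over a small input range:

```python
import matplotlib.pyplot as plt
import torch
import torch.nn as nn

# Compare the GELU and ReLU activations over a range of inputs.
x = torch.linspace(-3, 3, 100)
gelu, relu = nn.GELU(), nn.ReLU()

plt.figure(figsize=(8, 3))
for i, (y, label) in enumerate(zip([gelu(x), relu(x)], ["GELU", "ReLU"]), start=1):
    plt.subplot(1, 2, i)
    plt.plot(x, y)
    plt.title(f"{label} activation")
    plt.xlabel("x")
    plt.ylabel(f"{label}(x)")
    plt.grid(True)
plt.tight_layout()
plt.show()
```
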
To install dependencies:
`pip install -r requirements.txt`

This project was developed as a personal learning initiative to master the internal workings of Large Language Models, aligning with my studies at the Polytechnic of Milan and my passion for Deep Learning. Special thanks to Sebastian Raschka for the foundational guide provided in Build a Large Language Model From Scratch.