This is my first GPT. I adapted it from Andrej Karpathy's excellent video "Let's build GPT: from scratch, in code, spelled out", refactored the code to separate training from generation, and added configuration.
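By "configuration" I mean the hyperparameters (context length, embedding size, learning rate, and so on) are collected in one place rather than hard-coded throughout the script. A sketch of the idea, with illustrative names and values rather than the repo's exact ones:

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # illustrative hyperparameters in the spirit of Karpathy's video;
    # the actual names and values in this repo may differ
    block_size: int = 256      # context length
    n_embd: int = 384          # embedding dimension
    n_head: int = 6            # attention heads
    n_layer: int = 6           # transformer blocks
    dropout: float = 0.2
    learning_rate: float = 3e-4
    max_iters: int = 5000

config = GPTConfig()
```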
This is tested and works in both native Windows and WSL2 Ubuntu.
I run Windows with an AMD Radeon 6800 XT and wanted to use it to train a GPT, but CUDA doesn't work with Radeon cards, and ROCm only works on Linux.
The solution is DirectML. So this code targets DirectML, but it would be a small change to switch it to CUDA, ROCm, or any other PyTorch backend.
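The backend switch essentially comes down to which device handle you give PyTorch. A minimal sketch, assuming the torch-directml package and illustrative variable names (not the repo's exact code):

```python
import torch
import torch.nn as nn

# Pick a device: DirectML if the package is installed, otherwise CUDA, otherwise CPU.
# Switching backends only changes this handle; the rest of the code stays the same.
try:
    import torch_directml
    device = torch_directml.device()          # e.g. AMD GPUs on Windows
except ImportError:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Models and tensors are then moved to that device as usual.
layer = nn.Linear(8, 8).to(device)
x = torch.randn(2, 8).to(device)
print(layer(x).shape, device)
```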
Install Miniconda (Miniconda specifically is required for DirectML support), then run the following commands:
conda env create
conda activate gpt
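To sanity-check the environment before training, something like this should print a DirectML device handle (assuming the environment installs the torch-directml package):

```python
# quick check that DirectML is usable from the activated environment
import torch_directml
print(torch_directml.device())  # prints the DirectML device handle, e.g. privateuseone:0
```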
First, generate model.pt by running the following command (this takes about 1.5 hours on my 6800 XT; it might be faster or slower on your hardware):
python train.py
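Under the hood, producing model.pt is a standard PyTorch checkpoint save. Conceptually it boils down to something like the following (whether the actual script saves a state_dict or the whole module is an implementation detail; the state_dict form is shown here with a stand-in model):

```python
import torch
import torch.nn as nn

# stand-in for the trained GPT; the real model comes out of the training loop
model = nn.Linear(8, 8)

# save the learned weights so generate.py can reload them later
torch.save(model.state_dict(), "model.pt")
```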
Then you can generate text by running the following command:
python generate.py
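Generation reloads the checkpoint and samples tokens autoregressively, along the lines of the generate() loop from Karpathy's video. A rough sketch with illustrative names (not the repo's exact code), assuming the model returns (logits, loss) as in the video:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample(model, idx, max_new_tokens, block_size):
    # autoregressive sampling: feed the running sequence back in, one token at a time
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]            # crop to the model's context window
        logits, _ = model(idx_cond)                # assumes a (logits, loss) return value
        probs = F.softmax(logits[:, -1, :], dim=-1)  # distribution over the next token
        idx_next = torch.multinomial(probs, num_samples=1)
        idx = torch.cat((idx, idx_next), dim=1)    # append and continue
    return idx
```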