Building a GPT-style transformer network from scratch for text generation. Following the video lecture by Andrej Karpathy, I am building GPT-Nano from scratch, mostly :)
Following the GPT recipe, the model is trained to predict the next character in the sequence. While modern models work with words (or subwords), a character-level model is a good starting point.
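For a character-level model, the tokenizer can be as simple as a lookup table from characters to integers. Here is a minimal sketch; the variable names and the data path are illustrative, not necessarily what this repo uses:

```python
# Minimal character-level tokenizer sketch; names and path are illustrative.
text = open("data/input.txt").read()             # assumed path; use whichever dataset file you train on

chars = sorted(set(text))                        # vocabulary = every unique character in the data
stoi = {ch: i for i, ch in enumerate(chars)}     # char -> integer id
itos = {i: ch for ch, i in stoi.items()}         # integer id -> char

def encode(s: str) -> list[int]:
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)
```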
Once trained, the model can generate text: we give it a starting character, have the network predict the next character, feed both characters back into the network, and repeat. By iterating this way, we can generate as much text as we want :)
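As a rough sketch of that sampling loop (assuming a PyTorch model whose forward pass maps a batch of character ids to next-character logits; the function and argument names here are mine, not the repo's):

```python
import torch

@torch.no_grad()
def sample(model, start_ids, max_new_tokens, block_size):
    """Autoregressive sampling sketch: predict a character, append it,
    feed the longer sequence back in, and repeat.

    Assumes model(ids) returns logits of shape (batch, time, vocab_size);
    adapt if the forward pass also returns a loss.
    """
    ids = start_ids                                        # (1, T) tensor of character ids
    for _ in range(max_new_tokens):
        context = ids[:, -block_size:]                     # crop to the trained context length
        logits = model(context)
        probs = torch.softmax(logits[:, -1, :], dim=-1)    # distribution over the next character
        next_id = torch.multinomial(probs, num_samples=1)  # sample one character id
        ids = torch.cat([ids, next_id], dim=1)             # append and continue
    return ids
```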
But since we are only modeling characters, with limited data and a limited sequence length, we should not expect the model to generate semantically coherent text.
All the settings are in configuration files located in `configs/`. There are two model variants in `configs/`: `large_model.yaml` and `small_model.yaml`. For the large model, you probably need a GPU machine -- this model takes around 4.3 GB of GPU memory.
To train the model, simply run `python train.py`. By default, the `small_model.yaml` config file will be loaded. To load the large model, run
python train.py config_file=configs/large_model.yaml
You can also override any setting in the config file through the command line. For example, to change the batch size to 8, use
python train.py batch_size=8
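I am not reproducing the repo's actual override logic here, but the general pattern is simple: load the YAML file into a dict, then overwrite any key=value pairs passed on the command line. A rough sketch under those assumptions (the config keys are placeholders, not the repo's schema):

```python
# Sketch of YAML config loading with key=value command-line overrides.
import sys
import yaml  # PyYAML

def load_config(default_path="configs/small_model.yaml"):
    args = dict(arg.split("=", 1) for arg in sys.argv[1:] if "=" in arg)
    config_path = args.pop("config_file", default_path)
    with open(config_path) as f:
        config = yaml.safe_load(f)
    for key, value in args.items():
        # Reuse the type of the default value so e.g. batch_size=8 stays an int.
        # (Booleans would need special handling; this is only a sketch.)
        if key in config and config[key] is not None:
            config[key] = type(config[key])(value)
        else:
            config[key] = value
    return config

if __name__ == "__main__":
    print(load_config())
```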
We start with a dataset of William Shakespeare's writings. I have compiled a Sherlock Holmes dataset as well (attribution and links in the acknowledgement section below). Both datasets are available in `data/`. When training, the default data is Shakespeare. To train a model on the Sherlock Holmes dataset, run the command
python train.py input_fname=data/sherlock_holmes.txt
After training, the large GPT-Nano model trained on Shakespeare generates text like this:
LUCIO:
Here comes the maid: I shall do thee my nurse.
If I be gone, as you can read it, I have stopp'd;
You must have paid against my hands. What comfort here,
That I shall be slandering on a side
And dangerous can you nake, if I love.
TRANIO:
Such I am no soldier,
Lest I should shake you foes of my fellow sweet?
BENVOLIO:
I opprison him in these affairs.
ROMEO:
He was infamed to be fitted with Bianca,
And with the entertainment of the garden,
Which weaks me, and in the execution like the villain;
I'll gladly on coronation: the good kind's time,
That retire my grave I meabour'd your honours.
The model trained on Sherlock Holmes generates output like this:
“Oh, man!” said he, “I didn't fancy that I didn't know not now. A Greek deal purpose yesterday! That's neither o'clock before we go across? It was a large night when I ring it, shranting it in the doorway. It was becoming one, so now a projective directrict. But here comes and a lonely marker, with her and a great bulk quick femish in the observation of a town-to-day. She was deliciously paper, and a half-firckened, respected chokler, gave me indoing the foul play, which so distinguished to her, I could earn anything, but she was hard on the night in an overnage so terrible an assault. ‘Is I known place,’ she cried, ‘that with me on her singular spot tinge-dress, and we have only the merest through with Mrs. Hudson entered the room, but that fellow was a paralyer who had just heart of her evidence. Has she engaged in some way gone and her husband fe? or not, and would expect that she had handed justice.”
The young hunted back hung broughly open her arms as had been in the room, but a large and trapped took up me from the heart of the stair, locked in my 'ollow village towards us.
“I have had an and two times,” said she. “I had seen the contents of the private case. I therefore I knew her heart of her since I was engaged.”
“My eyes?” said Holmes, lighting a bicy from his sofa, as he laid his hand upon his exorting-carpet and picking up his way.
“What say you dictate this back?” asked Lestrade.
“I suppose,” said Adelbert. “Appreciate of your son? I heard his heavy stick success that I ever failed.
These don't make much sense, but it is cool that the model can generate text which matches the style of the training dataset.
While I am closely following the video lecture to code GPT-Nano, these are some of the changes that I have made in this repo so far:
- I have broken the code into two files: `train.py` and `gpt.py`. Splitting the model and training code (1) separates out the GPT-Nano model and (2) makes sure all the hyperparameters are explicitly passed to the model
- I am saving a trained model checkpoint. This should be helpful for reusing the trained model later on (see the sketch below)
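For reference, here is a minimal sketch of what checkpoint saving and loading can look like in PyTorch. The file name, the stored fields, and the `GPTNano` constructor usage are my assumptions, not necessarily what this repo does:

```python
import torch

def save_checkpoint(model, optimizer, config, path="checkpoint.pt"):
    """Store the weights plus the hyperparameters needed to rebuild the model."""
    torch.save({
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "config": config,   # e.g. the loaded YAML dict
    }, path)

def load_checkpoint(model_cls, path="checkpoint.pt", device="cpu"):
    """Rebuild the model from the saved config, then restore the weights."""
    ckpt = torch.load(path, map_location=device)
    model = model_cls(**ckpt["config"])   # assumes the constructor accepts the config keys
    model.load_state_dict(ckpt["model_state_dict"])
    return model, ckpt
```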
The planned next steps are
- Update expected train and val losses (for both the small and large models)
- Add a `generate` script
- Train on another (fun) dataset, suggestions welcome :)
- Upload trained network checkpoints
- Add installation / virtual environment instructions
- Add command line arguments?
- Try some other encoding, such as BPE
- This repository is built following the video lecture and GitHub repo by Karpathy.
- The transformer architecture was proposed in the "Attention Is All You Need" paper.
- The Shakespeare dataset is provided by Karpathy at this public link.
- I have compiled the Sherlock Holmes dataset from this website: https://sherlock-holm.es/stories/html/cano.html