This is an implementation of the Mixtral, GPT, and LLaMA models from scratch using PyTorch (and torchtune). It is an extension of Wayland Zhang's excellent repo, with more modular components and adjustable parameters. The models can be imported from GPT.py and LLAMA.py using a single line of code!
Uses nn.MultiheadAttention for fast multi-head attention and Hugging Face tokenizers for BPE encoding. I trained the models on Google Colab, although they work reasonably well on a CPU too. Code for scraping training data with bs4 is also provided.
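For reference, here is a minimal sketch of those two building blocks; the embedding size, number of heads, vocabulary size, and file name are placeholders rather than the values used in this repo:

```python
import torch
import torch.nn as nn
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Train a small BPE tokenizer on a plain-text file (placeholder file/vocab size).
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
tokenizer.train(["data.txt"], BpeTrainer(vocab_size=5000, special_tokens=["[UNK]", "[PAD]"]))
ids = tokenizer.encode("Harry looked at Ron.").ids

# Self-attention with nn.MultiheadAttention (query = key = value).
attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
x = torch.randn(1, len(ids), 256)   # (batch, seq_len, embed_dim)
out, _ = attn(x, x, x)
```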
Added the Mixtral Model!
Created a library for directly accessing the models!
Run `pip install llmcollection` and import a model with `from llmcollection import MODELNAME` (a short usage sketch follows).
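For example (a hypothetical usage sketch; the actual exported model names and constructor arguments may differ, so check the package):

```python
# Hypothetical usage; the exported names and constructor arguments
# depend on the llmcollection package itself.
from llmcollection import GPT   # or LLAMA / Mixtral

model = GPT()                   # instantiate with default (or your own) hyperparameters
print(model)
```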
- Install the requirements: `pip install -r requirements.txt`
- Run `datascrape.py` to scrape the training data (a minimal scraping sketch is included after this list).
- Train the model and save it by running `train.ipynb`. Change parameters and models as necessary.
- Generate text with `generate.ipynb` (see the training/generation skeleton after this list).
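The actual scraping lives in `datascrape.py`; the sketch below only illustrates the general bs4 pattern, and the URL, tags, and output file are placeholders, not the ones the script uses:

```python
# Minimal bs4 scraping sketch (placeholder URL/tags/output file;
# see datascrape.py for the real thing).
import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com/some-story", timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# Collect paragraph text and dump it into a plain-text training file.
paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
with open("data.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(paragraphs))
```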
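The real training and generation code is in the notebooks; the skeleton below only shows the general shape of a next-token training step and greedy decoding for a decoder-only model, with all names and hyperparameters being assumptions:

```python
# Generic next-token training / greedy-generation skeleton (assumed model API:
# model(ids) -> logits of shape (batch, seq_len, vocab); see the notebooks for
# the actual code).
import torch
import torch.nn.functional as F

def train_step(model, optimizer, batch):
    inputs, targets = batch[:, :-1], batch[:, 1:]   # predict token t+1 from tokens <= t
    logits = model(inputs)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def generate(model, prompt_ids, max_new_tokens=50):
    ids = prompt_ids                                # (1, prompt_len) token ids
    for _ in range(max_new_tokens):
        next_id = model(ids)[:, -1].argmax(dim=-1, keepdim=True)  # greedy pick
        ids = torch.cat([ids, next_id], dim=1)
    return ids
```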
Sample output
Harry you I over Remus says , over was list . grumbled el bite of to . inwardly up , looked He al a Coul in Poppy people ’ to We Weasley goes Tina Sirius . ? to Hey to let date let quickly worried soon , I , , told - same below corridor much about and . back that think . He didn eyes fl to ll to the when cared there everyone since of James least there I straight . scram reading to didn Lucius , lot journey made you be ' He there , feel .
At least some structure is present lol. Try experimenting with different parameters and training data!
- Uploaded architectures for reference.
- The LLaMA model uses SiLU instead of SwiGLU, as it gave better results (idk why); a comparison sketch is included below this list.
- The training data could be better, i.e. text with fewer proper names and more regular grammar.
- Feel free to use and improve this project!
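For the SiLU vs. SwiGLU note above, here is an illustrative side-by-side of the two feed-forward variants; the layer sizes and bias settings are placeholders, not the repo's actual configuration:

```python
import torch.nn as nn
import torch.nn.functional as F

class SiLUFeedForward(nn.Module):
    """Plain MLP with a SiLU activation (the variant this repo's LLaMA model uses)."""
    def __init__(self, dim=256, hidden=1024):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden)
        self.w2 = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)))

class SwiGLUFeedForward(nn.Module):
    """Gated variant used in the original LLaMA: silu(W1 x) * (W3 x), then W2."""
    def __init__(self, dim=256, hidden=1024):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)
        self.w3 = nn.Linear(dim, hidden, bias=False)
        self.w2 = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```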