microGPT is a lightweight implementation of the Generative Pre-trained Transformer (GPT) model for natural language processing tasks. It is designed to be simple and easy to use, making it a great option for small-scale applications or for learning and experimenting with generative models.
- Lightweight implementation of the GPT models
- Trained on a small scaled diverse dataset (6 GB)
- Designed to be trained on consumer GPU (RTX 4060 8GB)
- Small number of parameters (82 Million!)
- Customizable hyperparameters for model configuration
- Supports Top K Top P filtering and temperature scaling
- Written in Python with PyTorch as the deep learning framework
- Clone the repo and run
pip install -r requirements.txt
- Run
tokenizer/train_tokenizer.py
to generate the tokenizer file. The model will tokenize text based on it. - Run
datasets/prepare_dataset.py
to generate dataset files. - Run
train.py
to start training~
Modify the files stated above if you wish to change their parameters.
To edit model generation parameters, head over inference.py
to this section:
# Parameters (Edit here):
n_tokens = 1000
temperature = 0.8
top_k = 0
top_p = 0.9
model_path = 'models/microGPT.pth'
# Edit input here
context = "The magical wonderland of"
Interested to deploy as a web app? Check out microGPT-deploy !
-
From Scratch Efficiency: Developed from the ground up, microGPT represents a streamlined approach to the esteemed GPT model. It showcases remarkable efficiency while maintaining a slight trade-off in quality.
-
Learning Playground: Designed for individuals eager to delve into the world of AI, microGPT's architecture offers a unique opportunity to grasp the inner workings of generative models. It's a launchpad for honing your skills and deepening your understanding.
-
Small-Scale Powerhouse: Beyond learning and experimentation, microGPT is a suitable option for small-scale applications. It empowers you to integrate AI-powered language generation into projects where efficiency and performance are paramount.
-
Customization Capabilities: microGPT's adaptability empowers you to modify and fine-tune the model to suit your specific goals, offering a canvas for creating AI solutions tailored to your requirements.
-
Learning Journey: Use microGPT as a stepping stone to comprehend the foundations of generative models. Its accessible design and documentation provide an ideal environment for those new to AI.
-
Experimentation Lab: Engage in experiments by tweaking and testing microGPT's parameters. The model's simplicity and versatility provide a fertile ground for innovation.
If you would like to contribute, please follow these guidelines:
- Fork the repository and create a new branch for your contribution.
- Make sure your changes align with the goals and purpose of the repository.
- Keep your commits clear and descriptive, following the Conventional Commits format.
- Write clear and concise code with appropriate comments and documentation.
- Test your changes thoroughly to ensure they do not introduce new bugs.
- Submit a pull request (PR) with a detailed description of your changes, along with any relevant information or screenshots.
By contributing to this repository, you agree to abide by our Code of Conduct and that your contributions will be released under the same License as the repository.
- Attention is all you need
- Let's build GPT from scratch (Highly Recommend)
- The MiniPile Challenge for Data-Efficient Language Models
- Huggingface GPT 2 implementation
This model is inspired by Andrej Karpathy Let's build GPT from scratch video and Andrej Kaparthy nanoGPT with modifications for this project.