This project is an unofficial implementation of the paper Compact Language Models via Pruning and Knowledge Distillation. It explores techniques for compressing large language models (LLMs) through a combination of pruning and knowledge distillation.
The goal of this project is to investigate whether pruning an existing LLM and then re-training it with a small fraction of the original training data can be a viable alternative to training each model variant from scratch. The implementation focuses on:
- Pruning strategies for width, attention, and MLP layers (a sketch of the MLP case follows this list)
- Combining different pruning axes
- Knowledge distillation techniques for retraining
- Searching for optimal compressed architectures
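To make the width-pruning idea concrete, here is a minimal sketch of removing MLP neurons given per-neuron importance scores. The function name and shapes are illustrative only, not the actual `pruners.py` API:

```python
import torch
import torch.nn as nn

def prune_mlp_neurons(fc_in: nn.Linear, fc_out: nn.Linear,
                      importance: torch.Tensor, keep: int):
    """Keep the `keep` most important hidden neurons of an MLP block.

    fc_in:      Linear(d_model -> d_hidden)
    fc_out:     Linear(d_hidden -> d_model)
    importance: per-neuron scores of shape (d_hidden,)
    """
    # Indices of the neurons to keep, ranked by importance score.
    idx = torch.topk(importance, keep).indices.sort().values

    new_in = nn.Linear(fc_in.in_features, keep, bias=fc_in.bias is not None)
    new_out = nn.Linear(keep, fc_out.out_features, bias=fc_out.bias is not None)

    with torch.no_grad():
        # Rows of fc_in and columns of fc_out correspond to hidden neurons.
        new_in.weight.copy_(fc_in.weight[idx])
        if fc_in.bias is not None:
            new_in.bias.copy_(fc_in.bias[idx])
        new_out.weight.copy_(fc_out.weight[:, idx])
        if fc_out.bias is not None:
            new_out.bias.copy_(fc_out.bias)
    return new_in, new_out
```

Attention-head and embedding-channel pruning follow the same pattern, just slicing different dimensions of the projection matrices.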
- `models.py`: Contains the implementation of the GPT model and its components
- `hooks.py`: Implements forward hooks for calculating importance scores (a sketch follows this list)
- `pruners.py`: Contains functions for pruning neurons, attention heads, and embeddings
- `utils.py`: Utility functions for data loading, model saving/loading, and evaluation
- `script.py`: Main script for running experiments
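As an illustration of hook-based importance scoring, here is a minimal sketch that accumulates mean absolute activations per output neuron during a calibration pass. The names are my own and are not the actual `hooks.py` API:

```python
import torch
import torch.nn as nn

def attach_importance_hook(linear: nn.Linear):
    """Accumulate a per-output-neuron importance score during calibration.

    The score here is the mean absolute activation of each output unit,
    summed over all batches seen while the hook is attached.
    """
    linear.importance = torch.zeros(linear.out_features)

    def hook(module, inputs, output):
        # output: (batch, seq, out_features) -> mean |activation| per neuron
        module.importance += output.detach().abs().mean(dim=(0, 1)).cpu()

    return linear.register_forward_hook(hook)

# Usage: attach the hook, run a few calibration batches, then read .importance
# handle = attach_importance_hook(some_linear_layer)
# for batch in calibration_loader:
#     model(batch)
# handle.remove()
```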
- Clone the repository
- Install the required dependencies (from the `pyproject.toml` file)
- Download the training data (Shakespeare dataset) by running the script
- Adjust hyperparameters in `script.py` as needed (see the example below)
- Run `script.py` to train the base model and perform pruning experiments
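The variable names below are purely hypothetical and only illustrate the kind of knobs you would expect to tune near the top of `script.py`; check the file itself for the real ones:

```python
# Hypothetical example; the actual names live at the top of script.py.
batch_size = 64                        # sequences per optimization step
block_size = 256                       # context length in tokens
n_layer, n_head, n_embd = 6, 6, 384    # base model size
prune_ratio = 0.5                      # fraction of neurons/heads to remove
kd_temperature = 2.0                   # softmax temperature for distillation
max_iters = 5000                       # training iterations for the base model
```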
- Implementation of a GPT-style language model
- Flexible pruning strategies for different model components
- Knowledge distillation for model retraining (see the loss sketch below)
- Experimental framework for testing various compression configurations
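For the distillation step, a common recipe is to blend the hard-label cross-entropy loss with a temperature-scaled KL term between teacher and student logits. The sketch below shows that recipe; the weighting and temperature are illustrative, not necessarily what `script.py` uses:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Blend hard-label cross-entropy with soft-label KL against the teacher.

    student_logits, teacher_logits: (batch, seq, vocab)
    targets: (batch, seq) token ids
    """
    vocab = student_logits.size(-1)

    # Hard-label loss on the ground-truth next tokens.
    ce = F.cross_entropy(student_logits.reshape(-1, vocab), targets.reshape(-1))

    # Soft-label loss: KL(teacher || student) at temperature T.
    s = F.log_softmax(student_logits / temperature, dim=-1).reshape(-1, vocab)
    t = F.log_softmax(teacher_logits / temperature, dim=-1).reshape(-1, vocab)
    kl = F.kl_div(s, t, log_target=True, reduction="batchmean") * temperature ** 2

    return alpha * ce + (1.0 - alpha) * kl
```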
The implementation does not currently support any kind of CLI usage; I focused on the math-heavy parts instead.
(work in progress)
- This implementation currently targets a much smaller model than the paper (a few thousand times smaller, since I don't have access to GPUs)
- Further optimization of the pruning and distillation techniques may be possible (depth pruning is not implemented, since my focus is applying the technique to smaller models, <15B parameters)
```bibtex
@article{minitron2024,
  title={Compact Language Models via Pruning and Knowledge Distillation},
  author={Saurav Muralidharan and Sharath Turuvekere Sreenivas and Raviraj Joshi and Marcin Chochowski and Mostofa Patwary and Mohammad Shoeybi and Bryan Catanzaro and Jan Kautz and Pavlo Molchanov},
  journal={arXiv preprint arXiv:2407.14679},
  year={2024},
  url={https://arxiv.org/abs/2407.14679},
}
```
- Andrej Karpathy, whose work motivated me to finally act on my FOMO and build this.