This repository contains the source code for a paired LLM model that transfers knowledge from a large pre-trained model (Qwen2-1.5B) to a smaller model (GPT-Neo-125M) using an Enhanced Cross-Attention mechanism.
Link to article (GitHub Pages): How to teach a model to reason without retraining it for less than $10
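To give an intuition for the mechanism, here is a minimal PyTorch sketch of a cross-attention bridge in which the small model attends to hidden states produced by the larger, frozen donor model. The class name, dimensions, and layer choices are illustrative assumptions and do not reproduce the exact Enhanced Cross-Attention module in model.py; see the paper and the source code for the real implementation. (Note: `batch_first` in `nn.MultiheadAttention` requires PyTorch 1.9+.)

```python
import torch
import torch.nn as nn

class CrossAttentionBridge(nn.Module):
    """Illustrative bridge that lets a small model attend to hidden states
    of a larger, frozen donor model. The dimensions below are assumptions
    (roughly GPT-Neo-125M: 768, Qwen2-1.5B: 1536), not the exact module
    used in model.py."""

    def __init__(self, small_dim: int = 768, large_dim: int = 1536, num_heads: int = 8):
        super().__init__()
        # Project the donor's hidden states into the small model's embedding space.
        self.proj = nn.Linear(large_dim, small_dim)
        # Cross-attention: queries from the small model, keys/values from the donor.
        self.attn = nn.MultiheadAttention(small_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(small_dim)

    def forward(self, small_hidden: torch.Tensor, large_hidden: torch.Tensor) -> torch.Tensor:
        kv = self.proj(large_hidden)
        attended, _ = self.attn(query=small_hidden, key=kv, value=kv)
        # Residual connection keeps the small model's own representation intact.
        return self.norm(small_hidden + attended)

if __name__ == "__main__":
    # Shape check with dummy activations: batch of 2, sequence length 16.
    bridge = CrossAttentionBridge()
    out = bridge(torch.randn(2, 16, 768), torch.randn(2, 16, 1536))
    print(out.shape)  # torch.Size([2, 16, 768])
```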
- model.py: Source code for the paired model, including the implementation of training and inference routines.
- compare-responses-from-models.md: Answers to a set of test questions from different models, collected for this research.
- paper_llm_modules.pdf: LLM Module Research Paper
The original pre-trained weights for the model are available on Hugging Face.
You can download them from: https://huggingface.co/kkolomeitsev/llm-modules
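Alternatively, the weights can be fetched programmatically with the huggingface_hub library, which is installed as a dependency of transformers. This is only a sketch; the file layout of the repository may differ from what model.py expects.

```python
from huggingface_hub import snapshot_download

# Download the whole repository and return the local directory it was placed in.
local_dir = snapshot_download(repo_id="kkolomeitsev/llm-modules")
print(local_dir)
```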
The project requires Python 3.7 or higher and the following libraries:
- PyTorch (version 1.7 or above)
- Transformers
- Datasets
- tqdm
You can install the required packages via pip:
pip install torch transformers datasets tqdm
Alternatively, you can create and activate a virtual environment first and install the packages inside it:
python -m venv venv
# For Linux/MacOS:
source venv/bin/activate
# For Windows:
venv\Scripts\activate
pip install torch transformers datasets tqdm
By default, the model.py file is configured to run the training process. To start training, simply execute:
python model.py
The model will be trained according to the specified parameters, and the checkpoint will be saved as model_checkpoint.pth.
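If you want to inspect the saved checkpoint outside of model.py, it can be opened with torch.load. This is only a sketch; the exact contents (a plain state_dict, or a dict that also holds optimizer state, etc.) depend on how model.py writes it.

```python
import torch

# Load on CPU so this works on machines without a GPU.
checkpoint = torch.load("model_checkpoint.pth", map_location="cpu")
print(type(checkpoint))

# If it is a dict, list the top-level keys to see what was saved.
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys())[:10])
```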
To run inference, you need to disable the training code and enable the interactive chat mode. In the model.py file, comment out the training function call and uncomment the interactive_chat() call. For example, modify the main section as follows:
if __name__ == "__main__":
    # main()            # Comment this line to disable training
    interactive_chat()  # Uncomment this line to run inference
Then run:
python model.py
An interactive session will start in the console, allowing you to enter queries and view the model's generated responses.
Additional Notes
- Ensure you have sufficient computational resources for training the model.
- For reproducibility, consider setting a fixed seed for random operations (see the snippet after this list).
- You can adjust model parameters and training settings directly in the model.py file.
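A typical seed-setting helper looks like the following. This is a sketch: set_seed is not part of model.py, and transformers also ships an equivalent transformers.set_seed(seed) you can use instead.

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Fix the common sources of randomness for reproducible runs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op on CPU-only machines
```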
@misc{Kolomeitsev2025LLMModules,
  title         = {LLM Modules: Knowledge Transfer from a Large to a Small Model using Enhanced Cross-Attention},
  author        = {Konstantin Kolomeitsev},
  year          = {2025},
  eprint        = {2502.08213},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2502.08213}
}
If you have any questions, please open an issue or contact me at uol92kot@gmail.com.