This repository contains the source code for a paired LLM model that transfers knowledge from a large pre-trained model (Qwen2-1.5B) to a smaller model (GPT-Neo-125M) using an Enhanced Cross-Attention mechanism.
Link to article (GitHub Pages): How to teach a model to reason without retraining it for less than $10
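To give an intuition for the mechanism, here is a minimal PyTorch sketch of a cross-attention bridge in which the small model attends to hidden states produced by the larger, frozen donor model. The class name, dimensions, and layer choices are illustrative assumptions and do not reproduce the exact Enhanced Cross-Attention module in model.py; see the paper and the source code for the real implementation. (Note: `batch_first` in `nn.MultiheadAttention` requires PyTorch 1.9+.)

```python
import torch
import torch.nn as nn

class CrossAttentionBridge(nn.Module):
    """Illustrative bridge that lets a small model attend to hidden states
    of a larger, frozen donor model. The dimensions below are assumptions
    (roughly GPT-Neo-125M: 768, Qwen2-1.5B: 1536), not the exact module
    used in model.py."""

    def __init__(self, small_dim: int = 768, large_dim: int = 1536, num_heads: int = 8):
        super().__init__()
        # Project the donor's hidden states into the small model's embedding space.
        self.proj = nn.Linear(large_dim, small_dim)
        # Cross-attention: queries from the small model, keys/values from the donor.
        self.attn = nn.MultiheadAttention(small_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(small_dim)

    def forward(self, small_hidden: torch.Tensor, large_hidden: torch.Tensor) -> torch.Tensor:
        kv = self.proj(large_hidden)
        attended, _ = self.attn(query=small_hidden, key=kv, value=kv)
        # Residual connection keeps the small model's own representation intact.
        return self.norm(small_hidden + attended)

if __name__ == "__main__":
    # Shape check with dummy activations: batch of 2, sequence length 16.
    bridge = CrossAttentionBridge()
    out = bridge(torch.randn(2, 16, 768), torch.randn(2, 16, 1536))
    print(out.shape)  # torch.Size([2, 16, 768])
```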
- model.py: Source code for the paired model, including the implementation of training and inference routines.
- compare-responses-from-models.md: Answers to a set of test questions from different models, collected for this research.
- paper_llm_modules.pdf: LLM Module Research Paper
The original pre-trained weights for the model are available on Hugging Face.
You can download them from: https://huggingface.co/kkolomeitsev/llm-modules
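Alternatively, the weights can be fetched programmatically with the huggingface_hub library, which is installed as a dependency of transformers. This is only a sketch; the file layout of the repository may differ from what model.py expects.

```python
from huggingface_hub import snapshot_download

# Download the whole repository and return the local directory it was placed in.
local_dir = snapshot_download(repo_id="kkolomeitsev/llm-modules")
print(local_dir)
```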
The project requires Python 3.7 or higher and the following libraries:
- PyTorch (version 1.7 or above)
- Transformers
- Datasets
- tqdm
You can install the required packages via pip:
pip install torch transformers datasets tqdm
Alternatively, you can create and activate a virtual environment first and install the packages inside it:
python -m venv venv
# For Linux/MacOS:
source venv/bin/activate
# For Windows:
venv\Scripts\activate
pip install torch transformers datasets tqdm
By default, the model.py file is configured to run the training process. To start training, simply execute:
python model.py
The model will be trained according to the specified parameters, and the checkpoint will be saved as model_checkpoint.pth.
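If you want to inspect the saved checkpoint outside of model.py, it can be opened with torch.load. This is only a sketch; the exact contents (a plain state_dict, or a dict that also holds optimizer state, etc.) depend on how model.py writes it.

```python
import torch

# Load on CPU so this works on machines without a GPU.
checkpoint = torch.load("model_checkpoint.pth", map_location="cpu")
print(type(checkpoint))

# If it is a dict, list the top-level keys to see what was saved.
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys())[:10])
```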
To run inference, you need to disable the training code and enable the interactive chat mode. In the model.py file, comment out the training function call and uncomment the interactive_chat() call. For example, modify the main section as follows:
if __name__ == "__main__":
    # main()            # Comment this line to disable training
    interactive_chat()  # Uncomment this line to run inference
Then run:
python model.py
An interactive session will start in the console, allowing you to enter queries and view the model's generated responses.
Additional Notes
- Ensure you have sufficient computational resources for training the model.
- For reproducibility, consider setting a fixed seed for random operations (see the snippet after this list).
- You can adjust model parameters and training settings directly in the model.py file.
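A typical seed-setting helper looks like the following. This is a sketch: set_seed is not part of model.py, and transformers also ships an equivalent transformers.set_seed(seed) you can use instead.

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Fix the common sources of randomness for reproducible runs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op on CPU-only machines
```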
@misc{Kolomeitsev2025LLMModules,
  title         = {LLM Modules: Knowledge Transfer from a Large to a Small Model using Enhanced Cross-Attention},
  author        = {Konstantin Kolomeitsev},
  year          = {2025},
  eprint        = {2502.08213},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2502.08213}
}
If you have any questions, please open an issue or contact me at uol92kot@gmail.com.