This repository contains the code and data used in the LLM-PP work. It builds on the [Hardware Aware Transformer (HAT)](https://github.com/mit-han-lab/hardware-aware-transformers) repository.
## Data (`data/`)

The `data/` directory contains the seed architectures (`train-test_seedarchs`) and the instruction templates (`instruction_templates`) used by the commands below.

## Usage
To generate LLM-PP GPT-4 predictions for WMT'14 En-De, run:

```bash
python gpt_scorer.py \
  --source_dir data/train-test_seedarchs \
  --dest_dir /tmp/gpt_scorer_outputs \
  --template_dir data/instruction_templates \
  --experiment_name nov15_exp \
  --datasets wmt14ende \
  --prompt_template_f gpt_scorer_concentrate_statement \
  --openai_models gpt-4
```
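Conceptually, this step asks GPT-4 to predict the performance (BLEU) of a candidate architecture from a textual description. The sketch below illustrates the idea with the official `openai` Python client; the prompt wording and the architecture encoding are assumptions for illustration, not the actual templates shipped in `data/instruction_templates`.

```python
# Minimal sketch: prompt GPT-4 to predict BLEU for one architecture.
# The prompt and feature description below are hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical flattened description of one seed architecture.
arch_desc = (
    "encoder layers: 6, decoder layers: 6, embedding dim: 640, "
    "FFN dim: 2048, attention heads: 4"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": f"Predict the BLEU score on WMT'14 En-De for a Transformer "
                   f"with this configuration: {arch_desc}. "
                   f"Answer with a single number.",
    }],
    temperature=0.0,
)
print(response.choices[0].message.content)
```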
To train the LLM-Distill-PP model, run:

```bash
python bleu_predictor.py \
  --gpt_scorer_outputs /tmp/gpt_scorer_outputs \
  --testset_outputs data/train-test_seedarchs \
  --task wmt14ende \
  --teacher_model gpt-4 \
  --save-file /tmp/model.ckpt
```
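The distillation idea here is to fit a small, cheap regressor on (architecture features, GPT-4 predicted BLEU) pairs, so the expensive LLM teacher only needs to be queried once. The sketch below shows this with a toy PyTorch MLP; the feature encoding, model size, hyperparameters, and checkpoint format are assumptions, not the repository's actual implementation.

```python
# Minimal sketch: distill GPT-4's BLEU predictions into a small regressor.
import torch
import torch.nn as nn

# Hypothetical data: each row encodes one architecture (e.g. layer counts,
# dims); targets are GPT-4's BLEU predictions from the previous step.
features = torch.randn(100, 10)          # 100 architectures, 10 features each
teacher_bleu = torch.rand(100, 1) * 30   # stand-in BLEU targets

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(features), teacher_bleu)
    loss.backward()
    opt.step()

# Illustrative save path mirroring --save-file above; the real checkpoint
# format may differ.
torch.save(model.state_dict(), "/tmp/model.ckpt")
```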
To execute the hybrid-search NAS algorithm, run:

```bash
CUDA_VISIBLE_DEVICES=0 python evo_search.py \
  --configs=configs/wmt14.en-de/supertransformer/space0.yml \
  --evo-configs=configs/wmt14.en-de/evo_search/wmt14ende_titanxp.yml \
  --evo-iter 30 \
  --population-size 125 \
  --parent-size 25 \
  --mutation-size 50 \
  --crossover-size 50 \
  --mutation-prob 0.3 \
  --latency-constraint 150 \
  --bleu-ckpt-path /tmp/model.ckpt \
  --bleu-predictor-start-idx 0 \
  --bleu-predictor-end-idx 14
```
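For intuition, the evolutionary loop driven by the flags above looks roughly like the sketch below: keep the top `--parent-size` candidates under the latency constraint, then refill the population with mutations and crossovers. The architecture encoding, fitness function, and latency estimator are stand-ins for illustration; the real `evo_search.py` scores candidates with the distilled BLEU predictor and its own latency model.

```python
# Minimal sketch of an evolutionary search loop with the flag values above.
import random

def mutate(arch, prob=0.3):                       # --mutation-prob 0.3
    return [g + random.choice([-1, 1]) if random.random() < prob else g
            for g in arch]

def crossover(a, b):
    return [random.choice(pair) for pair in zip(a, b)]

def predicted_bleu(arch):   # stand-in for the LLM-Distill-PP predictor
    return -sum((g - 5) ** 2 for g in arch)

def latency(arch):          # stand-in latency estimator (ms)
    return 10.0 * sum(arch) / len(arch)

population = [[random.randint(1, 10) for _ in range(6)] for _ in range(125)]
for _ in range(30):                               # --evo-iter 30
    # --latency-constraint 150 (fall back to all if none are feasible)
    feasible = [a for a in population if latency(a) <= 150] or population
    parents = sorted(feasible, key=predicted_bleu, reverse=True)[:25]
    mutants = [mutate(random.choice(parents)) for _ in range(50)]
    children = [crossover(random.choice(parents), random.choice(parents))
                for _ in range(50)]
    population = parents + mutants + children     # 25 + 50 + 50 = 125
print(max(population, key=predicted_bleu))
```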
## Citation

If you use this code, please cite:

```bibtex
@inproceedings{jawahar2024llmpp,
  title     = {LLM Performance Predictors are good initializers for Architecture Search},
  author    = {Ganesh Jawahar and Muhammad Abdul-Mageed and Laks V. S. Lakshmanan and Dujian Ding},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2024},
  year      = {2024},
}
```
## License

This repository is MIT-licensed.