To run the LoRA variant, use cvllm3.py with runcvllm4.sh.

To run standard full-parameter ES, use es_fullparam.py with conciseness.sh.


es-fine-tuning-paper

This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" (https://arxiv.org/abs/2509.24372), which uses evolution strategies (ES) to directly optimize billions of parameters of large language models (LLMs).
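
For intuition, here is a minimal sketch of the perturbation-based ES update that this line of work scales up (single process, NumPy; the function name, population size, and hyperparameter values are illustrative, not the repo's actual implementation):

import numpy as np

def es_step(theta, reward_fn, pop_size=30, sigma=1e-3, lr=5e-4, rng=None):
    """One ES update: sample perturbations, evaluate rewards, recombine."""
    rng = np.random.default_rng() if rng is None else rng
    seeds = rng.integers(0, 2**31, size=pop_size)
    rewards = np.empty(pop_size)
    for i, seed in enumerate(seeds):
        # Sample an isotropic Gaussian perturbation of all parameters.
        eps = np.random.default_rng(seed).standard_normal(theta.shape)
        rewards[i] = reward_fn(theta + sigma * eps)
    # Normalize rewards so the update is invariant to their scale and shift.
    z = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Reconstruct each perturbation from its seed instead of storing it.
    update = np.zeros_like(theta)
    for seed, zi in zip(seeds, z):
        eps = np.random.default_rng(seed).standard_normal(theta.shape)
        update += zi * eps
    return theta + (lr / (pop_size * sigma)) * update

The seed trick in the second loop is what makes this approach feasible at LLM scale: parallel workers only need to exchange integer seeds and scalar rewards, never full parameter vectors, since every perturbation can be regenerated from its seed.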

Feel free to join the ES fine-tuning forum in Discussions.

Note: we are still actively adding more experimental code to this repo.

Setup

Create a virtual environment with Python >= 3.10 and activate it:

python -m venv es
source es/bin/activate

From the root of the repository, run the following command to install all the relevant Python packages:

pip install -r requirements.txt

Usage

To run the main ES code on the conciseness fine-tuning task:

accelerate launch \
    --num_processes 2 \
    --num_machines 1 \
    --machine_rank 0 \
    es_fine-tuning_conciseness.py \
    --gpu_threads=1 \
    --model_name=Qwen/Qwen2.5-7B-Instruct

--num_processes specifies the number of GPUs to use, and --gpu_threads specifies the number of threads inside each GPU. The total number of parallel evaluations is therefore num_processes * gpu_threads; the command above, for example, runs 2 * 1 = 2 parallel evaluations.

To run the main ES code on the countdown task:

accelerate launch \
    --num_processes 4 \
    --num_machines 1 \
    --machine_rank 0 \
    countdown/es_fine-tuning_countdown.py \
    --data_sample 200 \
    --model_name Qwen/Qwen2.5-3B-Instruct \
    --gpu_threads 1

Other Parameters

  • --gpu_ids: Specify which GPUs to use (CUDA device IDs); an argument for accelerate launch
  • --model_name: HuggingFace model to fine-tune
  • --hf_cache_dir: Directory for the HuggingFace cache
  • --precision: Model precision; defaults to bf16
  • --verbose: Enable detailed logging when this flag is present on the command line
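
Putting these together, a launch command using the optional flags might look like the following (illustrative values; note that --gpu_ids is passed to accelerate launch before the script name, while the remaining flags go to the script itself):

accelerate launch \
    --gpu_ids 0,1 \
    --num_processes 2 \
    --num_machines 1 \
    --machine_rank 0 \
    es_fine-tuning_conciseness.py \
    --gpu_threads=1 \
    --model_name=Qwen/Qwen2.5-7B-Instruct \
    --precision=bf16 \
    --verbose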

Citation

If you find this work helpful in your research, please cite:

@misc{qiu2025evolutionstrategiesscalellm,
      title={Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning}, 
      author={Xin Qiu and Yulu Gan and Conor F. Hayes and Qiyao Liang and Elliot Meyerson and Babak Hodjat and Risto Miikkulainen},
      year={2025},
      eprint={2509.24372},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2509.24372}, 
}
