This repository contains the evolutionary-search optimization needed to apply LongRoPE2 to dLLMs such as LLaDA and DiffuCoder. It is a fork of the LongRoPE2 codebase, based on the LongRoPE2 paper.
If you want to skip the evolutionary search, you can install our longdllm package directly and follow its instructions to get started.
conda create -n longrope python==3.10
conda activate longrope
pip install -r requirements.txt

FlashAttention is required to extend the context length up to 128k tokens. We used FlashAttention 2.5.8:
pip install flash_attn==2.5.8 --no-build-isolation
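A quick way to confirm the FlashAttention install succeeded is a small Python check (this snippet only tests that the package is importable; it does not verify the CUDA build works on your GPU):

```python
import importlib.util

def flash_attn_available() -> bool:
    """Return True if the flash_attn package is importable in this environment."""
    return importlib.util.find_spec("flash_attn") is not None

print("flash_attn available:", flash_attn_available())
```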
Tokenize PG19 as the validation dataset for evolution search and Proof-Pile as the evaluation dataset:
bash ./scripts/llada_float16/tokenzie-data.sh

Note: this step does not currently work, because our version of the PG19 dataset uses a special format.
To generate the needle-driven data, run:
python pg19_needle_llada.py
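The needle-in-a-haystack construction behind this script can be sketched as follows; the `insert_needle` helper, the depth fraction, and the passkey wording are illustrative assumptions, not the actual pg19_needle_llada.py logic:

```python
def insert_needle(haystack_tokens, needle_tokens, depth_frac):
    """Insert the needle at a relative depth of the haystack
    (0.0 = very start, 1.0 = very end)."""
    pos = int(len(haystack_tokens) * depth_frac)
    return haystack_tokens[:pos] + needle_tokens + haystack_tokens[pos:]

# Hypothetical example using word-level "tokens":
haystack = ("word " * 1000).split()
needle = "The passkey is 42817 .".split()
sample = insert_needle(haystack, needle, depth_frac=0.5)
assert "42817" in sample
```

Sweeping `depth_frac` over several values per context length is the usual way to test retrieval at different positions in the long context.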
Run evolution search on the LLaDA model to a sequence length of 128k:
bash ./scripts/llada_float16/search-llada-long-factor-cd-33-around-init.sh

The default evolution-search hyperparameters are located in evolution/default_hyper_params/*.json. Users can customize the number of iterations, the population size, the number of parents, and the number of mutation and crossover operations per iteration. These parameters affect the convergence time and the robustness of the search results.
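For illustration, a hyperparameter override might look like the following; the key names here are assumptions, so check the actual JSON files under evolution/default_hyper_params/ for the real schema:

```python
import json

# Illustrative defaults; the actual key names in
# evolution/default_hyper_params/*.json may differ.
default_params = {
    "iterations": 40,        # number of search iterations
    "population_size": 64,   # candidates kept per iteration
    "num_parents": 16,       # top candidates that seed the next generation
    "num_mutations": 16,     # mutation operations per iteration
    "num_crossovers": 16,    # crossover operations per iteration
}

# A quicker (but less robust) search: fewer iterations, smaller population.
quick = {**default_params, "iterations": 10, "population_size": 16}
print(json.dumps(quick, indent=2))
```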
Evaluate long-context perplexity and passkey accuracy:
bash ./scripts/llada_float16/evaluate.sh

If you find LongdLLM helpful, please consider citing it:
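Passkey accuracy can be scored along these lines; the digits-only substring match below is an assumed scoring rule, not necessarily what evaluate.sh implements:

```python
import re

def passkey_accuracy(outputs, answers):
    """Fraction of model outputs whose digits contain the expected passkey."""
    hits = sum(
        1 for out, ans in zip(outputs, answers)
        if ans in re.sub(r"\D", "", out)  # strip non-digits, then substring match
    )
    return hits / len(answers)

# Hypothetical model outputs versus ground-truth passkeys:
outs = ["The passkey is 42817.", "I don't know.", "42 817"]
keys = ["42817", "11111", "42817"]
print(passkey_accuracy(outs, keys))
```

Stripping non-digits first makes the metric robust to the model inserting spaces or punctuation inside the number.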
@misc{ge2025longcontext,
  title = {Towards 131k-Context dLLMs},
  author = {Ge, Albert and Singh, Chandan and Zhang, Dinghuai and Peng, Letian and Zhuang, Yufan and Shang, Ning and Zhang, Li Lyna and Liu, Liyuan and Gao, Jianfeng},
  howpublished = {Albert Ge's Notion},
  url = {https://albertge.notion.site/longdllm},
  year = {2025},
  month = sep,
}
