1 Fudan University, 2 Shanghai Innovation Institute, 3 Shanghai AI Laboratory
In this work, we present Sparse-dLLM, a training-free framework that tackles the core bottleneck of diffusion large language models (dLLMs): quadratic-time computational complexity. While prior caching methods accelerate dLLMs by reusing full-layer KV states, they incur substantial memory overhead that constrains long-context applications. Our analysis reveals a distinctive property of dLLM attention—persistent cross-layer sparsity with stable token saliency over decoding steps—suggesting that many cached entries are low-relevance and can be safely discarded.
Building on these observations, we integrate dynamic cache eviction with sparse attention via a delayed bidirectional sparse caching strategy. Sparse-dLLM retains pivotal tokens and dynamically evicts unimportant prefix and suffix entries using an attention-guided strategy, while delaying cache updates by one step to stabilize selection. This plug-and-play design prunes redundant cache states without retraining, accelerates dLLM decoding, and preserves a near-identical peak memory footprint compared with vanilla dLLMs, enabling practical long-context inference.
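To make the eviction-plus-delay idea concrete, here is a minimal, self-contained Python sketch of the decoding loop. The evict_low_saliency helper, the toy cache layout, and the random scores standing in for a real dLLM forward pass are all illustrative assumptions, not the repository's implementation; the sketch only shows the control flow of attention-guided eviction with a one-step delay.

import torch

def evict_low_saliency(kv_cache, saliency, keep_ratio=0.5):
    # kv_cache: (2, seq_len, head_dim) toy (K, V) pair for one layer/head.
    # saliency: (seq_len,) attention mass each cached token receives.
    # Keep the top keep_ratio fraction of tokens, preserving token order.
    num_keep = max(1, int(keep_ratio * saliency.numel()))
    keep_idx = saliency.topk(num_keep).indices.sort().values
    return kv_cache[:, keep_idx, :]

torch.manual_seed(0)
kv_cache = torch.randn(2, 16, 8)  # toy cache holding 16 prefix/suffix tokens
prev_saliency = None
for step in range(4):
    # Delayed update: evict using the saliency observed one step earlier,
    # which stabilizes token selection across decoding steps.
    if prev_saliency is not None:
        kv_cache = evict_low_saliency(kv_cache, prev_saliency)
    # Stand-in for a dLLM forward pass returning attention over the cache.
    prev_saliency = torch.rand(kv_cache.shape[1])
    print(f"step {step}: cache holds {kv_cache.shape[1]} tokens")

In the actual method, the retained entries span both prefix and suffix (bidirectional) and the saliency comes from the model's own attention maps rather than random tensors.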
On the LLaDA and Dream series, Sparse-dLLM delivers up to 10× higher throughput than vanilla dLLMs while maintaining comparable performance, outperforming recent dLLM caching methods in the efficiency–effectiveness trade-off. Our study thus establishes the first method that combines dynamic cache eviction with sparse attention for dLLMs, and provides empirical evidence and analysis that chart a path toward scalable, fast, and memory-efficient dLLM decoding.
We run our downstream evaluations with OpenCompass.
git clone https://github.com/open-compass/opencompass
cd opencompass
pip install -e .
The necessary Python packages we use and their corresponding versions:
opencompass==0.4.2
torch==2.6.0
transformers==4.46.3
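If pip resolves different versions in your environment, you can pin these two explicitly (opencompass itself is installed editable from source above):

pip install torch==2.6.0 transformers==4.46.3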
Copy the directory Sparse-dLLM/opencompass/ to your OpenCompass directory and add the following lines to the end of opencompass/models/__init__.py.
from .sparse_dllm.llada_wrapper import Sparse_dLLM_LLaDACausalLM
from .sparse_dllm.dream_wrapper import Sparse_dLLM_DreamCausalLM
from .sparse_dllm.dream_wrapper_instruct import Sparse_dLLM_DreamCausalLMInstruct
Copy the directory Sparse-dLLM/myeval/ to your OpenCompass directory and then you can try the following evaluations.
Go to your OpenCompass directory and run the performance evaluation:
python run.py myeval/eval_performance/eval_sparse_dllm_***.py
Replace *** with the corresponding model name (e.g., dream_base, dream_chat, llada_chat, llada_1.5).
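For orientation, a config such as eval_sparse_dllm_llada_chat.py would plausibly register one of the wrappers imported above in a models list, roughly along the lines of the sketch below. The dataset import, checkpoint path, and generation fields here are assumptions for illustration, not the repository's exact config.

# Hypothetical OpenCompass config sketch; field values are assumptions.
from mmengine.config import read_base
from opencompass.models import Sparse_dLLM_LLaDACausalLM

with read_base():
    from opencompass.configs.datasets.gsm8k.gsm8k_gen import gsm8k_datasets

datasets = gsm8k_datasets
models = [
    dict(
        type=Sparse_dLLM_LLaDACausalLM,
        abbr='llada-chat-sparse-dllm',
        path='GSAI-ML/LLaDA-8B-Instruct',  # assumed HF checkpoint
        max_out_len=256,
        batch_size=8,
        run_cfg=dict(num_gpus=1),
    ),
]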
Go to your OpenCompass directory and run the corresponding script. For example:
bash myeval/eval_speed/eval_speed_dream_example.sh
bash myeval/eval_speed/eval_speed_llada_example.sh
Or run the Python script directly with parameters:
python myeval/eval_speed/dream_sparse_dllm.py --model_path <MODEL_PATH> --model_type <MODEL_TYPE> --data_path <DATA_PATH> --data_type <DATA_TYPE> --output_dir <OUTPUT_DIR> --kernel_size 3 --keep_ratio 0.5 --block_length 32 --apply_chat_template True
See the code for more details.
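As one rough reading of the two cache flags above (an assumption about their semantics, not the script's actual logic): --kernel_size could smooth the per-token attention scores with a 1-D pooling window before --keep_ratio selects the fraction of cache entries to retain, so that isolated attention spikes do not dominate the selection.

import torch
import torch.nn.functional as F

def select_tokens(attn, kernel_size=3, keep_ratio=0.5):
    # attn: (seq_len,) raw attention mass per cached token.
    # Smooth with a 1-D average-pooling window, then keep the
    # top keep_ratio fraction of tokens in their original order.
    smoothed = F.avg_pool1d(
        attn.view(1, 1, -1), kernel_size, stride=1,
        padding=kernel_size // 2, count_include_pad=False,
    ).view(-1)
    num_keep = max(1, int(keep_ratio * attn.numel()))
    return smoothed.topk(num_keep).indices.sort().values

print(select_tokens(torch.rand(32)))  # indices of retained cache entries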
@article{song2025sparse,
  title={{Sparse-dLLM}: Accelerating Diffusion {LLMs} with Dynamic Cache Eviction},
  author={Song, Yuerong and Liu, Xiaoran and Li, Ruixiao and Liu, Zhigeng and Huang, Zengfeng and Guo, Qipeng and He, Ziwei and Qiu, Xipeng},
  journal={arXiv preprint arXiv:2508.02558},
  year={2025}
}