Awesome Speculative Decoding

A curated list of speculative decoding papers, updated continuously.

Training-based Methods

Better & Faster Large Language Models Via Multi-Token Prediction [ICML 2024]

Authors: Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, Gabriel Synnaeve
Year: 2024
arXiv: arxiv.org/abs/2404.19737
GitHub: N/A

MEDUSA: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads [ICML 2024]

Authors: Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao
Year: 2024
arXiv: arxiv.org/abs/2401.10774
GitHub: github.com/FasterDecoding/Medusa

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty [ICML 2024]

Authors: Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang Zhang
Year: 2024
arXiv: arxiv.org/abs/2401.15077
GitHub: github.com/SafeAILab/EAGLE

EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees [EMNLP 2024]

Authors: Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang Zhang
Year: 2024
arXiv: arxiv.org/abs/2406.16858
GitHub: github.com/SafeAILab/EAGLE

EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test

Authors: Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang Zhang
Year: 2025
arXiv: arxiv.org/abs/2503.01840
GitHub: github.com/SafeAILab/EAGLE

Training-Free Methods

Accelerating Auto-Regressive Text-To-Image Generation With Training-Free Speculative Jacobi Decoding [ICLR 2025]

Authors: Yao Teng, Han Shi, Xian Liu, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu
Year: 2025
arXiv: arxiv.org/abs/2410.01699
GitHub: github.com/tyshiwo1/Accelerating-T2I-AR-with-SJD

Break the Sequential Dependency of LLM Inference Using LOOKAHEAD DECODING [ICML 2024]

Authors: Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang
Year: 2024
arXiv: arxiv.org/abs/2402.02057
GitHub: github.com/hao-ai-lab/LookaheadDecoding

Hybrid & Compositional Methods

LayerSkip: Enabling Early Exit Inference And Self-Speculative Decoding [ACL 2024]

Authors: Mostafa Elhoushi, Akshat Shrivastava, Diana Liskovich, Basil Hosmer, Bram Wasti, Liangzhen Lai, Anas Mahmoud, Bilge Acun, Saurabh Agarwal, Ahmed Roman, Ahmed A Aly, Beidi Chen, Carole-Jean Wu
Year: 2024
arXiv: arxiv.org/abs/2404.16710
GitHub: github.com/facebookresearch/LayerSkip
