
An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models [ICLR 2026]

Changwoo Baek, Jouwon Song, Sohyeon Kim*, Kyeongbo Kong†

*Equal contribution, †Corresponding author

🌐 Project Page | πŸ“„ Paper (Coming Soon)

πŸŽ‰ News

  • [2026/01] πŸ”₯ Our paper has been accepted to ICLR 2026! 🎊
  • [2026/02] πŸš€ Project page is now live!

πŸ“– Overview

Large Vision-Language Models (LVLMs) have adopted visual token pruning strategies to mitigate the substantial computational overhead incurred by long visual token sequences. While prior works primarily focus on either attention-based or diversity-based pruning methods, an in-depth analysis of the characteristics and limitations of these approaches remains largely unexplored.

In this work, we conduct a thorough empirical analysis, using effective rank (erank) as a measure of feature diversity and attention-score entropy, to investigate how visual tokens are processed and to analyze the strengths and weaknesses of each approach.
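For concreteness, the two measures can be sketched as follows. This is a minimal NumPy sketch of the standard definitions (erank as the exponential of the entropy of the normalized singular-value spectrum); the function and variable names are ours, not from the paper or its released code:

```python
import numpy as np

def effective_rank(features: np.ndarray) -> float:
    """Effective rank (erank) of a token-feature matrix (num_tokens, dim).

    erank = exp(H(p)), where p is the singular-value spectrum normalized
    to sum to 1. Higher erank means more diverse token features.
    """
    s = np.linalg.svd(features, compute_uv=False)  # singular values
    p = s / s.sum()                                # normalize to a distribution
    return float(np.exp(-np.sum(p * np.log(p + 1e-12))))

def attention_entropy(attn: np.ndarray) -> float:
    """Shannon entropy of an attention distribution over visual tokens.

    Low entropy: attention concentrated on a few tokens.
    High entropy: attention spread across many tokens.
    """
    p = attn / attn.sum()
    return float(-np.sum(p * np.log(p + 1e-12)))
```

For example, a set of mutually orthogonal token features attains the maximum erank (equal to the number of tokens), while near-duplicate tokens collapse it toward 1.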

πŸ” Key Findings

Our analysis reveals two key insights:

  1. Diversity-aware hybrid pruning methods preserve less feature diversity than intended, and the diversity they do retain is closely tied to a higher hallucination frequency than attention-based pruning.

  2. Attention-based approaches are more effective on simple images, where visual evidence is concentrated, while diversity-based methods better handle complex images with distributed features.

Building on these empirical insights, we show that incorporating image-aware adjustments into existing hybrid pruning strategies consistently improves their performance. We also provide a minimal instantiation of our empirical findings through a simple adaptive pruning mechanism.
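To illustrate what such an image-aware adjustment could look like, here is a toy sketch under our own assumptions (it is not the paper's released method, and the threshold and sampling choices are purely illustrative): measure attention entropy per image, keep the top-attention tokens when evidence is concentrated, and fall back to farthest-point sampling on token features when attention is spread out.

```python
import numpy as np

def adaptive_prune(features: np.ndarray, attn: np.ndarray,
                   keep: int, entropy_threshold: float) -> np.ndarray:
    """Keep `keep` visual tokens, choosing the selection strategy per image.

    Simple images (low attention entropy): keep the top-attention tokens.
    Complex images (high attention entropy): keep a diverse subset via
    greedy farthest-point sampling on the token features.
    All names and thresholds here are illustrative, not from the paper.
    """
    p = attn / attn.sum()
    entropy = -np.sum(p * np.log(p + 1e-12))
    if entropy < entropy_threshold:
        # Attention path: evidence is concentrated, rank tokens by attention.
        return np.argsort(attn)[::-1][:keep]
    # Diversity path: greedy farthest-point sampling, seeded with the
    # most-attended token; already-selected tokens have distance 0.
    selected = [int(np.argmax(attn))]
    dists = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < keep:
        nxt = int(np.argmax(dists))
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(features - features[nxt], axis=1))
    return np.array(selected)
```

The design choice mirrors the second finding above: the per-image entropy acts as a cheap proxy for how "distributed" the visual evidence is, switching between the two pruning regimes accordingly.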

πŸ’» Code

Detailed implementation code is coming soon. 🚧

Stay tuned for updates! ⏳

πŸ“§ Contact

For questions or collaborations, please contact:

πŸ™ Acknowledgements

We thank LLaVA and FasterVLM for their excellent work and open-source contributions.

πŸ“œ License

This project is licensed under the Apache License 2.0.
