Zhichao Sun, Yidong Ma, Gang Liu, Yibo Chen, Xu Tang, Yao Hu, Yongchao Xu
💡 We reveal a fundamental mechanism by which LVLMs process spatial information:
LVLMs implicitly establish visual coordinate systems through Rotary Position Embeddings (RoPE).
Through theoretical analysis, we discover that specific token positions serve as Implicit Visual Coordinates (IVC tokens): spatial reference points essential for absolute object localization. These positions occur where RoPE's rotation matrices approximate:
- Identity matrix (real-axis references)
- 90° rotation matrix (imaginary-axis references)
This provides the first theoretical characterization of spatial reasoning mechanisms in LVLMs.
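As a rough illustration of this idea, the sketch below computes the per-dimension RoPE angles over a range of positions and scores each position by how many of its 2D rotation blocks are close to the identity matrix (cos θ ≈ 1) or to a 90° rotation (sin θ ≈ 1). The base frequency, tolerance, and scoring rule are illustrative assumptions, not the exact IVC criterion from the paper.

```python
# Illustrative sketch only: flags positions whose RoPE 2D rotation blocks are
# near the identity (cos θ ≈ 1) or near a 90° rotation (sin θ ≈ 1).
# Base frequency, tolerance, and the per-dimension averaging are assumptions,
# not the exact IVC criterion from the paper.
import numpy as np

def rope_angles(positions, dim=128, base=10000.0):
    """Angles theta[p, i] = p * base^(-2i/dim) for each position p and pair index i."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # shape: (dim/2,)
    return np.outer(positions, inv_freq)               # shape: (P, dim/2)

def ivc_candidate_scores(positions, dim=128, base=10000.0, tol=0.05):
    theta = rope_angles(positions, dim, base)
    near_identity = np.abs(np.cos(theta) - 1.0) < tol  # rotation block ≈ I
    near_quarter = np.abs(np.sin(theta) - 1.0) < tol   # rotation block ≈ 90°
    # Score: fraction of dimension pairs acting as a real- or imaginary-axis reference.
    return near_identity.mean(axis=1) + near_quarter.mean(axis=1)

if __name__ == "__main__":
    scores = ivc_candidate_scores(np.arange(1024))
    print("most reference-like positions:", np.argsort(scores)[-8:])
```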
A training-free, prompt-aware pruning strategy that preserves two crucial token types:
- IVC Tokens: Identified by analyzing RoPE's mathematical properties (cosine/sine components across dimensions)
- Foreground Tokens: Selected via a robust two-stage process:
  - Stage 1: Semantic seed identification using value-vector similarity (avoiding positional bias)
  - Stage 2: Contextual refinement to capture complete objects
Key Innovation: Single-selection pruning. Tokens are selected once at an intermediate layer and the selection is reused across all layers, maximizing KV-cache reduction while preserving the original position IDs.
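A minimal sketch of this single-selection step, assuming the kept visual-token indices (IVC plus foreground) have already been chosen at the intermediate layer; the function name, tensor shapes, and arguments below are placeholders rather than the released implementation.

```python
# Minimal sketch of single-selection pruning (placeholder names and shapes,
# not the released implementation).
import torch

def prune_visual_tokens(hidden_states, position_ids, visual_mask, keep_indices):
    """Drop unselected visual tokens while preserving original position IDs.

    hidden_states: (B, N, D); position_ids: (..., N), e.g. (B, N) or (3, B, N)
    for multimodal RoPE; visual_mask: (N,) bool marking visual tokens;
    keep_indices: indices of the visual tokens to keep (IVC + foreground).
    """
    device = hidden_states.device
    n = hidden_states.shape[1]
    keep = torch.zeros(n, dtype=torch.bool, device=device)
    keep[~visual_mask.to(device)] = True      # always keep text/system tokens
    keep[keep_indices.to(device)] = True      # keep the selected visual tokens
    # Position IDs are gathered, not recomputed, so RoPE coordinates are preserved.
    return hidden_states[:, keep], position_ids[..., keep]
```

Because the same index set is reused by every later layer, the KV cache only ever stores the pruned sequence, while the gathered position IDs keep the surviving tokens anchored to their original RoPE coordinates.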
Open Source Plan for Qwen, LLaVA, InternVL, DeepSeek Support
Supported LVLMs
- ✅ Qwen-VL Support (transformers code)
  - Qwen2.5-VL
  - Qwen2-VL
- LLaVA Support
- InternVL Support
- DeepSeek-VL Support
Based on VLMEvalKit and transformers. We have added grounding-data testing code to VLMEvalKit.
Please follow the guide below to install and set up VLMEvalKit.
conda create --name IVCP python=3.10.6 -y
conda activate IVCP
cd VLMEvalKit
pip install -e .
pip uninstall numba -y
pip install numba
pip install qwen_vl_utils
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu118
pip install flash-attn==2.5.8 --no-build-isolation -v
We provide the RefCOCO grounding dataset on Hugging Face.
After downloading the dataset, please modify the DATASET_URL in IVCP/VLMEvalKit/vlmeval/dataset/image_grounding.py:
DATASET_URL = {
'RefCOCO_testA': '/PATH/refcoco_testA.tsv',
'RefCOCO_testB': '/PATH/refcoco_testB.tsv',
'RefCOCO_val': '/PATH/refcoco_val.tsv',
.....
'RefCOCOg_test': '/PATH/refcocog_test.tsv',
'RefCOCOg_val': '/PATH/refcocog_val.tsv',
}

To evaluate the model on grounding tasks, run the following script:
cd IVCP/VLMEvalKit
bash test_ivcp_qwen_grounding.sh

To evaluate general VQA tasks, run:
bash test_ivcp_qwen_generalvqa.sh

Note:
Before running the scripts, please make sure to modify the dataset paths in test_ivcp_qwen_grounding.sh (specifically lines 2-3) to point to your actual dataset location, e.g.: export LMUData="/your/dataset/path"
Some VQA benchmark evaluations require an API. Please configure your API according to the VLMEvalKit instructions. By default, we use GPT-4o.
@misc{sun2026ivcprunerevealingimplicitvisual,
title={IVC-Prune: Revealing the Implicit Visual Coordinates in LVLMs for Vision Token Pruning},
author={Zhichao Sun and Yidong Ma and Gang Liu and Yibo Chen and Xu Tang and Yao Hu and Yongchao Xu},
year={2026},
eprint={2602.03060},
archivePrefix={arXiv},
}
