This is the code repository for the paper:
CoV: Chain-of-View Prompting for Spatial Reasoning
Haoyu Zhao*, Akide Liu*, Zeyu Zhang*, Weijie Wang*, Feng Chen, Ruihan Zhu, Gholamreza Haffari and Bohan Zhuang†
*Equal contribution. †Corresponding author.
Embodied question answering (EQA) in 3D environments often requires collecting context that is distributed across multiple viewpoints and partially occluded. However, most recent vision-language models (VLMs) are constrained to a fixed and finite set of input views, which limits their ability to acquire question-relevant context at inference time and hinders complex spatial reasoning. We propose Chain-of-View (CoV) prompting, a training-free, test-time reasoning framework that transforms a VLM into an active viewpoint reasoner through a coarse-to-fine exploration process. CoV first employs a View Selection agent to filter redundant frames and identify question-aligned anchor views. It then performs fine-grained view adjustment by interleaving iterative reasoning with discrete camera actions, obtaining new observations from the underlying 3D scene representation until sufficient context is gathered or a step budget is reached.
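At a high level, the coarse-to-fine loop can be summarized with the sketch below. The `Decision`, `select_anchor_views`, `propose`, and `render_view` names are illustrative placeholders for the View Selection agent, the VLM's reasoning step, and the 3D renderer; they are not the actual interfaces of this repository.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional

@dataclass
class Decision:
    """One reasoning step: either a discrete camera action or a final answer."""
    kind: str                            # "act" or "answer"
    camera_action: Optional[str] = None  # e.g. "rotate_left", "move_forward"
    answer: Optional[str] = None

def chain_of_view(
    question: str,
    frames: List[Any],
    select_anchor_views: Callable[[str, List[Any]], List[Any]],
    propose: Callable[[str, List[Any]], Decision],
    render_view: Callable[[str], Any],
    max_steps: int = 7,
) -> Optional[str]:
    # Coarse stage: filter redundant frames, keep question-aligned anchor views.
    context = select_anchor_views(question, frames)

    # Fine stage: interleave reasoning with discrete camera actions, rendering
    # new observations from the 3D scene until the VLM answers or the budget ends.
    for _ in range(max_steps):
        decision = propose(question, context)
        if decision.kind == "answer":
            return decision.answer
        context.append(render_view(decision.camera_action))

    # Step budget exhausted: answer from whatever context was gathered.
    return propose(question, context).answer
```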
- 2026-01-09: We released the paper on arXiv.
.
├── cov/ # Main package
├── scripts/ # Utility scripts
├── tools/ # Data processing tools
├── main.py # Main entry point
├── pixi.toml # Pixi environment configuration
└── README.md
- Python 3.9+
- CUDA support (recommended for Habitat-Sim)
The project uses Pixi for dependency management:
# Install dependencies
pixi install
# Activate the environment
pixi shell

Create a .env file in the root directory with your API credentials:
# OpenAI
OPENAI_API_KEY=[your_key_here]
# OpenRouter
OPENROUTER_API_BASE=https://openrouter.ai/api/v1
OPENROUTER_API_KEY=[your_key_here]
# DashScope
DASHSCOPE_API_BASE=https://dashscope.aliyuncs.com/compatible-mode/v1
DASHSCOPE_API_KEY=[your_key_here]
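As a minimal sketch of how these credentials could be read at runtime, assuming python-dotenv is used (this repository may load them differently):

```python
# Read the .env file above with python-dotenv; fall back to the documented
# default base URLs if the optional variables are not set.
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current working directory

openai_key = os.environ["OPENAI_API_KEY"]
openrouter_base = os.environ.get("OPENROUTER_API_BASE", "https://openrouter.ai/api/v1")
dashscope_base = os.environ.get("DASHSCOPE_API_BASE", "https://dashscope.aliyuncs.com/compatible-mode/v1")
```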
- Download the OpenEQA dataset following the instructions in the original OpenEQA repository.
- Place question files in the data/ directory.
- Place scene frames in the data/frames/ directory.
Run the agent on OpenEQA questions:
# Specify the model
python main.py model=qwen
# Specify min_action_step
python main.py model=qwen min_action_step=7

You can set your own model backend in cov/config.py.
Results are saved to the configured output directory with:
- JSON files containing answers and metadata
- HTML reports showing navigation history and visualizations
- Screenshots of selected views and bird's eye views
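To inspect the saved answers programmatically, a sketch like the following can be used. The output directory and the "question"/"answer" field names are assumptions; check your configured output directory and the actual JSON schema.

```python
# Iterate over the result JSON files and print question/answer pairs.
import json
from pathlib import Path

results_dir = Path("outputs")  # hypothetical path; use your configured output directory
for path in sorted(results_dir.glob("*.json")):
    with path.open() as f:
        record = json.load(f)
    print(path.name, record.get("question"), "->", record.get("answer"))
```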
For evaluation, please follow the LLM-Match protocol from OpenEQA.
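For reference, a small sketch of how 1-5 LLM-Match ratings are typically aggregated in the OpenEQA protocol (the `ratings` list here is purely illustrative):

```python
# Aggregate per-question LLM ratings sigma in {1, ..., 5} into a 0-100 score:
# C = mean((sigma - 1) / 4) * 100, following the OpenEQA LLM-Match protocol.

def llm_match_score(ratings: list[int]) -> float:
    """Aggregate 1-5 LLM-Match ratings into a 0-100 score."""
    return 100.0 * sum((r - 1) / 4 for r in ratings) / len(ratings)

print(llm_match_score([5, 4, 1, 3]))  # -> 56.25
```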
If you use Chain-of-View (CoV) in your research, please cite our paper:
@article{zhao2026cov,
title={CoV: Chain-of-View Prompting for Spatial Reasoning},
author={Zhao, Haoyu and Liu, Akide and Zhang, Zeyu and Wang, Weijie and Chen, Feng and Zhu, Ruihan and Haffari, Gholamreza and Zhuang, Bohan},
journal={arXiv preprint arXiv:2601.05172},
year={2026}
}
