
🆕 [Feb 2026] The code for obtaining the unified dataset has been released HERE.

This is the official repository for the NeurIPS 2025 paper "Towards Implicit Aggregation: Robust Image Representation for Place Recognition in the Transformer Era".

[Paper on ArXiv | Paper on HF | Model on HF]

ImAge is an implicit aggregation method that produces robust global image descriptors for visual place recognition, without modifying the backbone or attaching an extra aggregator. It simply adds a few aggregation tokens before a specific block of the transformer backbone, leveraging the inherent self-attention mechanism to implicitly aggregate patch features. This offers a novel perspective distinct from the previous paradigm and achieves SOTA performance effectively and efficiently.
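To make the idea concrete, below is a minimal PyTorch sketch of the general technique. It is not the official implementation; the class and parameter names (ImplicitAggregation, num_agg_tokens, insert_at) are illustrative, and details such as token initialization and descriptor readout may differ from the paper:

import torch
import torch.nn as nn

class ImplicitAggregation(nn.Module):
    """Toy sketch: insert learnable aggregation tokens before block
    `insert_at` of a ViT-style backbone and read them out as the
    global descriptor (illustrative, not the official ImAge code)."""
    def __init__(self, blocks, embed_dim=768, num_agg_tokens=8, insert_at=8):
        super().__init__()
        self.blocks = blocks          # nn.ModuleList of transformer blocks
        self.insert_at = insert_at
        self.agg_tokens = nn.Parameter(torch.zeros(1, num_agg_tokens, embed_dim))
        nn.init.trunc_normal_(self.agg_tokens, std=0.02)

    def forward(self, x):             # x: (B, N_patches, D) patch tokens
        for i, blk in enumerate(self.blocks):
            if i == self.insert_at:   # prepend aggregation tokens once
                agg = self.agg_tokens.expand(x.shape[0], -1, -1)
                x = torch.cat([agg, x], dim=1)
            x = blk(x)                # self-attention mixes patch info into agg tokens
        num_agg = self.agg_tokens.shape[1]
        desc = x[:, :num_agg].flatten(1)   # read out the aggregation tokens
        return nn.functional.normalize(desc, dim=-1)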

The difference between ImAge and the previous paradigm is shown in this figure:

To quickly get started, you can load our model via Torch Hub:

import torch
model = torch.hub.load("Lu-Feng/ImAge", "ImAge")
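Once loaded, the model can extract global descriptors. The snippet below is a hedged example: it assumes the model takes RGB batches whose height and width are multiples of 14 (the DINOv2 patch size) and returns one descriptor per image; check the repo for the exact preprocessing:

import torch

model = torch.hub.load("Lu-Feng/ImAge", "ImAge")
model.eval()

# Dummy batch; in practice, use properly normalized RGB images
# whose sides are multiples of 14 (assumption based on ViT-B/14).
images = torch.randn(2, 3, 224, 224)
with torch.no_grad():
    descriptors = model(images)
print(descriptors.shape)  # expected: (2, descriptor_dim)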

Getting Started

This repo follows the framework of GSV-Cities for training and the Visual Geo-localization Benchmark for evaluation. You can download the GSV-Cities dataset HERE, and refer to VPR-datasets-downloader to prepare the test datasets.

Each test dataset should be organized in the following directory tree:

├── datasets_vg
    └── datasets
        └── pitts30k
            └── images
                ├── train
                │   ├── database
                │   └── queries
                ├── val
                │   ├── database
                │   └── queries
                └── test
                    ├── database
                    └── queries
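As a quick sanity check of the layout, a small (hypothetical) Python snippet that verifies the pitts30k splits exist under your dataset root:

from pathlib import Path

# Adjust to your own dataset root.
root = Path("/path/to/your/datasets_vg/datasets/pitts30k/images")
for split in ("train", "val", "test"):
    for part in ("database", "queries"):
        folder = root / split / part
        assert folder.is_dir(), f"missing folder: {folder}"
print("pitts30k layout looks good")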

Before training, you should download the pre-trained foundation model DINOv2-register (ViT-B/14) HERE.

Train

python3 train.py \
    --eval_datasets_folder=/path/to/your/datasets_vg/datasets \
    --eval_dataset_name=pitts30k \
    --backbone=dinov2 \
    --freeze_te=8 \
    --num_learnable_aggregation_tokens=8 \
    --train_batch_size=120 \
    --lr=0.00005 \
    --epochs_num=20 \
    --patience=20 \
    --initialization_dataset=msls_train \
    --training_dataset=gsv_cities \
    --foundation_model_path=/path/to/pre-trained/dinov2_vitb14_reg4_pretrain.pth

If you don't have the MSLS-train dataset, you can also set --initialization_dataset=gsv_cities. Additionally, --training_dataset can be set to gsv_cities or unified_dataset (see Here to get it).

Test

python3 eval.py \
    --eval_datasets_folder=/path/to/your/datasets_vg/datasets \
    --eval_dataset_name=pitts30k \
    --backbone=dinov2 \
    --freeze_te=8 \
    --num_learnable_aggregation_tokens=8 \
    --resume=/path/to/trained/model/ImAge_GSV.pth

Trained Model

| Training set    | Pitts30k | MSLS-val | Nordland | Download |
|-----------------|----------|----------|----------|----------|
| GSV-Cities      | 94.0     | 93.0     | 93.2     | LINK     |
| Unified dataset | 94.1     | 94.5     | 97.7     | LINK     |

Others

This repository also supports training NetVLAD, SALAD, and BoQ on the GSV-Cities dataset using plain PyTorch (rather than the pytorch-lightning used in other repos) with Automatic Mixed Precision (AMP).
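For reference, a typical PyTorch AMP training step follows the pattern below. This is a generic sketch, not this repo's exact training loop; model, optimizer, criterion, and loader are hypothetical placeholders:

import torch

scaler = torch.cuda.amp.GradScaler()
for images, labels in loader:
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():        # run forward pass in mixed precision
        descriptors = model(images.cuda())
        loss = criterion(descriptors, labels.cuda())
    scaler.scale(loss).backward()          # scale loss to avoid fp16 underflow
    scaler.step(optimizer)
    scaler.update()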

Acknowledgements

Parts of this repo are inspired by the following repositories:

GSV-Cities

Visual Geo-localization Benchmark

DINOv2

Citation

If you find this repo useful for your research, please consider leaving a star ⭐️ and citing the paper:

@inproceedings{ImAge,
  title={Towards Implicit Aggregation: Robust Image Representation for Place Recognition in the Transformer Era},
  author={Feng Lu and Tong Jin and Canming Ye and Xiangyuan Lan and Yunpeng Liu and Chun Yuan},
  booktitle={The Annual Conference on Neural Information Processing Systems},
  year={2025}
}

@article{selavprpp,
  author={Lu, Feng and Jin, Tong and Lan, Xiangyuan and Zhang, Lijun and Liu, Yunpeng and Wang, Yaowei and Yuan, Chun},
  title={SelaVPR++: Towards Seamless Adaptation of Foundation Models for Efficient Place Recognition},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2026},
  volume={48},
  number={3},
  pages={2731-2748},
  doi={10.1109/TPAMI.2025.3629287}
}
