MTRAG: Multi-Target Referring and Grounding via Hybrid Semantic-Spatial Integration (Under Review)

Official implementation of "MTRAG: Multi-Target Referring and Grounding via Hybrid Semantic-Spatial Integration".

As the paper is currently under peer review, we are releasing only the full-capability model MTRAG-Full (without additional fine-tuning) and the evaluation script for MTR-Bench at this stage. The complete codebase and model checkpoints will be made publicly available upon acceptance.

Installation

See install for details.

Pre-trained weights

Vicuna-7B-v1.5

MTRAG requires loading the vicuna-7b-v1.5 pre-trained weights.
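
A minimal sketch of loading the language backbone with Hugging Face Transformers; the `lmsys/vicuna-7b-v1.5` model id and the half-precision/`device_map` settings are assumptions, so follow the install instructions for the exact setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: weights are pulled from the Hugging Face hub under lmsys/vicuna-7b-v1.5;
# a local path to already-downloaded weights works the same way.
MODEL_ID = "lmsys/vicuna-7b-v1.5"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # half precision to fit on a single GPU (assumption)
    device_map="auto",
)
```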

Alpha-CLIP Encoder

Our Global Image Encoder is initialized with the pre-trained weights of Alpha-CLIP-L/14@336px, fine-tuned on the GRIT-20M dataset. Place the downloaded weights under ./alpha_clip.
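
As a rough sketch, the encoder could be restored with the `alpha_clip` package from the Alpha-CLIP repository; the `load` call mirrors that repository's loader, and the checkpoint file name below is illustrative only:

```python
import torch
import alpha_clip  # provided by the Alpha-CLIP repository

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumption: the GRIT-20M fine-tuned ViT-L/14@336px alpha weights were downloaded
# into ./alpha_clip; substitute the actual file name of your checkpoint.
model, preprocess = alpha_clip.load(
    "ViT-L/14@336px",
    alpha_vision_ckpt_pth="./alpha_clip/clip_l14_336_grit20m.pth",
    device=device,
)
model.visual.eval()  # the visual branch serves as the Global Image Encoder
```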

SAM weights

Our grounding branch, including both the perception encoder and decoder, is initialized from the ViT-H backbone of the Segment Anything Model (SAM ViT-H). The encoder is kept frozen during training. Place the downloaded weights under ./checkpoints.
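
A minimal sketch of initializing SAM ViT-H with the `segment_anything` package and freezing the image encoder; the checkpoint file name is the official release name, so adjust it if yours differs:

```python
import torch
from segment_anything import sam_model_registry

# Assumption: the official ViT-H checkpoint (sam_vit_h_4b8939.pth) was placed in ./checkpoints.
sam = sam_model_registry["vit_h"](checkpoint="./checkpoints/sam_vit_h_4b8939.pth")

# The perception encoder stays frozen during training; only the decoder side is updated.
for param in sam.image_encoder.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in sam.parameters() if p.requires_grad)
print(f"{trainable} trainable parameters")
```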

Prepare Datasets

See datasets for details.

Checkpoints 🤖

MTRAG-Full model🤗: MTRAG-Full

Evaluation 🔎

See evaluation for details.

Acknowledgement

Our code is built upon the great work of GLaMM, LLaVA, and SAM. We thank the authors for making their code available.
