MTRAG: Multi-Target Referring and Grounding via Hybrid Semantic-Spatial Integration (Under Review)

Official implementation of "MTRAG: Multi-Target Referring and Grounding via Hybrid Semantic-Spatial Integration".

As the paper is currently under peer review, we are releasing only the full-capability model MTRAG-Full (without additional fine-tuning) and the evaluation script for MTR-Bench at this stage. The complete codebase and model checkpoints will be made publicly available upon acceptance.

Installation

See install for details.

Pre-trained weights

Vicuna-7B-v1.5

MTRAG requires loading the vicuna-7b-v1.5 pre-trained weights.
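
A minimal sketch of loading the language backbone with Hugging Face Transformers; the `lmsys/vicuna-7b-v1.5` model id and the half-precision/`device_map` settings are assumptions, so follow the install instructions for the exact setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: weights are pulled from the Hugging Face hub under lmsys/vicuna-7b-v1.5;
# a local path to already-downloaded weights works the same way.
MODEL_ID = "lmsys/vicuna-7b-v1.5"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # half precision to fit on a single GPU (assumption)
    device_map="auto",
)
```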

Alpha-CLIP Encoder

Our Global Image Encoder is initialized with the pre-trained weights of Alpha-CLIP-L/14@336px, fine-tuned on the GRIT-20M dataset. Place the downloaded weights under ./alpha_clip.
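
As a rough sketch, the encoder could be restored with the `alpha_clip` package from the Alpha-CLIP repository; the `load` call mirrors that repository's loader, and the checkpoint file name below is illustrative only:

```python
import torch
import alpha_clip  # provided by the Alpha-CLIP repository

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumption: the GRIT-20M fine-tuned ViT-L/14@336px alpha weights were downloaded
# into ./alpha_clip; substitute the actual file name of your checkpoint.
model, preprocess = alpha_clip.load(
    "ViT-L/14@336px",
    alpha_vision_ckpt_pth="./alpha_clip/clip_l14_336_grit20m.pth",
    device=device,
)
model.visual.eval()  # the visual branch serves as the Global Image Encoder
```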

SAM weights

Our grounding branch, including both the perception encoder and decoder, is initialized from the ViT-H backbone of the Segment Anything Model (SAM ViT-H). The encoder is kept frozen during training. Place the downloaded weights under ./checkpoints.
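
A minimal sketch of initializing SAM ViT-H with the `segment_anything` package and freezing the image encoder; the checkpoint file name is the official release name, so adjust it if yours differs:

```python
import torch
from segment_anything import sam_model_registry

# Assumption: the official ViT-H checkpoint (sam_vit_h_4b8939.pth) was placed in ./checkpoints.
sam = sam_model_registry["vit_h"](checkpoint="./checkpoints/sam_vit_h_4b8939.pth")

# The perception encoder stays frozen during training; only the decoder side is updated.
for param in sam.image_encoder.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in sam.parameters() if p.requires_grad)
print(f"{trainable} trainable parameters")
```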

Prepare Datasets

See datasets for details.

Checkpoints 🤖

MTRAG-Full model🤗: MTRAG-Full

Evaluation 🔎

See evaluation for details.

Acknowledgement

Our code is built upon the great work of GLaMM, LLaVA, and SAM. We thank the authors for making their code available.
