mvrl/TaxaBind

TaxaBind: A Unified Embedding Space for Ecological Applications

arXiv Project Page Hugging Face Models Hugging Face Models Hugging Face Space

Srikumar Sastry*, Subash Khanal, Aayush Dhakal, Adeel Ahmad, Nathan Jacobs (*Corresponding Author)

WACV 2025

This repository is the official implementation of TaxaBind, a suite of multimodal models for downstream ecological tasks. The models cover six modalities: ground-level image, geographic location, satellite image, text, audio, and environmental features.

🎯 Zero-Shot Image Classification

Our framework outperforms the state of the art in both the unimodal (BioCLIP, ArborCLIP) and multimodal (ImageBind) settings.

🔥 Large Multimodal Ecological Datasets

  • We release TaxaBench-8k, a truly multimodal dataset containing six paired modalities for evaluating large ecological models.
  • We release iSatNat, containing 2.7M pairs of satellite images and ground-level species images.
  • We release iSoundNat, containing 88,130 pairs of audio and ground-level species images.

⚙️ Usage

Our pretrained models are available through the rshf and transformers packages for easy inference.

Load and initialize taxabind config:

from transformers import PretrainedConfig
from rshf.taxabind import TaxaBind

config = PretrainedConfig.from_pretrained("MVRL/taxabind-config")
taxabind = TaxaBind(config)

📎 Loading ground-level image and text encoders:

# Returns an open_clip-style model that encodes both images and text
model = taxabind.get_image_text_encoder()
tokenizer = taxabind.get_tokenizer()
processor = taxabind.get_image_processor()
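
Once the image and text encoders have produced embeddings, zero-shot classification comes down to comparing cosine similarities and applying a softmax. The sketch below shows only that scoring step in plain Python. The vectors are made-up stand-ins for encoder outputs, and the helper names (`l2_normalize`, `zero_shot_probs`) and the `scale` value are hypothetical, not part of rshf or TaxaBind. With an open_clip-style model, the real embeddings would presumably come from the model's own image and text encoding methods.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def zero_shot_probs(image_emb, text_embs, scale=100.0):
    """Softmax over scaled cosine similarities between one image and N prompts.

    `scale` is an assumed temperature; CLIP-style models learn their own value.
    """
    img = l2_normalize(image_emb)
    logits = []
    for t in text_embs:
        txt = l2_normalize(t)
        logits.append(scale * sum(a * b for a, b in zip(img, txt)))
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy stand-in embeddings (real ones would come from the encoders).
labels = ["a photo of a red fox", "a photo of a barn owl"]
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.3]]

probs = zero_shot_probs(image_embedding, text_embeddings)
best_label = labels[probs.index(max(probs))]
print(best_label)
```

Normalizing both sides first keeps the comparison independent of embedding magnitude, which is the usual convention for CLIP-style models.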

🛰️ Loading satellite image encoder:

sat_encoder = taxabind.get_sat_encoder()
sat_processor = taxabind.get_sat_processor()

📍 Loading location encoder:

location_encoder = taxabind.get_location_encoder()

🔈 Loading audio encoder:

audio_encoder = taxabind.get_audio_encoder()
audio_processor = taxabind.get_audio_processor()

🌦️ Loading environmental encoder:

env_encoder = taxabind.get_env_encoder()
env_processor = taxabind.get_env_processor()
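
Because all six encoders are described as mapping into one embedding space, an embedding from one modality can in principle be used to rank items from another. The following is a minimal sketch of such cross-modal retrieval, assuming embeddings have already been computed. The vectors, the tile ids, and the helpers `cosine` and `top_k` are illustrative assumptions, not part of the TaxaBind or rshf API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_emb, gallery, k=2):
    """Return the k gallery ids most similar to the query, best first."""
    scored = [(cosine(query_emb, emb), item_id) for item_id, emb in gallery.items()]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:k]]

# Hypothetical audio query embedding and satellite-image gallery embeddings.
audio_query = [0.2, 0.9, 0.1]
satellite_gallery = {
    "tile_wetland": [0.1, 1.0, 0.0],
    "tile_desert": [1.0, 0.0, 0.2],
    "tile_forest": [0.3, 0.7, 0.4],
}

ranked = top_k(audio_query, satellite_gallery, k=2)
print(ranked)
```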

📑 Citation

@inproceedings{sastry2025taxabind,
    title={TaxaBind: A Unified Embedding Space for Ecological Applications},
    author={Sastry, Srikumar and Khanal, Subash and Dhakal, Aayush and Ahmad, Adeel and Jacobs, Nathan},
    booktitle={Winter Conference on Applications of Computer Vision},
    year={2025},
    organization={IEEE/CVF}
}

🔍 Additional Links

Check out our lab website for other work on geospatial understanding and mapping:

  • Multi-Modal Vision Research Lab (MVRL) - Link
  • Related Works from MVRL - Link