TaxaBind: A Unified Embedding Space for Ecological Applications

Srikumar Sastry*, Subash Khanal, Aayush Dhakal, Adeel Ahmad, Nathan Jacobs (*Corresponding Author)

WACV 2025

This repository is the official implementation of TaxaBind. TaxaBind is a suite of multimodal models useful for downstream ecological tasks covering six modalities: ground-level image, geographic location, satellite image, text, audio, and environmental features.

🎯 Zero-Shot Image Classification

Our framework outperforms state-of-the-art models in both the unimodal (BioCLIP, ArborCLIP) and multimodal (ImageBind) settings.
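For readers new to the setup, zero-shot classification in a shared embedding space usually means comparing an image embedding against text embeddings of candidate labels and picking the closest one. The following is a minimal sketch of that idea, not the evaluation code from this repository: the three-dimensional vectors are made-up stand-ins for real encoder outputs, the species names are only illustrative labels, and the `temperature` value and the helper names (`l2_normalize`, `cosine_similarity`, `zero_shot_classify`) are assumptions introduced here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def zero_shot_classify(image_embedding, label_embeddings, temperature=0.07):
    """Return a softmax distribution over labels.

    `label_embeddings` maps each label to the embedding of its text prompt.
    `temperature` is an illustrative value, not one taken from the paper.
    """
    labels = list(label_embeddings)
    logits = [
        cosine_similarity(image_embedding, label_embeddings[label]) / temperature
        for label in labels
    ]
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


# Hypothetical embeddings standing in for text-encoder outputs.
label_embeddings = {
    "Cardinalis cardinalis": [0.9, 0.1, 0.0],
    "Cyanocitta cristata": [0.1, 0.9, 0.1],
    "Turdus migratorius": [0.0, 0.2, 0.9],
}
# Hypothetical embedding standing in for an image-encoder output.
image_embedding = [0.8, 0.2, 0.1]

probs = zero_shot_classify(image_embedding, label_embeddings)
prediction = max(probs, key=probs.get)
print(prediction)
```

In practice the embeddings would come from the pretrained encoders described under Usage, and the same comparison applies to any pair of modalities that share the embedding space.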

🔥 Large Multimodal Ecological Datasets

We release TaxaBench-8k, a truly multimodal dataset containing six paired modalities for evaluating large ecological models.
We release iSatNat, containing 2.7M pairs of satellite images and ground-level species images.
We release iSoundNat, containing 88,130 pairs of audio and ground-level species images.
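Paired datasets like these lend themselves to cross-modal retrieval evaluation: given an embedding from one modality, check whether its paired item from the other modality ranks near the top. The sketch below computes Recall@k, a common retrieval metric, over toy two-dimensional vectors. It is an illustration under stated assumptions rather than the protocol used in the paper: the function name `recall_at_k`, the toy data, and the convention that pairs share the same list index are all introduced here.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def recall_at_k(query_embeddings, gallery_embeddings, k=1):
    """Fraction of queries whose paired gallery item ranks in the top k.

    Assumes query i is paired with gallery item i, and ranks gallery
    items by cosine similarity to the query.
    """
    if len(query_embeddings) != len(gallery_embeddings):
        raise ValueError("queries and gallery must be paired one-to-one")
    gallery = [_normalize(g) for g in gallery_embeddings]
    hits = 0
    for i, query in enumerate(query_embeddings):
        q = _normalize(query)
        sims = [sum(a * b for a, b in zip(q, g)) for g in gallery]
        # Gallery indices sorted from most to least similar.
        ranked = sorted(range(len(gallery)), key=lambda j: sims[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(query_embeddings)


# Hypothetical embeddings standing in for audio and image encoder outputs.
audio_embeddings = [[1.0, 0.1], [0.2, 1.0], [0.7, 0.6]]
image_embeddings = [[0.9, 0.2], [0.1, 0.8], [1.0, 0.0]]

r1 = recall_at_k(audio_embeddings, image_embeddings, k=1)
r2 = recall_at_k(audio_embeddings, image_embeddings, k=2)
print(f"Recall@1: {r1:.3f}, Recall@2: {r2:.3f}")
```

The full sort keeps the sketch short; for datasets the size of iSatNat, a batched matrix product with a top-k selection would be the more practical choice.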

⚙️ Usage

Our pretrained models are available through the rshf and transformers packages for easy inference.

Load and initialize the TaxaBind config:

from transformers import PretrainedConfig
from rshf.taxabind import TaxaBind

config = PretrainedConfig.from_pretrained("MVRL/taxabind-config")
taxabind = TaxaBind(config)

📎 Loading ground-level image and text encoders:

# Loads an open_clip-style model
model = taxabind.get_image_text_encoder()
tokenizer = taxabind.get_tokenizer()
processor = taxabind.get_image_processor()

🛰️ Loading satellite image encoder:

sat_encoder = taxabind.get_sat_encoder()
sat_processor = taxabind.get_sat_processor()

📍 Loading location encoder:

location_encoder = taxabind.get_location_encoder()

🔈 Loading audio encoder:

audio_encoder = taxabind.get_audio_encoder()
audio_processor = taxabind.get_audio_processor()

🌦️ Loading environmental encoder:

env_encoder = taxabind.get_env_encoder()
env_processor = taxabind.get_env_processor()

📑 Citation

@inproceedings{sastry2025taxabind,
    title={TaxaBind: A Unified Embedding Space for Ecological Applications},
    author={Sastry, Srikumar and Khanal, Subash and Dhakal, Aayush and Ahmad, Adeel and Jacobs, Nathan},
    booktitle={Winter Conference on Applications of Computer Vision},
    year={2025},
    organization={IEEE/CVF}
}

🔍 Additional Links

Check out our lab website for other work on geospatial understanding and mapping:

Multi-Modal Vision Research Lab (MVRL) - Link
Related Works from MVRL - Link
