
Per-pixel Features: Mating Segment-Anything with CLIP

This repository generates per-pixel features using the pretrained models Segment-Anything (SAM) and CLIP. The pixel-aligned features are useful for downstream tasks such as visual grounding and VQA. First, SAM generates segmentation masks. Then, the cropped image region of each mask is fed into CLIP to extract a semantic feature. Finally, each pixel is assigned semantic features according to the masks it belongs to.
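
A minimal sketch of this pipeline is shown below. It is an illustrative assumption rather than the exact code in `feature_autogenerator.py`: the SAM and CLIP calls (`SamAutomaticMaskGenerator`, `clip.load`, `encode_image`) come from the respective libraries, while the per-pixel aggregation (averaging features over overlapping masks) and the file paths are placeholders.

```python
# Illustrative sketch of the mask-then-embed pipeline (not the exact code in
# feature_autogenerator.py). Assumes segment-anything and CLIP are installed.
import numpy as np
import torch
import clip
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Generate segmentation masks with SAM (checkpoint path is a placeholder).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to(device)
mask_generator = SamAutomaticMaskGenerator(sam)
image = np.array(Image.open("input.jpg").convert("RGB"))
masks = mask_generator.generate(image)  # dicts with "segmentation", "bbox", ...

# 2. Embed the cropped region of each mask with CLIP.
model, preprocess = clip.load("ViT-B/32", device=device)
H, W = image.shape[:2]
feature_map = torch.zeros(H, W, 512)  # per-pixel feature buffer (ViT-B/32 dim)
weight_map = torch.zeros(H, W)        # number of masks covering each pixel

for m in masks:
    x, y, w, h = m["bbox"]
    crop = Image.fromarray(image[y:y + h, x:x + w])
    with torch.no_grad():
        feat = model.encode_image(preprocess(crop).unsqueeze(0).to(device))
    feat = (feat / feat.norm(dim=-1, keepdim=True)).squeeze(0).cpu().float()

    # 3. Assign the crop's CLIP feature to every pixel inside the mask.
    seg = torch.from_numpy(m["segmentation"])
    feature_map[seg] += feat
    weight_map[seg] += 1

# Average the features of pixels covered by several masks.
covered = weight_map > 0
feature_map[covered] /= weight_map[covered].unsqueeze(-1)
```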

Here, we show open-vocabulary segmentation without any training or finetuning.

[Figure: input image and the corresponding open-vocabulary segmentation result]
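
Continuing from the sketch above, one simple way to obtain such a segmentation is to compare the per-pixel features against CLIP text embeddings of a few candidate labels; the label list below is a made-up example, not one used by the repository.

```python
# Illustrative: label each pixel with the text prompt whose CLIP embedding is
# most similar to the pixel's feature. Reuses `model`, `device`, and
# `feature_map` from the sketch above; the class names are placeholders.
labels = ["a dog", "grass", "sky"]
with torch.no_grad():
    text_feat = model.encode_text(clip.tokenize(labels).to(device))
text_feat = (text_feat / text_feat.norm(dim=-1, keepdim=True)).cpu().float()

# Similarity between every pixel feature and every text embedding, then pick
# the best-matching label per pixel. Pixels covered by no mask have zero
# features and fall back to label index 0.
scores = feature_map @ text_feat.T    # (H, W, num_labels)
segmentation = scores.argmax(dim=-1)  # (H, W) label indices
```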

Prepare

  1. Install Segment-Anything and CLIP (or OpenCLIP); see the example commands after this list.
  2. Download one of the SAM checkpoints from the SAM repository.
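
For reference, one common way to install these dependencies is via pip from the official repositories (adjust to your environment):

```bash
# Segment-Anything and CLIP from their official repositories.
pip install git+https://github.com/facebookresearch/segment-anything.git
pip install git+https://github.com/openai/CLIP.git
# Or use OpenCLIP instead of CLIP:
pip install open_clip_torch
```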

Demo

You can generate per-pixel features of an image.

```bash
python feature_autogenerator.py --image_path {image_path} --output_path {output_path} --output_name {feature_file_name} --checkpoint_dir {checkpoint_dir}
```

Or generate segmentation results directly from a given config file.

```bash
python segment.py --config_path {config_path}
```

Acknowledgement

  1. Segment-Anything
  2. CLIP
  3. OpenCLIP

Citation

If you find this work useful for your research, please consider citing this repo:

```bibtex
@misc{mingfengli_seganyclip,
  title={Per-pixel Features: Mating Segment-Anything with CLIP},
  author={Li, Ming-Feng},
  url={https://github.com/justin871030/Segment-Anything-CLIP},
  year={2023}
}
```
