# Per-pixel Features: Mating Segment-Anything with CLIP

This repository generates per-pixel features using the pretrained models Segment-Anything (SAM) and CLIP. The pixel-aligned features are useful for downstream tasks such as visual grounding and VQA. First, SAM generates segmentation masks. Then, the cropped image regions are fed into CLIP to extract semantic features. Finally, each pixel is assigned semantic features according to its associated masks.
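For intuition, here is a minimal sketch of that pipeline (not the repository's exact implementation; the checkpoint path, image path, and CLIP model choice are placeholders):

```python
# A minimal sketch of the pipeline described above (not the repository's exact
# implementation). The checkpoint and image paths are placeholders.
import numpy as np
import torch
import clip
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. SAM proposes segmentation masks for the whole image.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to(device)
mask_generator = SamAutomaticMaskGenerator(sam)

# 2. CLIP encodes the image region cropped around each mask.
clip_model, preprocess = clip.load("ViT-B/32", device=device)

image = np.array(Image.open("input.jpg").convert("RGB"))
masks = mask_generator.generate(image)  # dicts with 'segmentation', 'bbox', ...

h, w, _ = image.shape
pixel_features = torch.zeros(h, w, clip_model.visual.output_dim)

with torch.no_grad():
    for m in masks:
        x, y, bw, bh = [int(v) for v in m["bbox"]]        # XYWH box of the mask
        crop = Image.fromarray(image[y:y + bh, x:x + bw])
        feat = clip_model.encode_image(preprocess(crop).unsqueeze(0).to(device))
        feat = feat / feat.norm(dim=-1, keepdim=True)
        # 3. Each pixel inherits the CLIP feature of its mask
        #    (pixels covered by several masks keep the last one here).
        pixel_features[torch.from_numpy(m["segmentation"])] = feat.squeeze(0).cpu().float()
```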

Here, we show open-vocabulary segmentation without any training or finetuning.

*(Example images: input image and the corresponding segmentation result.)*
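As a rough illustration of how such a segmentation could be produced from the per-pixel features (continuing the sketch above, reusing `pixel_features`, `clip_model`, and `device`; the prompt list is made up):

```python
# Compare every pixel's feature with CLIP text embeddings and take the argmax.
import torch
import clip

prompts = ["a dog", "grass", "sky"]                      # example class names
text_tokens = clip.tokenize(prompts).to(device)

with torch.no_grad():
    text_features = clip_model.encode_text(text_tokens)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# Cosine similarity between every pixel feature and every prompt,
# then the best-matching prompt per pixel.
similarity = torch.einsum("hwd,cd->hwc", pixel_features.to(device), text_features.float())
label_map = similarity.argmax(dim=-1)                    # H x W map of prompt indices
```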

## Prepare

1. Install Segment-Anything and CLIP (or OpenCLIP); example pip commands are shown below.
2. Download one of the SAM checkpoints from the SAM repository.
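For reference, a typical pip installation from the official repositories might look like this (assuming PyTorch is already set up):

```
pip install git+https://github.com/facebookresearch/segment-anything.git
pip install git+https://github.com/openai/CLIP.git
# or, to use OpenCLIP instead of CLIP:
pip install open_clip_torch
```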

## Demo

You can generate per-pixel features for an image:

```
python feature_autogenerator.py --image_path {image_path} --output_path {output_path} --output_name {feature_file_name} --checkpoint_dir {checkpoint_dir}
```
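For example, with placeholder paths:

```
python feature_autogenerator.py --image_path ./examples/input.jpg --output_path ./outputs --output_name input_features --checkpoint_dir ./checkpoints
```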

Or generate segmentation results directly from a given config file:

```
python segment.py --config_path {config_path}
```

## Acknowledgements

  1. Segment-Anything
  2. CLIP
  3. OpenCLIP

## Citation

If you find this work useful for your research, please consider citing this repository:

```
@misc{mingfengli_seganyclip,
  title={Per-pixel Features: Mating Segment-Anything with CLIP},
  author={Li, Ming-Feng},
  url={https://github.com/justin871030/Segment-Anything-CLIP},
  year={2023}
}
```