OpenAI-Clip: Multi-modal foundation model for vision-and-language tasks such as image/text similarity and zero-shot image classification
Contrastive Language-Image Pre-Training (CLIP) uses a ViT-like transformer to extract visual features and a causal language model to extract text features. Both the text and visual features can then be used for a variety of zero-shot learning tasks.
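In zero-shot use, the two encoders map an image and a set of candidate text prompts into a shared embedding space, and classification reduces to comparing those embeddings. A minimal numpy sketch of that scoring step, using random arrays as stand-ins for the real encoder outputs (the encoder calls themselves are omitted), might look like:

```python
import numpy as np

def clip_style_scores(image_feats, text_feats, logit_scale=100.0):
    """Score each image against each text prompt, CLIP-style.

    image_feats: (n_images, d) array from the visual encoder.
    text_feats:  (n_texts, d) array from the text encoder.
    Returns an (n_images, n_texts) array of probabilities over the prompts.
    """
    # L2-normalize both feature sets so the dot product is cosine similarity.
    image_feats = image_feats / np.linalg.norm(image_feats, axis=-1, keepdims=True)
    text_feats = text_feats / np.linalg.norm(text_feats, axis=-1, keepdims=True)

    # Scaled cosine similarities, then a softmax over the text prompts.
    logits = logit_scale * image_feats @ text_feats.T
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

# Stand-ins for real encoder outputs: 1 image, 3 candidate captions, 512-dim.
rng = np.random.default_rng(0)
probs = clip_style_scores(rng.normal(size=(1, 512)), rng.normal(size=(3, 512)))
print(probs.shape)  # (1, 3); each row sums to 1
```

The `logit_scale` default mirrors the learned temperature in the original CLIP release; the real value ships with the model weights.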
This is based on the implementation of OpenAI-Clip found here. This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found here.
Sign up to start using Qualcomm AI Hub and run these models on a hosted Qualcomm® device.
Install the package via pip:
pip install "qai_hub_models[openai_clip]"
Once installed, run the following simple CLI demo:
python -m qai_hub_models.models.openai_clip.demo
More details on the CLI tool can be found with the --help option. See demo.py for sample usage of the model, including pre/post-processing scripts. Please refer to our general instructions on using models for further guidance.
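The pre-processing in demo.py is not reproduced here, but CLIP's standard pipeline ends with per-channel normalization of a 224×224 RGB crop using the statistics published with the original OpenAI release. A hedged numpy sketch of that final step (resizing and center-cropping are assumed to have happened upstream, and the helper name is illustrative):

```python
import numpy as np

# Per-channel RGB statistics from the original OpenAI CLIP pre-processing.
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)

def normalize_clip_input(image_hwc_uint8):
    """Turn a 224x224x3 uint8 RGB image into a 1x3x224x224 float array.

    Assumes resizing/center-cropping to 224x224 was done upstream.
    """
    x = image_hwc_uint8.astype(np.float32) / 255.0   # scale to [0, 1]
    x = (x - CLIP_MEAN) / CLIP_STD                   # per-channel normalize
    x = np.transpose(x, (2, 0, 1))                   # HWC -> CHW
    return x[None, ...]                              # add batch dimension

dummy = np.zeros((224, 224, 3), dtype=np.uint8)
print(normalize_clip_input(dummy).shape)  # (1, 3, 224, 224)
```

The exact pipeline used on-device may differ; treat demo.py as the authoritative reference.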
This repository contains export scripts that produce a model optimized for on-device deployment. The export can be run as follows:
python -m qai_hub_models.models.openai_clip.export
Additional options are documented with the --help option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub.
- The license for the original implementation of OpenAI-Clip can be found here.
- The license for the compiled assets for on-device deployment can be found here.
- Join our AI Hub Slack community to collaborate, post questions and learn more about on-device AI.
- For questions or feedback, please reach out to us.