OpenAI-Clip: Multi-modal foundation model for vision-and-language tasks such as image/text similarity and zero-shot image classification
Contrastive Language-Image Pre-Training (CLIP) uses a ViT-like transformer to extract visual features and a causal language model to extract text features. Both the text and visual features can then be used for a variety of zero-shot learning tasks.
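In zero-shot use, the two encoders map an image and a set of candidate text prompts into a shared embedding space, and classification reduces to comparing those embeddings. A minimal numpy sketch of that scoring step, using random arrays as stand-ins for the real encoder outputs (the encoder calls themselves are omitted), might look like:

```python
import numpy as np

def clip_style_scores(image_feats, text_feats, logit_scale=100.0):
    """Score each image against each text prompt, CLIP-style.

    image_feats: (n_images, d) array from the visual encoder.
    text_feats:  (n_texts, d) array from the text encoder.
    Returns an (n_images, n_texts) array of probabilities over the prompts.
    """
    # L2-normalize both feature sets so the dot product is cosine similarity.
    image_feats = image_feats / np.linalg.norm(image_feats, axis=-1, keepdims=True)
    text_feats = text_feats / np.linalg.norm(text_feats, axis=-1, keepdims=True)

    # Scaled cosine similarities, then a softmax over the text prompts.
    logits = logit_scale * image_feats @ text_feats.T
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

# Stand-ins for real encoder outputs: 1 image, 3 candidate captions, 512-dim.
rng = np.random.default_rng(0)
probs = clip_style_scores(rng.normal(size=(1, 512)), rng.normal(size=(3, 512)))
print(probs.shape)  # (1, 3); each row sums to 1
```

The `logit_scale` default mirrors the learned temperature in the original CLIP release; the real value ships with the model weights.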
This is based on the implementation of OpenAI-Clip found here. This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found here.
Sign up to start using Qualcomm AI Hub and run these models on a hosted Qualcomm® device.
Install the package via pip:
pip install "qai_hub_models[openai_clip]"
Once installed, run the following simple CLI demo:
python -m qai_hub_models.models.openai_clip.demo
More details on the CLI tool can be found with the --help option. See demo.py for sample usage of the model, including pre/post-processing scripts. Please refer to our general instructions on using models for further guidance.
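The pre-processing in demo.py is not reproduced here, but CLIP's standard pipeline ends with per-channel normalization of a 224×224 RGB crop using the statistics published with the original OpenAI release. A hedged numpy sketch of that final step (resizing and center-cropping are assumed to have happened upstream, and the helper name is illustrative):

```python
import numpy as np

# Per-channel RGB statistics from the original OpenAI CLIP pre-processing.
CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)

def normalize_clip_input(image_hwc_uint8):
    """Turn a 224x224x3 uint8 RGB image into a 1x3x224x224 float array.

    Assumes resizing/center-cropping to 224x224 was done upstream.
    """
    x = image_hwc_uint8.astype(np.float32) / 255.0   # scale to [0, 1]
    x = (x - CLIP_MEAN) / CLIP_STD                   # per-channel normalize
    x = np.transpose(x, (2, 0, 1))                   # HWC -> CHW
    return x[None, ...]                              # add batch dimension

dummy = np.zeros((224, 224, 3), dtype=np.uint8)
print(normalize_clip_input(dummy).shape)  # (1, 3, 224, 224)
```

The exact pipeline used on-device may differ; treat demo.py as the authoritative reference.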
This repository contains export scripts that produce a model optimized for on-device deployment. The export can be run as follows:
python -m qai_hub_models.models.openai_clip.export
Additional options are documented with the --help option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub.
- The license for the original implementation of OpenAI-Clip can be found here.
- The license for the compiled assets for on-device deployment can be found here.
- Join our AI Hub Slack community to collaborate, post questions and learn more about on-device AI.
- For questions or feedback, please reach out to us.