Inference error when using local checkpoints.

Hi, thanks for you good work. 
When I tried to use the inference test script, there is always an error:

```

ERROR:root:Model config for openvision-vit-large-patch14-224 not found; available models ['coca_base', 'coca_roberta-ViT-B-32', 'coca_ViT-B-32', 'coca_ViT-L-14', 'convnext_base', 'convnext_base_w', 'convnext_base_w_320', 'convnext_large', 'convnext_large_d', 'convnext_large_d_320', 'convnext_small', 'convnext_tiny', 'convnext_xlarge', 'convnext_xxlarge', 'convnext_xxlarge_320', 'EVA01-g-14', 'EVA01-g-14-plus', 'EVA02-B-16', 'EVA02-E-14', 'EVA02-E-14-plus', 'EVA02-L-14', 'EVA02-L-14-336', 'MobileCLIP-B', 'MobileCLIP-S1', 'MobileCLIP-S2', 'mt5-base-ViT-B-32', 'mt5-xl-ViT-H-14', 'nllb-clip-base', 'nllb-clip-base-siglip', 'nllb-clip-large', 'nllb-clip-large-siglip', 'RN50', 'RN50-quickgelu', 'RN50x4', 'RN50x4-quickgelu', 'RN50x16', 'RN50x16-quickgelu', 'RN50x64', 'RN50x64-quickgelu', 'RN101', 'RN101-quickgelu', 'roberta-ViT-B-32', 'swin_base_patch4_window7_224', 'ViT-B-16', 'ViT-B-16-plus', 'ViT-B-16-plus-240', 'ViT-B-16-quickgelu', 'ViT-B-16-SigLIP', 'ViT-B-16-SigLIP2', 'ViT-B-16-SigLIP2-256', 'ViT-B-16-SigLIP2-384', 'ViT-B-16-SigLIP2-512', 'ViT-B-16-SigLIP-256', 'ViT-B-16-SigLIP-384', 'ViT-B-16-SigLIP-512', 'ViT-B-16-SigLIP-i18n-256', 'ViT-B-32', 'ViT-B-32-256', 'ViT-B-32-plus-256', 'ViT-B-32-quickgelu', 'ViT-B-32-SigLIP2-256', 'ViT-bigG-14', 'ViT-bigG-14-CLIPA', 'ViT-bigG-14-CLIPA-336', 'ViT-bigG-14-quickgelu', 'ViT-e-14', 'ViT-g-14', 'ViT-gopt-16-SigLIP2-256', 'ViT-gopt-16-SigLIP2-384', 'ViT-H-14', 'ViT-H-14-378', 'ViT-H-14-378-quickgelu', 'ViT-H-14-CLIPA', 'ViT-H-14-CLIPA-336', 'ViT-H-14-quickgelu', 'ViT-H-16', 'ViT-L-14', 'ViT-L-14-280', 'ViT-L-14-336', 'ViT-L-14-336-quickgelu', 'ViT-L-14-CLIPA', 'ViT-L-14-CLIPA-336', 'ViT-L-14-quickgelu', 'ViT-L-16', 'ViT-L-16-320', 'ViT-L-16-SigLIP2-256', 'ViT-L-16-SigLIP2-384', 'ViT-L-16-SigLIP2-512', 'ViT-L-16-SigLIP-256', 'ViT-L-16-SigLIP-384', 'ViT-M-16', 'ViT-M-16-alt', 'ViT-M-32', 'ViT-M-32-alt', 'ViT-S-16', 'ViT-S-16-alt', 'ViT-S-32', 'ViT-S-32-alt', 'ViT-SO400M-14-SigLIP', 'ViT-SO400M-14-SigLIP2', 'ViT-SO400M-14-SigLIP2-378', 'ViT-SO400M-14-SigLIP-378', 'ViT-SO400M-14-SigLIP-384', 'ViT-SO400M-16-SigLIP2-256', 'ViT-SO400M-16-SigLIP2-384', 'ViT-SO400M-16-SigLIP2-512', 'ViT-SO400M-16-SigLIP-i18n-256', 'vit_medium_patch16_gap_256', 'vit_relpos_medium_patch16_cls_224', 'ViTamin-B', 'ViTamin-B-LTT', 'ViTamin-L', 'ViTamin-L2', 'ViTamin-L2-256', 'ViTamin-L2-336', 'ViTamin-L2-384', 'ViTamin-L-256', 'ViTamin-L-336', 'ViTamin-L-384', 'ViTamin-S', 'ViTamin-S-LTT', 'ViTamin-XL-256', 'ViTamin-XL-336', 'ViTamin-XL-384', 'xlm-roberta-base-ViT-B-32', 'xlm-roberta-large-ViT-H-14'].
Traceback (most recent call last):
  File "/hy-tmp/OpenVision/openvision_inference.py", line 10, in <module>
    model, preprocess = create_model_from_pretrained(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/open_clip/factory.py", line 562, in create_model_from_pretrained
    model = create_model(
            ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/open_clip/factory.py", line 315, in create_model
    raise RuntimeError(f'Model config for {model_name} not found.')
RuntimeError: Model config for openvision-vit-large-patch14-224 not found.
```

And the .py is:

```

import torch
import torch.nn.functional as F
from urllib.request import urlopen
from PIL import Image
from open_clip import create_model_from_pretrained, get_tokenizer

ckpt_path = "./hf-hub:UCSC-VLAA/openvision-vit-large-patch14-224"
model, preprocess = create_model_from_pretrained(
    model_name="openvision-vit-large-patch14-224",
    pretrained=ckpt_path,  
    device="cuda" if torch.cuda.is_available() else "cpu"
)

image = Image.open("./assets/openvision_teaser.png").convert("RGB")
image = preprocess(image).unsqueeze(0)

text = tokenizer(["a diagram", "a dog", "a cat", "a beignet"], context_length=model.context_length)

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)  # prints: [[0., 0., 0., 1.0]]
```




Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inference error when using local checkpoints. #2

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Inference error when using local checkpoints. #2

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions