Request for full weights of ./CLIP_ft_all_key_06-30-1427; eyeclip_visual.pt text encoder produces incorrect results #10

@wvw13

Description

When using the EyeCLIP model, I found that the text encoder shipped in eyeclip_visual.pt has a problem: the computed similarity between any two texts is always 1, i.e. every text is mapped to the same embedding.

It looks like the text-encoder weights in this checkpoint are incomplete or corrupted.

To reproduce, load the model with the official EyeCLIP code:

```python
import eyeclip
import torch

device = "cuda"
eyeclip_model, eyeclip_preprocess = eyeclip.load("ViT-B/32", device=device, jit=False)
```

Load the weights:

```python
weights_path = "./eyeclip_visual.pt"
eyeclip_model.load_state_dict(torch.load(weights_path, map_location=device))
eyeclip_model.eval()
```

Test text similarity (texts are tokenized first, and features normalized so the dot product is cosine similarity):

```python
text_tokens = eyeclip.tokenize(["hello", "world"]).to(device)
with torch.no_grad():
    text_features = eyeclip_model.encode_text(text_tokens)
# Normalize so the dot product gives cosine similarity
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
similarity = text_features @ text_features.T
print(similarity)
```

The output is always:

```
tensor([[1., 1.],
        [1., 1.]])
```
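For context, an all-ones cosine-similarity matrix means every text is mapped to the same direction in embedding space. A minimal pure-PyTorch sketch (illustrative values only, not EyeCLIP outputs) contrasting collapsed and healthy embeddings:

```python
import torch

# Collapsed embeddings: every text maps to the same direction,
# so cosine similarity is 1 everywhere -- matching the observed output.
e = torch.tensor([[0.6, 0.8],
                  [0.6, 0.8]])
e = e / e.norm(dim=-1, keepdim=True)
print(e @ e.T)  # tensor([[1., 1.], [1., 1.]])

# Healthy embeddings differ in direction, so off-diagonal entries are < 1.
f = torch.tensor([[1.0, 0.0],
                  [0.6, 0.8]])
f = f / f.norm(dim=-1, keepdim=True)
print(f @ f.T)
```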

Expected behavior

The text encoder should produce distinguishable embeddings for different texts.

Ideally, please release the full checkpoint ./CLIP_ft_all_key_06-30-1427 to replace the current eyeclip_visual.pt.
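One way to confirm that the text-encoder parameters are missing from the checkpoint is to load it with `strict=False`, which reports mismatched keys instead of raising. A minimal sketch with a hypothetical stand-in model (the same check applies to the EyeCLIP model and `eyeclip_visual.pt`):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the real model; keys are "0.*" and "1.*".
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))

# Simulated checkpoint that only contains the first layer's parameters.
checkpoint = {"0.weight": torch.zeros(4, 4), "0.bias": torch.zeros(4)}

# strict=False returns which parameters the checkpoint lacks
# (missing_keys) and which it has that the model does not (unexpected_keys).
result = model.load_state_dict(checkpoint, strict=False)
print("missing:", result.missing_keys)        # ['1.weight', '1.bias']
print("unexpected:", result.unexpected_keys)  # []
```

Running the same check on eyeclip_visual.pt would show whether the transformer/text-encoder keys are absent.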
