Replies: 1 comment
-
@drzraf it's documented; did you try what's in the documentation without success? https://huggingface.co/timm/ViT-SO400M-14-SigLIP-384#with-timm-for-image-embeddings
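For reference, a minimal sketch of the kind of image-embedding usage that model-card section describes. This is an approximation, assuming the timm model name `vit_so400m_patch14_siglip_384` and the standard timm data-config helpers, not a verbatim copy of the documented snippet:

```python
import torch
import timm
from PIL import Image

# num_classes=0 removes the classifier head, so the forward pass returns the
# pooled image embedding (shape [batch, embed_dim]) instead of class logits.
model = timm.create_model(
    'vit_so400m_patch14_siglip_384',  # assumed timm name for ViT-SO400M-14-SigLIP-384
    pretrained=True,
    num_classes=0,
).eval()

# Build the eval-time preprocessing from the model's pretrained config, so the
# input size and normalization match what the weights expect.
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

img = Image.open('example.jpg').convert('RGB')
with torch.no_grad():
    embedding = model(transform(img).unsqueeze(0))                # (1, embed_dim)
    tokens = model.forward_features(transform(img).unsqueeze(0))  # unpooled patch tokens
```

The same pattern should apply to any timm ViT used as an embedding backbone.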
-
I have some inference code around `open_clip`, derived from their README (https://github.com/mlfoundations/open_clip/) and sketched below. But after excavating timm's code, issues, and discussions, I still can't find a way to do the same with timm.
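For context, the open_clip README-style embedding extraction referred to above looks roughly like this (the `hf-hub:` identifier is taken from the model card; treat the details as illustrative):

```python
import torch
from PIL import Image
import open_clip

# Load the SigLIP model and its preprocessing from the Hugging Face Hub.
model, _, preprocess = open_clip.create_model_and_transforms(
    'hf-hub:timm/ViT-SO400M-14-SigLIP-384')
tokenizer = open_clip.get_tokenizer('hf-hub:timm/ViT-SO400M-14-SigLIP-384')
model.eval()

image = preprocess(Image.open('example.jpg').convert('RGB')).unsqueeze(0)
text = tokenizer(['a diagram', 'a dog', 'a cat'])

with torch.no_grad():
    image_features = model.encode_image(image)   # (1, embed_dim)
    text_features = model.encode_text(text)      # (3, embed_dim)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
```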
In timm there are, indeed:

- a `feature_cfg` parameter passed to `create_model()`
- `ClassifierHead()` / `create_classifier`, mostly initialized by `create_model` (a `VisionTransformer` in my case of a `vit_so400m*`)
- an `accuracy` function

Simply said: the API assumes the developer already knows what `global_pool="avgmax"`, `fc_norm`, `embed_dim`, and a dozen other parameters mean, along with their semantics and their direct and indirect implications. This is not exactly an intuitive API. While it definitely sounds flexible and powerful, not having snippets (nor tests) to start from makes approaching it somewhat time-consuming.

Could someone be so kind as to provide timm's equivalent of the open_clip usage snippet?
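To make the head/pooling parameters above concrete, here is a small generic sketch (using `vit_base_patch16_224` purely as an illustration; the behavior is assumed from current timm and is not specific to this discussion):

```python
import torch
import timm

x = torch.randn(1, 3, 224, 224)

# num_classes=0 replaces the classifier head with Identity, so the model
# returns the pooled representation of size embed_dim instead of logits.
backbone = timm.create_model('vit_base_patch16_224', num_classes=0)
print(backbone(x).shape)        # torch.Size([1, 768])

# global_pool='' additionally disables pooling, so the full token sequence
# (CLS + patch tokens) comes back from the forward pass.
tokens_model = timm.create_model('vit_base_patch16_224', num_classes=0, global_pool='')
print(tokens_model(x).shape)    # torch.Size([1, 197, 768])
```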
Somewhat related: mlfoundations/open_clip#685