We introduce DesignCLIP, a multimodal model trained on large-scale design data, including all patents from 2007 to 2022 from the USPTO Bulk Data Storage System (BDSS).
✒️ To address the unique characteristics of patent data, we incorporate class-aware classification and contrastive learning, generate detailed captions for patent images, and apply multi-view image learning.
📗 We will release the full data soon.
- Sample images from the most recent 5 years can be viewed and downloaded here.
- Sample generated captions for the most recent 5 years of patent images can be viewed and downloaded here.
🔥 DesignCLIP is based on CLIP; we build on the open-source open_clip implementation and incorporate class-aware classification and contrastive learning.
🤗 PatentCLIP-ViT-B [checkpoint]
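As a rough illustration of what a class-aware contrastive objective can look like (an illustrative assumption on our part, not necessarily the exact loss used in the paper), image-text pairs that share the same patent class can be treated as additional positives rather than negatives:

```python
import torch
import torch.nn.functional as F

def class_aware_contrastive_loss(image_feats, text_feats, class_ids, temperature=0.07):
    """Illustrative class-aware InfoNCE: pairs sharing a patent class are
    treated as extra positives instead of negatives. (Assumed formulation.)"""
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = image_feats @ text_feats.t() / temperature            # (B, B) similarities
    # Positive mask: the matched caption plus any caption from the same patent class.
    pos_mask = (class_ids.unsqueeze(0) == class_ids.unsqueeze(1)).float()
    pos_mask.fill_diagonal_(1.0)
    log_p_i2t = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    log_p_t2i = logits.t() - torch.logsumexp(logits.t(), dim=1, keepdim=True)
    loss_i2t = -(pos_mask * log_p_i2t).sum(1) / pos_mask.sum(1)
    loss_t2i = -(pos_mask * log_p_t2i).sum(1) / pos_mask.sum(1)
    return 0.5 * (loss_i2t.mean() + loss_t2i.mean())
```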
Load a DesignCLIP model:

```python
import torch
import open_clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _, preprocess = open_clip.create_model_and_transforms('hf-hub:patentclip/PatentCLIP_Vit_B', device=device)
tokenizer = open_clip.get_tokenizer('hf-hub:patentclip/PatentCLIP_Vit_B')
```
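With the model loaded, it can be used like any other open_clip model. A minimal retrieval-style sketch (the image path and captions below are placeholders, not part of the released data):

```python
import torch
from PIL import Image

# Placeholder patent drawing and candidate captions.
image = preprocess(Image.open("patent_drawing.png")).unsqueeze(0).to(device)
texts = tokenizer([
    "a design patent drawing of a chair",
    "a design patent drawing of a lamp",
]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Caption probabilities:", probs)
```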
Multimodal retrieval results for text-to-image and image-to-text retrieval using both CLIP and DesignCLIP models.
| Model | Backbone | Text→Image R@5 | Text→Image R@10 | Image→Text R@5 | Image→Text R@10 |
|---|---|---|---|---|---|
| CLIP | RN50 | 5.47 | 8.51 | 5.24 | 7.72 |
| CLIP | RN101 | 7.60 | 11.17 | 6.10 | 9.35 |
| CLIP | ViT-B | 7.49 | 10.60 | 6.90 | 10.34 |
| CLIP | ViT-L | 13.26 | 18.29 | 12.07 | 17.17 |
| DesignCLIP | RN50 | 25.17 | 34.50 | 23.49 | 32.70 |
| DesignCLIP | RN101 | 26.71 | 36.51 | 25.37 | 34.84 |
| DesignCLIP | ViT-B | 29.75 | 39.91 | 28.39 | 38.26 |
| DesignCLIP | ViT-L | 41.72 | 52.55 | 39.59 | 50.44 |
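For reference, Recall@K numbers like those above can be computed from the image-text similarity matrix over the test set. A minimal evaluation sketch, assuming one matching caption per image and L2-normalized features (this mirrors standard retrieval evaluation, not necessarily the exact script used here):

```python
import torch

def recall_at_k(image_feats, text_feats, k):
    """Recall@K where the i-th text is the ground-truth match for the i-th image."""
    sims = image_feats @ text_feats.t()                      # (N, N) cosine similarities
    targets = torch.arange(sims.size(0), device=sims.device)
    # Image -> text: is the correct caption among the top-k retrieved texts?
    hits_i2t = (sims.topk(k, dim=1).indices == targets.unsqueeze(1)).any(dim=1)
    # Text -> image: is the correct image among the top-k retrieved images?
    hits_t2i = (sims.t().topk(k, dim=1).indices == targets.unsqueeze(1)).any(dim=1)
    return hits_i2t.float().mean().item(), hits_t2i.float().mean().item()
```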
Run the classification evaluation:

```bash
python classification.py
```
Classification results (accuracy, %) for both CLIP and DesignCLIP in zero-shot and fine-tuned settings. The dataset used here is from the year 2023.
| Model | Backbone | Zero-shot | Fine-tuned |
|---|---|---|---|
| CLIP | RN101 | 11.91 | 15.47 |
| CLIP | ViT-B | 10.88 | 38.99 |
| DesignCLIP | RN101 | 11.93 | 29.92 |
| DesignCLIP | ViT-B | 14.70 | 41.34 |
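The zero-shot setting follows the usual CLIP recipe: each patent class name is turned into a text prompt, and an image is assigned to the class whose prompt embedding is most similar. A minimal sketch reusing `model`, `tokenizer`, `device`, and the preprocessed `image` from the loading example above (the class names and prompt template are illustrative assumptions, not the actual label set):

```python
import torch

# Illustrative class names; the real labels come from the patent classification scheme.
class_names = ["chair", "lamp", "bottle", "shoe"]
prompts = tokenizer([f"a design patent drawing of a {c}" for c in class_names]).to(device)

with torch.no_grad():
    class_embeds = model.encode_text(prompts)
    class_embeds /= class_embeds.norm(dim=-1, keepdim=True)
    image_features = model.encode_image(image)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    pred = (image_features @ class_embeds.T).argmax(dim=-1)

print("Predicted class:", class_names[pred.item()])
```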
- Download the DeepPatent dataset for image retrieval.
- Train DesignCLIP + ArcFace on DeepPatent (a sketch of the ArcFace head follows below):

```bash
python ir_main.py
```
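As a rough sketch of what the ArcFace component does on top of CLIP image embeddings (an illustrative module, not the actual code in `ir_main.py`):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """ArcFace-style angular-margin head over image embeddings (illustrative sketch)."""
    def __init__(self, embed_dim, num_classes, scale=30.0, margin=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, embed_dim))
        nn.init.xavier_uniform_(self.weight)
        self.scale, self.margin = scale, margin

    def forward(self, embeddings, labels):
        # Cosine similarity between normalized embeddings and class centers.
        cos = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the angular margin only for the ground-truth class, then rescale.
        target = F.one_hot(labels, cos.size(1)).float()
        logits = self.scale * torch.cos(theta + self.margin * target)
        return F.cross_entropy(logits, labels)
```

In training, the embeddings would typically come from `model.encode_image(...)`, with the margin loss encouraging drawings of the same design to cluster tightly for retrieval.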
@inproceedings{
wang2025designclip,
title={Design{CLIP}: Multimodal Learning with {CLIP} for Design Patent Understanding},
author={Zhu Wang and Homaira Huda Shomee and Sathya N. Ravi and Sourav Medya},
booktitle={The 2025 Conference on Empirical Methods in Natural Language Processing},
year={2025},
url={https://openreview.net/forum?id=pTumSzkDLC}
}
The implementation of DesignCLIP relies on resources from open_clip, LLaVA, and SWIN + ArcFace. We thank the original authors for open-sourcing their work.