Enable automatic CACHE_DIR for GPU inference only #520
Setting CACHE_DIR used to speed up model loading on both CPU and GPU, but with recent model-loading improvements in OpenVINO it is now faster on CPU to load the model without setting CACHE_DIR. I tested this with several kinds of models, on Core and Xeon; loading without the cache can be about three times faster. On GPU, loading from the cache is still about three times faster (tested with an Arc A770). This PR enables automatic CACHE_DIR only when the device is explicitly set to GPU.
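The gist of the change can be sketched as follows. This is a minimal illustration, not the PR's actual diff: `compile_with_optional_cache` is a hypothetical helper, while `CACHE_DIR` is OpenVINO's standard model-cache config key.

```python
from pathlib import Path

from openvino.runtime import Core

def compile_with_optional_cache(model_path: str, device: str, cache_dir: str):
    """Compile a model, enabling OpenVINO's model cache only on GPU.

    Hypothetical helper for illustration; the PR's actual change lives
    inside the optimum-intel model classes.
    """
    core = Core()
    config = {}
    # On GPU, the cache skips kernel compilation on subsequent loads and is
    # roughly 3x faster; on CPU, loading without the cache is now faster.
    if "GPU" in device:
        config["CACHE_DIR"] = str(Path(cache_dir))
    return core.compile_model(core.read_model(model_path), device, config)
```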
For seq2seq models, `ov_config` was set at init time, but because models can also be moved to GPU with `.to()`, setting CACHE_DIR should now happen when the model is compiled. OVDecoder and OVEncoder had no concept of a model_save_dir, so I added a `parent_model` argument to their constructors to give them access to it. A rough sketch of the decoder side is shown below.
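The sketch below shows how the compile-time decision might look. The class body and attribute names such as `_device` and `ov_config` are assumptions based on the description above, not code copied from the PR.

```python
from pathlib import Path

from openvino.runtime import Core

core = Core()

class OVDecoder:
    def __init__(self, model, parent_model):
        self.model = model
        # Keep a reference to the parent seq2seq model so the decoder can
        # read its device, ov_config, and model_save_dir at compile time.
        self.parent_model = parent_model
        self.request = None

    def _compile(self):
        ov_config = dict(self.parent_model.ov_config)
        device = self.parent_model._device
        # CACHE_DIR is decided here, at compile time, because the model may
        # have been moved to GPU via .to() after __init__ ran.
        if "GPU" in device and "CACHE_DIR" not in ov_config:
            cache_dir = Path(self.parent_model.model_save_dir) / "model_cache"
            ov_config["CACHE_DIR"] = str(cache_dir)
        compiled = core.compile_model(self.model, device, ov_config)
        self.request = compiled.create_infer_request()
```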