Enable automatic CACHE_DIR for GPU inference only #520
Setting CACHE_DIR used to speed up model loading on both CPU and GPU, but with recent model-loading improvements in OpenVINO it is now faster on CPU to load the model without setting CACHE_DIR. I tested this with several kinds of models, on Core and Xeon; loading without the cache can be about three times faster. On GPU, loading from the cache is still about three times faster (tested with an Arc A770). This PR enables automatic CACHE_DIR only when the device is explicitly set to GPU.
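The gist of the change can be sketched as follows. This is a minimal illustration, not the PR's actual diff: `compile_with_optional_cache` is a hypothetical helper, while `CACHE_DIR` is OpenVINO's standard model-cache config key.

```python
from pathlib import Path

from openvino.runtime import Core

def compile_with_optional_cache(model_path: str, device: str, cache_dir: str):
    """Compile a model, enabling OpenVINO's model cache only on GPU.

    Hypothetical helper for illustration; the PR's actual change lives
    inside the optimum-intel model classes.
    """
    core = Core()
    config = {}
    # On GPU, the cache skips kernel compilation on subsequent loads and is
    # roughly 3x faster; on CPU, loading without the cache is now faster.
    if "GPU" in device:
        config["CACHE_DIR"] = str(Path(cache_dir))
    return core.compile_model(core.read_model(model_path), device, config)
```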
For seq2seq models, `ov_config` was set at init time, but because models can also be moved to GPU with `.to()`, setting CACHE_DIR should now happen when the model is compiled. OVDecoder and OVEncoder had no concept of a model_save_dir, so I added a `parent_model` argument to their constructors to give them access to it. A rough sketch of the decoder side is shown below.
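The sketch below shows how the compile-time decision might look. The class body and attribute names such as `_device` and `ov_config` are assumptions based on the description above, not code copied from the PR.

```python
from pathlib import Path

from openvino.runtime import Core

core = Core()

class OVDecoder:
    def __init__(self, model, parent_model):
        self.model = model
        # Keep a reference to the parent seq2seq model so the decoder can
        # read its device, ov_config, and model_save_dir at compile time.
        self.parent_model = parent_model
        self.request = None

    def _compile(self):
        ov_config = dict(self.parent_model.ov_config)
        device = self.parent_model._device
        # CACHE_DIR is decided here, at compile time, because the model may
        # have been moved to GPU via .to() after __init__ ran.
        if "GPU" in device and "CACHE_DIR" not in ov_config:
            cache_dir = Path(self.parent_model.model_save_dir) / "model_cache"
            ov_config["CACHE_DIR"] = str(cache_dir)
        compiled = core.compile_model(self.model, device, ov_config)
        self.request = compiled.create_infer_request()
```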