
Fix inference for batched inputs for llama #784


Closed
echarlaix wants to merge 4 commits from the fix-llama branch

Conversation

echarlaix
Collaborator

Fix inference for batched inputs for fp32 models, where the failure comes from min_dtype = torch.finfo(torch.float16).min being used regardless of the model's actual dtype.
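For context, here is a minimal sketch of the kind of dtype-aware masking the description points at. The helper name build_causal_mask and its signature are illustrative, not the actual code changed in this PR; the point is only that the mask fill value is derived from the model's dtype rather than hard-coded to float16.

```python
import torch

def build_causal_mask(attention_mask: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    """Additive causal mask for batched, padded inputs.

    min_dtype is derived from the model's actual dtype; hard-coding
    torch.finfo(torch.float16).min would give the wrong fill value
    for an fp32 model.
    """
    batch_size, seq_len = attention_mask.shape
    min_dtype = torch.finfo(dtype).min  # dtype-correct mask fill value

    # Causal structure: token i may only attend to tokens j <= i.
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    mask = torch.full((seq_len, seq_len), min_dtype, dtype=dtype)
    mask = mask.masked_fill(causal, 0.0)

    # Broadcast to (batch, 1, seq, seq) and also mask out padded positions.
    mask = mask.expand(batch_size, 1, seq_len, seq_len).clone()
    mask = mask.masked_fill(attention_mask[:, None, None, :].eq(0), min_dtype)
    return mask

# Example: a batch of 2 sequences, the second left-padded by one token.
mask = build_causal_mask(torch.tensor([[1, 1, 1], [0, 1, 1]]), torch.float32)
```

With torch.float32 the padded positions get torch.finfo(torch.float32).min, whereas the float16 minimum (-65504.0) would not fully suppress those positions after softmax in fp32.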

@echarlaix echarlaix requested a review from eaidova June 26, 2024 14:47
@echarlaix echarlaix changed the title from Fix llama to Fix inference for batched inputs for llama on Jun 26, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@echarlaix echarlaix closed this Jun 26, 2024
@echarlaix echarlaix deleted the fix-llama branch September 3, 2024 12:54