from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer, BloomModel
import torch
from tqdm import tqdm
from time import time, sleep

model_str = 'bigscience/bloom'
tokenizer = AutoTokenizer.from_pretrained(model_str)

# Loading through OVModelForCausalLM runs out of memory:
ov_model = OVModelForCausalLM.from_pretrained(model_str, from_transformers=True,
                                              device_map='auto',
                                              torch_dtype=torch.bfloat16,
                                              low_cpu_mem_usage=True)

# Loading the same checkpoint through the plain transformers interface works:
#model = BloomModel.from_pretrained(model_str, device_map='auto',
#                                   torch_dtype=torch.bfloat16,
#                                   low_cpu_mem_usage=True)
#model.eval()

print('# [INFO] model loading complete, wait for 3 sec to start inference')
sleep(3.)

inputs = tokenizer('Hello, my dog is cute', return_tensors='pt')
n_tokens = len(inputs['input_ids'][0])
avg_latency = 0.

#with torch.inference_mode(), torch.cpu.amp.autocast():
for t in tqdm(range(100)):
    t0 = time()
    #outputs = model(**inputs)
    outputs = ov_model(**inputs)
    if t > 9:  # skip the first 10 iterations as warm-up
        avg_latency += time() - t0

# average over the 90 timed iterations
print(f'# [INFO] avg latency: {avg_latency / 90:.4f} s for {n_tokens} input tokens')
DRAM is 512 GB. The Hugging Face interface has a torch_dtype parameter, and with it the model can be loaded on CPU, but the OVModelForCausalLM interface results in OOM. Is there a parameter similar to torch_dtype in OVModelForCausalLM?
Currently the torch_dtype parameter is ignored, but enabling the model to be loaded in bf16 before exporting it to the OpenVINO format is something we plan to integrate in the future. Thanks for letting us know that this is an important feature for users!
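Until that lands, one possible workaround is to cast the checkpoint to bf16 through the transformers loader first, save it locally, and then export that smaller checkpoint to OpenVINO. This is a minimal, untested sketch; the local path ./bloom-bf16 is hypothetical, and whether it avoids the OOM depends on how the export path materializes the weights:

import torch
from transformers import AutoModelForCausalLM
from optimum.intel.openvino import OVModelForCausalLM

model_str = 'bigscience/bloom'
bf16_dir = './bloom-bf16'  # hypothetical local path for the converted checkpoint

# load the checkpoint in bf16 on CPU (the path that reportedly fits in 512 GB)
model = AutoModelForCausalLM.from_pretrained(model_str,
                                             torch_dtype=torch.bfloat16,
                                             low_cpu_mem_usage=True)
model.save_pretrained(bf16_dir)
del model  # free the PyTorch copy before exporting

# export the pre-converted bf16 checkpoint to the OpenVINO format
ov_model = OVModelForCausalLM.from_pretrained(bf16_dir, from_transformers=True)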