You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I see safe_open and deserialize, it seems that both two are lazy loading.
So if I don't want to load safetensor without lazy loading
how could I do, thanks
Information
The official example scripts
My own modified scripts
Reproduction
I use sglang, and in sglang model_loader/weight_utils.py
it load safetensors like this if not is_all_weights_sharded: with safe_open(st_file, framework="pt") as f: for name in f.keys(): # noqa: SIM118 param = f.get_tensor(name) yield name, param else: result = load_file(st_file, device="cpu") for name, param in result.items(): yield name, param
I found it loads safe tensor too slow(about 20min+), whether is_all_weights_sharded is True
and if I prefetch safetensors before load_model(like cat * > /dev/null), it could only cost 5min
I try to use threadExecutor to parallel this code, and although get_tensor could be quick, but loading weight still cost 20min +, so I doubt that lazy loading.thanks
Expected behavior
without lazy loading
The text was updated successfully, but these errors were encountered:
System Info
I see safe_open and deserialize, it seems that both two are lazy loading.
So if I don't want to load safetensor without lazy loading
how could I do, thanks
Information
Reproduction
I use sglang, and in sglang model_loader/weight_utils.py
it load safetensors like this
if not is_all_weights_sharded: with safe_open(st_file, framework="pt") as f: for name in f.keys(): # noqa: SIM118 param = f.get_tensor(name) yield name, param else: result = load_file(st_file, device="cpu") for name, param in result.items(): yield name, param
I found it loads safe tensor too slow(about 20min+), whether is_all_weights_sharded is True
and if I prefetch safetensors before load_model(like cat * > /dev/null), it could only cost 5min
I try to use threadExecutor to parallel this code, and although get_tensor could be quick, but loading weight still cost 20min +, so I doubt that lazy loading.thanks
Expected behavior
without lazy loading
The text was updated successfully, but these errors were encountered: