Can I load safetensors files without lazy loading? #577

voidxb · 2025-02-24T07:55:33Z

System Info

I see safe_open and deserialize, and both of them seem to load lazily.
If I want to load a safetensors file without lazy loading,
how can I do that? Thanks.
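For reference, eager loading means reading the whole file into memory in one pass and then slicing each tensor out of that in-memory buffer. The sketch below does this with the standard library only. It follows the safetensors file layout: an 8-byte little-endian header length, then a JSON header giving dtype, shape and data_offsets for each tensor, then the raw data buffer. The helper name load_eager is hypothetical, and it returns raw bytes rather than framework tensors. With the library itself, passing the file's bytes to safetensors.torch.load should give a similar eager effect.

```python
import json
import struct
from typing import Dict, Tuple

# Hypothetical helper: a stdlib-only sketch of eager safetensors loading.
# It returns (dtype, shape, raw bytes) per tensor instead of torch tensors.
TensorInfo = Tuple[str, Tuple[int, ...], bytes]


def load_eager(path: str) -> Dict[str, TensorInfo]:
    # One sequential read pulls the whole file into memory up front,
    # so no later tensor access has to go back to the disk.
    with open(path, "rb") as fh:
        blob = fh.read()

    # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8 : 8 + header_len])
    data = memoryview(blob)[8 + header_len :]

    tensors: Dict[str, TensorInfo] = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        begin, end = info["data_offsets"]  # offsets relative to the data buffer
        tensors[name] = (info["dtype"], tuple(info["shape"]), bytes(data[begin:end]))
    return tensors
```

Because the whole file is read at once, the cost is paid in a single large sequential read. The tradeoff is that peak memory equals the file size.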

Information

The official example scripts
My own modified scripts

Reproduction

I use sglang. In sglang's model_loader/weight_utils.py,
safetensors files are loaded like this:

    if not is_all_weights_sharded:
        with safe_open(st_file, framework="pt") as f:
            for name in f.keys():  # noqa: SIM118
                param = f.get_tensor(name)
                yield name, param
    else:
        result = load_file(st_file, device="cpu")
        for name, param in result.items():
            yield name, param

Loading is very slow (20+ minutes), regardless of whether is_all_weights_sharded is True or False.
If I prefetch the safetensors files before load_model (e.g. with cat * > /dev/null), loading takes only about 5 minutes.
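The cat prefetch likely helps because reading a file once leaves its pages in the OS page cache (as long as there is enough free RAM). Later reads, including memory-mapped ones, are then served from memory instead of disk. A Python equivalent might look like the sketch below. It assumes the checkpoint files all match *.safetensors in one directory. The names prefetch_file and prefetch_dir and the 64 MiB chunk size are illustrative choices, not sglang or safetensors APIs.

```python
import glob
import os

CHUNK = 64 * 1024 * 1024  # 64 MiB per read call (illustrative value)


def prefetch_file(path: str) -> int:
    """Hypothetical helper: read a file end to end and discard the data.

    The only side effect is that the file's pages end up in the OS page cache.
    Returns the number of bytes read.
    """
    total = 0
    buf = bytearray(CHUNK)  # reused buffer, so no per-chunk allocation
    with open(path, "rb", buffering=0) as fh:
        while n := fh.readinto(buf):  # readinto returns 0 at end of file
            total += n
    return total


def prefetch_dir(model_dir: str, pattern: str = "*.safetensors") -> int:
    """Hypothetical helper: sequentially prefetch every matching file, like `cat * > /dev/null`."""
    paths = sorted(glob.glob(os.path.join(model_dir, pattern)))
    return sum(prefetch_file(p) for p in paths)
```

Calling prefetch_dir(model_dir) before load_model would mirror the cat trick inside the same process.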
I tried parallelizing this code with a thread pool executor. Although get_tensor returns quickly, loading the weights still takes 20+ minutes, so I suspect lazy loading is the cause. Thanks.
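There is one possible explanation (not confirmed in this issue) for why parallelizing get_tensor does not help. If the returned tensors are backed by a memory-mapped file, the call can return quickly while the actual disk reads happen later, page by page, when the data is first touched. An alternative is to parallelize the raw file reads instead, using concurrent.futures.ThreadPoolExecutor. Python file reads release the GIL, so several threads can keep the storage busy at the same time. The function names and the default worker count below are assumptions. Whether parallel reads beat one sequential read depends on the storage, so it is worth timing both.

```python
import glob
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Dict

CHUNK = 16 * 1024 * 1024  # 16 MiB buffer per worker thread (illustrative value)


def _read_all(path: str) -> int:
    """Hypothetical helper: read one file fully and discard the bytes; return the byte count."""
    total = 0
    buf = bytearray(CHUNK)
    with open(path, "rb", buffering=0) as fh:
        while n := fh.readinto(buf):  # the GIL is released during the read syscall
            total += n
    return total


def parallel_prefetch(model_dir: str, workers: int = 8) -> Dict[str, int]:
    """Hypothetical helper: warm the page cache for all *.safetensors files with a thread pool.

    Returns a mapping from file path to the number of bytes read.
    """
    paths = sorted(glob.glob(os.path.join(model_dir, "*.safetensors")))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sizes = pool.map(_read_all, paths)
        return dict(zip(paths, sizes))
```

Once this finishes, the existing safe_open / load_file loop can run unchanged. Its reads should then mostly hit the page cache instead of the disk.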

Expected behavior

The safetensors files are loaded eagerly (fully read into memory), without lazy loading.
