
Could I get safe tensor without lazy loading? #577

voidxb opened this issue Feb 24, 2025 · 0 comments


System Info

I see safe_open and deserialize, and it seems that both of them load lazily.
If I want to load a safetensors file without lazy loading,
how can I do that? Thanks.

Information

  • The official example scripts
  • My own modified scripts

Reproduction

I use sglang, and in sglang's model_loader/weight_utils.py the safetensors files are loaded like this:

```python
if not is_all_weights_sharded:
    with safe_open(st_file, framework="pt") as f:
        for name in f.keys():  # noqa: SIM118
            param = f.get_tensor(name)
            yield name, param
else:
    result = load_file(st_file, device="cpu")
    for name, param in result.items():
        yield name, param
```
I found that loading the safetensors files is very slow (20+ minutes), regardless of whether is_all_weights_sharded is True.
If I prefetch the safetensors files before load_model (e.g. cat * > /dev/null), loading only takes about 5 minutes.
I tried using a ThreadPoolExecutor to parallelize this code; although each get_tensor call is quick, loading the weights still takes 20+ minutes, so I suspect lazy loading is the cause. Thanks.
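The `cat * > /dev/null` trick can also be done from Python, warming the OS page cache with sequential reads before load_model runs. A hypothetical sketch (the helper names, chunk size, and worker count are my own, not from sglang):

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def prefetch(path, chunk=16 << 20):
    # Read the file sequentially in 16 MiB chunks; the data is discarded,
    # only the side effect of populating the page cache matters.
    with open(path, "rb") as f:
        while f.read(chunk):
            pass
    return path


def prefetch_all(folder, workers=8):
    # Prefetch every *.safetensors file in the folder on a thread pool,
    # so the later (lazy) tensor reads hit warm cache instead of disk.
    files = sorted(Path(folder).glob("*.safetensors"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(prefetch, files))
```

Whether this helps depends on having enough free RAM to hold the checkpoint in the page cache; it does not change the lazy-loading behavior itself.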

Expected behavior

Load the safetensors files eagerly, without lazy loading.
