Hi! If I want to run inference with a customized model, how do I enable tensor parallelism to shard the model across multiple GPUs? I couldn't find clear instructions on how to set the correct injection_policy, or whether there are other solutions.

For my specific case, I have a multimodal LLM with a ViT, a projector, and an LLM, but I am not sure how to evaluate it in a sharded way in DeepSpeed.
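For reference, here is a minimal sketch of the kind of setup I am asking about, modeled on DeepSpeed's documented injection_policy examples. `MyMultimodalLM`, `MyDecoderLayer`, the module paths (`self_attn.o_proj`, `mlp.down_proj`), and the checkpoint path are all placeholders for my own model, not real DeepSpeed names:

```python
import os
import torch
import deepspeed

# Placeholder imports: MyMultimodalLM (ViT + projector + LLM) and
# MyDecoderLayer (one transformer block of the LLM) stand in for my
# own classes.
from my_model import MyMultimodalLM, MyDecoderLayer

model = MyMultimodalLM.from_pretrained("path/to/checkpoint").eval()

# injection_policy maps a layer class to the attribute names of the
# linear layers whose outputs DeepSpeed should all-reduce (typically
# the attention output projection and the last MLP projection).
# The names below assume a LLaMA-style block and are placeholders.
engine = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": int(os.getenv("WORLD_SIZE", "1"))},
    dtype=torch.float16,
    injection_policy={MyDecoderLayer: ("self_attn.o_proj", "mlp.down_proj")},
)

# Dummy inputs just to illustrate the call; shapes depend on the model.
pixel_values = torch.randn(1, 3, 224, 224, dtype=torch.float16, device="cuda")
input_ids = torch.randint(0, 32000, (1, 16), device="cuda")

# The ViT and projector would stay replicated on every rank; only the
# decoder blocks named in injection_policy get sharded.
with torch.no_grad():
    outputs = engine.module(pixel_values, input_ids)
```

My understanding is that this would be launched with something like `deepspeed --num_gpus 2 infer.py`, so that each rank holds one shard of the decoder weights, but I am not sure this is the right approach for a non-standard multimodal architecture.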
Just following up on this question; grateful if anyone has some suggestions!