Hi! If I want to run inference with a customized model, how do I enable tensor parallelism to shard the model across multiple GPUs? I couldn't find clear instructions on how to set the correct injection_policy, or whether there are other solutions.

For my specific case, I have a multimodal LLM with a ViT, a projector, and an LLM, but I am not sure how to evaluate it in a sharded way in DeepSpeed.
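For reference, here is a minimal sketch of the kind of setup I am asking about, modeled on DeepSpeed's documented injection_policy examples. `MyMultimodalLM`, `MyDecoderLayer`, the module paths (`self_attn.o_proj`, `mlp.down_proj`), and the checkpoint path are all placeholders for my own model, not real DeepSpeed names:

```python
import os
import torch
import deepspeed

# Placeholder imports: MyMultimodalLM (ViT + projector + LLM) and
# MyDecoderLayer (one transformer block of the LLM) stand in for my
# own classes.
from my_model import MyMultimodalLM, MyDecoderLayer

model = MyMultimodalLM.from_pretrained("path/to/checkpoint").eval()

# injection_policy maps a layer class to the attribute names of the
# linear layers whose outputs DeepSpeed should all-reduce (typically
# the attention output projection and the last MLP projection).
# The names below assume a LLaMA-style block and are placeholders.
engine = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": int(os.getenv("WORLD_SIZE", "1"))},
    dtype=torch.float16,
    injection_policy={MyDecoderLayer: ("self_attn.o_proj", "mlp.down_proj")},
)

# Dummy inputs just to illustrate the call; shapes depend on the model.
pixel_values = torch.randn(1, 3, 224, 224, dtype=torch.float16, device="cuda")
input_ids = torch.randint(0, 32000, (1, 16), device="cuda")

# The ViT and projector would stay replicated on every rank; only the
# decoder blocks named in injection_policy get sharded.
with torch.no_grad():
    outputs = engine.module(pixel_values, input_ids)
```

My understanding is that this would be launched with something like `deepspeed --num_gpus 2 infer.py`, so that each rank holds one shard of the decoder weights, but I am not sure this is the right approach for a non-standard multimodal architecture.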
Just following up on this question; grateful if anyone has some suggestions!