support alternative parallelism

`--num-gpus` is currently implemented by [sharding each expert layer across GPUs](https://github.com/open-compass/MixtralKit/blob/38bbb5524ee6dcecd2d4724b06179c6783019db4/mixtralkit/layers/moe.py#L27), i.e. expert parallelism (EP).
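To make the distinction concrete, here is a minimal pure-Python sketch contrasting expert parallelism with a naive layer-wise (model/pipeline) split. It assumes a Mixtral-8x7B-like shape (32 layers, 8 experts per MoE layer) and that EP splits each layer's experts into contiguous blocks, one block per GPU. `ep_device` and `pipeline_device` are hypothetical helpers for illustration, not MixtralKit's actual code.

```python
def ep_device(expert: int, num_experts: int, num_gpus: int) -> int:
    """Expert parallelism (assumed layout): split each MoE layer's experts
    into contiguous blocks, one block per GPU, so every layer spans all GPUs."""
    if num_experts % num_gpus:
        raise ValueError("num_experts must be divisible by num_gpus")
    return expert // (num_experts // num_gpus)


def pipeline_device(layer: int, num_layers: int, num_gpus: int) -> int:
    """Naive model/pipeline parallelism: each GPU holds a contiguous run of
    whole layers (attention plus all experts), so a layer lives on one GPU."""
    return layer * num_gpus // num_layers


if __name__ == "__main__":
    # Where the 8 experts of any single layer live under EP on 4 GPUs.
    print([ep_device(e, 8, 4) for e in range(8)])  # [0, 0, 1, 1, 2, 2, 3, 3]
    # Where a few whole layers live under the layer-wise split on 4 GPUs.
    print([pipeline_device(l, 32, 4) for l in (0, 7, 8, 31)])  # [0, 0, 1, 3]
```

Under EP every layer touches every GPU; under the layer-wise split each GPU owns a quarter of the model outright.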

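This is probably not advisable for local experimentation, especially at batch size 1, where EP only adds communication overhead with no speed benefit over naive model/pipeline parallelism: under EP a token's activations may have to cross GPUs at every MoE layer (whenever the router picks an expert hosted on another device), while a naive layer-wise split only hands activations off at the few boundaries between GPUs.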
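A back-of-the-envelope sketch of the batch-size-1 argument: count how many times one token's activations cross GPUs during a forward pass under each scheme. The cost model is an assumption (attention and routing on GPU 0, contiguous expert blocks per GPU, one transfer out and one back per remote expert, all transfers the same size), and `ep_transfers` / `pipeline_transfers` are hypothetical helpers, not part of MixtralKit.

```python
import random

# Assumed Mixtral-8x7B-like shape (illustrative, not read from MixtralKit).
NUM_LAYERS = 32
NUM_EXPERTS = 8
TOP_K = 2


def ep_transfers(routing, num_experts, num_gpus, home_gpu=0):
    """Count cross-GPU activation transfers for one token under EP.

    routing[layer] lists the expert ids the router picked at that layer.
    Assumes attention/routing run on `home_gpu` and each remote expert costs
    one transfer of the hidden state out plus one of the expert output back.
    """
    if num_experts % num_gpus:
        raise ValueError("num_experts must be divisible by num_gpus")
    experts_per_gpu = num_experts // num_gpus
    transfers = 0
    for chosen in routing:
        for expert in chosen:
            if expert // experts_per_gpu != home_gpu:
                transfers += 2  # send hidden state, receive expert output
    return transfers


def pipeline_transfers(num_layers, num_gpus):
    """Count cross-GPU activation transfers for one token when each GPU holds
    a contiguous run of whole layers: one hop per boundary between GPUs."""
    devices = [layer * num_gpus // num_layers for layer in range(num_layers)]
    return sum(1 for prev, cur in zip(devices, devices[1:]) if prev != cur)


if __name__ == "__main__":
    rng = random.Random(0)
    # Uniform random top-2 routing stands in for a real router here.
    routing = [rng.sample(range(NUM_EXPERTS), TOP_K) for _ in range(NUM_LAYERS)]
    print("EP transfers:", ep_transfers(routing, NUM_EXPERTS, num_gpus=4))
    print("pipeline transfers:", pipeline_transfers(NUM_LAYERS, num_gpus=4))  # 3
```

With uniform random routing on 4 GPUs, each chosen expert is off GPU 0 with probability 3/4, so EP averages about 3 transfers per layer (about 96 per token across 32 layers), against 3 in total for the layer-wise split. The model ignores that EP transfers within one layer can overlap, so read it as an illustration of the trend rather than a latency estimate.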

support alternative parallelism #2
