Can we configure custom training for LLaDA-MoE variants, e.g. by adding MoE-specific YAML parameters (such as expert-routing settings) and integrating with VeOmni?
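To make the question concrete, here is roughly the kind of config I have in mind. All key names below are placeholders I made up for illustration, not actual VeOmni or LLaDA-MoE options:

```yaml
# Hypothetical sketch — key names are guesses, not confirmed VeOmni config keys.
model:
  name: llada-moe
  moe:
    num_experts: 64            # total experts per MoE layer
    num_experts_per_tok: 8     # top-k expert routing
    router_aux_loss_coef: 0.01 # load-balancing auxiliary loss weight
    router_z_loss_coef: 0.001  # router z-loss weight
training:
  micro_batch_size: 4
  learning_rate: 1.0e-4
```

Is there a supported way to pass MoE routing options like these through the training config, or does it require code changes?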
On a related note, in my experiments with LLaDA-MoE using the paper's exact settings, the z-loss (the noise-prediction component) rises steadily as training iterations progress.
