Is your feature request related to a problem? Please describe.
The current 72B dense model is not practical for most deployments: at bf16 the weights alone take roughly 144 GB, so it needs several high-end GPUs, and every generated token activates all 72B parameters, which makes inference expensive for long agentic rollouts.
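For a rough sense of the gap, here is a back-of-envelope sketch (assuming bf16 weights and the publicly stated parameter counts; these are approximations, not measured serving numbers):

```python
# Back-of-envelope comparison of serving a 72B dense model vs an MoE
# like Qwen3-Next-80B-A3B. "active" means parameters used per token.
# Assumes bf16 (2 bytes per parameter) and ignores KV cache / activations.

BYTES_PER_PARAM = 2  # bf16

models = {
    "MiroThinker-72B (dense)":  {"total_b": 72, "active_b": 72},
    "Qwen3-Next-80B-A3B (MoE)": {"total_b": 80, "active_b": 3},
}

for name, p in models.items():
    weight_gb = p["total_b"] * 1e9 * BYTES_PER_PARAM / 1e9
    print(f"{name}: ~{weight_gb:.0f} GB weights, "
          f"~{p['active_b']}B params active per token")

# MiroThinker-72B (dense):  ~144 GB weights, ~72B params active per token
# Qwen3-Next-80B-A3B (MoE): ~160 GB weights, ~3B params active per token
#
# Total weight memory is similar, but per-token compute drops by roughly
# 24x, which is what makes the MoE much cheaper to serve.
```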
Describe the solution you'd like
It would be great if the MiroThinker training data and recipe could also be applied to an MoE base model such as Qwen3-Next-80B-A3B, which activates only ~3B parameters per token and would be far cheaper to deploy.
Additional context
Thank you for MiroFlow and MiroThinker, great work all around. This is purely a request/suggestion, not a complaint.