[OpenVINO] LFM2-MoE support for transformers v4. #1609
popovaan wants to merge 16 commits into huggingface:main
Conversation
hidden_states_expanded = hidden_states_expanded.view(num_experts, -1, hidden_dim)  # (num_experts, num_tokens, hidden_dim)

# Stack expert parameters
w1_stacked = torch.stack([e.w1.weight.T for e in self.experts])
You also transpose the weight here, right? That may not be needed.
No, without the transpose the shapes don't match.
I see that in Afmoe these weights are transposed as well, but they are concatenated beforehand. I can implement it the same way: https://github.com/huggingface/optimum-intel/blob/2c48d6430c265ac259c1b264f3e2c4025cdd7b76/optimum/exporters/openvino/model_patcher.py#L7604C16-L7611C18
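For illustration, here is a rough sketch of why the two approaches produce the same stacked tensor. Plain `nn.Linear` experts and all dimensions below are made up for the example; the real experts expose the projection as `w1`:

```python
import torch
from torch import nn

num_experts, hidden_dim, intermediate_dim = 4, 8, 16

# Stand-in expert projections (illustrative, not the actual model code).
experts = [nn.Linear(hidden_dim, intermediate_dim, bias=False)
           for _ in range(num_experts)]

# Approach in this PR: transpose each expert weight while stacking.
w1_stacked = torch.stack([e.weight.T for e in experts])
# shape: (num_experts, hidden_dim, intermediate_dim)

# Afmoe-style: concatenate first, then reshape and transpose once.
w1_cat = torch.cat([e.weight for e in experts])
w1_alt = w1_cat.view(num_experts, intermediate_dim, hidden_dim).transpose(1, 2)

assert torch.equal(w1_stacked, w1_alt)
```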
@popovaan, btw, can we re-use existing MoE patching for this model?
The closest existing patching to this MoE variant is the Qwen3 MoE block, but it preprocesses routing_weights slightly differently (softmax is applied instead of dividing by the sum), so I suppose we can't reuse it as is.
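To illustrate the difference, a minimal sketch (tensor shapes and values are made up, and `qwen_style` / `lfm_style` are just illustrative names, not the actual patcher code):

```python
import torch

# Illustrative top-k routing weights for 4 tokens and 2 selected experts.
routing_weights = torch.tensor([[0.2, 0.8],
                                [0.5, 0.5],
                                [1.0, 3.0],
                                [0.1, 0.9]])

# Qwen3-MoE style preprocessing: softmax over the selected experts.
qwen_style = torch.softmax(routing_weights, dim=-1)

# Preprocessing in this MoE block: divide by the sum of the weights.
lfm_style = routing_weights / routing_weights.sum(dim=-1, keepdim=True)

# Both normalize each row to sum to 1, but the resulting values differ,
# which is why the Qwen3 patching cannot be reused unchanged.
```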
Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>
What does this PR do?
Added LFM2-MoE support for transformers v4.
Before submitting