
[OpenVINO] Support Qwen3.5#1634

Draft
rkazants wants to merge 9 commits into huggingface:transformers-v5 from rkazants:support_qwen3_5

Conversation

@rkazants
Collaborator

@rkazants rkazants commented Mar 8, 2026

What does this PR do?

Fixes # (issue)

Before submitting

  • [N/A] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Copilot AI and others added 4 commits March 8, 2026 22:37
Add conversion rule for the RecurrentAttentionCellOp operation used
for GatedDeltaNet patching in OpenVINO PyTorch frontend.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: rkazants <35459624+rkazants@users.noreply.github.com>
@savvadesogle

Thank you!! 🙏♥️😊

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
@ikirsh

ikirsh commented Mar 13, 2026

Can we ensure this PR includes a hardware compatibility check for the Core Ultra 200 series (245 through 285) and other Xe platforms?

Previous OpenVINO MoE optimizations have caused kernel-level failures on these platforms without any documented warnings. We need to verify that this PR either provides full support or, at a minimum, documents and implements a graceful exit/error message rather than a system crash.

See this issue:

  • gpt-oss-20b-int4-ov runs on CPU but triggers OOM on iGPU #34416

and related issues:

  • qwen3-30b-a3b on ovms, works on CPU, crashes with out of memory on iGPU #34187
  • Qwen3-Coder-30B-A3B-Instruct-int4-ov runs on CPU but triggers OOM on iGPU #34415

@sund00bie

Testing this branch on Linux (Python 3.12) with an Intel Arc A770. I've hit a roadblock with the qwen3_5 implementation and wanted to check if anyone has successfully completed an INT4 export yet.

What I've tried:

Pulled the current PR branch for optimum-intel and transformers (v5.3.0.dev0).

Manually bypassed the huggingface-hub >= 1.3.0 requirement by patching the metadata of huggingface_hub==0.26.5 to identify as 1.3.0.

Attempted a from_pretrained load with device_map='meta' and trust_remote_code=True.

The Error:
Could not import module 'Qwen3_5ForCausalLM'. Are this object's requirements defined correctly?

It seems the library recognizes the qwen3_5 model type but fails to find the registered class. For those who have this working: what specific versions of transformers and huggingface-hub are you using, and did you have to manually source a specific modeling_qwen3_5.py file?
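Since the failure mode here hinges on which package versions are actually resolved at import time, one way to debug is to print them before attempting the export. A minimal stdlib-only sketch (the `installed_version` helper and the exact package list are illustrative, not part of this PR):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg: str):
    """Return the installed version string for pkg, or None if it is absent."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

# Packages mentioned in this thread; adjust to your environment.
for pkg in ("transformers", "huggingface-hub", "optimum-intel", "openvino"):
    print(f"{pkg}: {installed_version(pkg) or 'not installed'}")
```

Note that a metadata-patched huggingface_hub (as described above) will report the patched version here, so this only confirms what the resolver sees, not what code is actually running.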

Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>