Add Qwen3.5 model support (VLM + hybrid GatedDeltaNet text model) by Copilot · Pull Request #3 · rkazants/optimum-intel

Copilot · 2026-03-08T17:51:21Z

Adds OpenVINO export and inference support for Qwen/Qwen3.5-0.8B — a visual language model with a hybrid text backbone combining GatedDeltaNet (linear attention) and standard full attention layers. Follows the patching patterns established in PR huggingface#1523 (Qwen3-Next CausalConv1D/GatedDeltaNet) and PR huggingface#1551 (Qwen3VL VLM structure).

Export pipeline (`optimum/exporters/openvino/`)

- `_ov_ops.py` (new): `convert_recurrent_attention_cell`, an OpenVINO conversion rule that lowers the GatedDeltaNet recurrent loop into an OV `Loop` op via `ModuleExtension`.
- `model_configs.py`: `Qwen3_5TextOpenVINOConfig` (registered for `qwen3_5_text`) with a hybrid cache layout (conv + recurrent + KV); `Qwen3_5OpenVINOConfig` (registered for `qwen3_5`) splits the VLM into vision patch embed / pos embed / merger / language components.
- `model_patcher.py`: `Qwen3_5ModelPatcher` patches CausalConv1D via `ov_causal_conv1d` and GatedDeltaNet via `RecurrentAttentionCell` + `ModuleExtension`; it handles both standalone text and VLM language-model contexts. `Qwen3_5VisionEmbMergerPatcher` replaces `cu_seqlens` with `attention_mask` so the vision blocks are traceable.
- `utils.py`: `qwen3_5_text` → `SSM_MODELS`, `qwen3_5` → `MULTI_MODAL_TEXT_GENERATION_MODELS`.
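
For reviewers unfamiliar with the recurrence that the `Loop` op has to express, here is a minimal pure-Python sketch of a gated delta rule for a single head. It is not the code in `_ov_ops.py` or `model_patcher.py`; the function names and the list-based tensors are illustrative assumptions, and the q/k normalization and query scaling that real implementations typically apply are omitted.

```python
import math


def gated_delta_step(state, q, k, v, g, beta):
    """One recurrent step for a single head (illustrative, not the PR's code).

    state: d_k x d_v memory matrix as a list of rows
    q, k:  query/key vectors of length d_k
    v:     value vector of length d_v
    g:     log-decay scalar (the memory is scaled by exp(g))
    beta:  write strength of the delta update
    """
    d_k, d_v = len(k), len(v)
    decay = math.exp(g)
    # 1) Decay the previous memory.
    state = [[decay * state[i][j] for j in range(d_v)] for i in range(d_k)]
    # 2) Read what the memory currently stores for key k.
    kv_mem = [sum(state[i][j] * k[i] for i in range(d_k)) for j in range(d_v)]
    # 3) Delta rule: move the stored value toward v, scaled by beta.
    delta = [beta * (v[j] - kv_mem[j]) for j in range(d_v)]
    state = [[state[i][j] + k[i] * delta[j] for j in range(d_v)] for i in range(d_k)]
    # 4) Read the updated memory with the query.
    out = [sum(state[i][j] * q[i] for i in range(d_k)) for j in range(d_v)]
    return out, state


def gated_delta_recurrence(qs, ks, vs, gs, betas, state=None):
    """Run the step over a sequence; this is the loop body a Loop op would carry."""
    d_k, d_v = len(ks[0]), len(vs[0])
    if state is None:
        state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v, g, beta in zip(qs, ks, vs, gs, betas):
        out, state = gated_delta_step(state, q, k, v, g, beta)
        outputs.append(out)
    return outputs, state
```

The state is the only value carried between iterations, which is why it maps naturally onto a loop-carried input/output pair and onto the recurrent part of the hybrid cache.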

Inference pipeline (`optimum/intel/openvino/`)

- `modeling_visual_language.py`: `_OVQwen3_5ForCausalLM` with vision embedding interpolation, `rot_pos_emb`, and rope index computation using `mm_token_type_ids` (Qwen3.5's token-type-based modality dispatch differs from Qwen2VL's token-ID-based approach).
- `modeling_decoder.py`: `qwen3_5_text` added to the full-context attention mask list (same requirement as LFM2/GraniteMoeHybrid).
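
To illustrate the token-type-based dispatch mentioned above, the following sketch derives a `mm_token_type_ids`-style tensor from `input_ids`. The helper name and the 0/1/2 encoding for text/image/video are assumptions made for this example, not values confirmed by this PR.

```python
def build_mm_token_type_ids(input_ids, image_token_id, video_token_id):
    """Map each token to a modality type (hypothetical helper).

    input_ids: batch of token-id sequences (list of lists of int)
    Returns a same-shaped batch of type ids.
    """
    # Assumed encoding for this sketch: 0 = text, 1 = image, 2 = video.
    text_type, image_type, video_type = 0, 1, 2
    batch = []
    for sequence in input_ids:
        row = []
        for token in sequence:
            if token == image_token_id:
                row.append(image_type)
            elif token == video_token_id:
                row.append(video_type)
            else:
                row.append(text_type)
        batch.append(row)
    return batch
```

With type ids available, downstream code can branch on modality without comparing against specific placeholder token IDs at every call site.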

Key architectural differences from Qwen3VL

- No deepstack: the vision merger output is a single tensor, not `(hidden_states, deepstack_features)`.
- The text model uses `Qwen3_5DynamicCache` (hybrid conv/recurrent/KV) instead of the standard `DynamicCache`.
- `get_rope_index` requires the `mm_token_type_ids` parameter (constructed from `image_token_id`/`video_token_id`).
- Text embeddings are accessed via `model.model.language_model.embed_tokens` (not `model.model.embed_tokens`).
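
As a rough picture of what a hybrid conv/recurrent/KV cache implies for the exported model's state inputs, the sketch below lists which states each layer would need. The `layer_types` values and the `past.*` names are hypothetical placeholders chosen for this example; the actual names used by `Qwen3_5TextOpenVINOConfig` may differ.

```python
def hybrid_cache_state_names(layer_types):
    """List per-layer cache states for a hybrid model (illustrative only).

    layer_types: one entry per decoder layer, assumed here to be either
    "linear_attention" (GatedDeltaNet) or "full_attention".
    """
    names = []
    for index, kind in enumerate(layer_types):
        if kind == "linear_attention":
            # GatedDeltaNet layers carry a conv window and a recurrent memory.
            names.append(f"past.conv.{index}")
            names.append(f"past.recurrent.{index}")
        elif kind == "full_attention":
            # Full-attention layers carry an ordinary key/value cache.
            names.append(f"past.key.{index}")
            names.append(f"past.value.{index}")
        else:
            raise ValueError(f"unknown layer type: {kind!r}")
    return names
```

Only the key/value states grow with sequence length; the conv and recurrent states keep a fixed size per layer, which is the main reason the cache cannot be treated as a plain `DynamicCache`.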

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

huggingface.co
- Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

- Configure Actions setup steps to set up my environment, which run before the firewall is enabled
- Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)



Initial plan

f2beaad

Copilot AI assigned Copilot and rkazants Mar 8, 2026

Copilot started work on behalf of rkazants March 8, 2026 17:51

Copilot AI and others added 4 commits March 8, 2026 18:03

Add _ov_ops.py with RecurrentAttentionCellOp conversion rule

2f65c34

Add conversion rule for the RecurrentAttentionCellOp operation used for GatedDeltaNet patching in OpenVINO PyTorch frontend. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add initial Qwen3.5 model support with VLM and hybrid text model

a4d032c

Co-authored-by: rkazants <35459624+rkazants@users.noreply.github.com>

Fix Qwen3.5 model patcher and config for VLM text embeddings access

95a87d7

Co-authored-by: rkazants <35459624+rkazants@users.noreply.github.com>

Fix comment grammar in test_decoder.py

9ced697

Co-authored-by: rkazants <35459624+rkazants@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add support for Qwen 3.5 visual language model~~ Add Qwen3.5 model support (VLM + hybrid GatedDeltaNet text model) Mar 8, 2026

Copilot finished work on behalf of rkazants March 8, 2026 18:31

Copilot wants to merge 5 commits into `main` from `copilot/add-support-qwen-3-5-0-8b`
