Add Qwen3.5 model support (VLM + hybrid GatedDeltaNet text model)#3

Draft: Copilot wants to merge 5 commits into main from copilot/add-support-qwen-3-5-0-8b

Copilot AI commented Mar 8, 2026

Adds OpenVINO export and inference support for Qwen/Qwen3.5-0.8B, a visual language model whose text backbone is hybrid: it combines GatedDeltaNet (linear attention) layers with standard full-attention layers. Follows the patching patterns established in PR huggingface#1523 (Qwen3-Next CausalConv1D/GatedDeltaNet) and PR huggingface#1551 (Qwen3VL VLM structure).

Export pipeline (optimum/exporters/openvino/)

  • _ov_ops.py (new): convert_recurrent_attention_cell — OpenVINO conversion rule that lowers the GatedDeltaNet recurrent loop into an OV Loop op via ModuleExtension
  • model_configs.py: Qwen3_5TextOpenVINOConfig (registered for qwen3_5_text) with a hybrid cache layout (conv + recurrent + KV), and Qwen3_5OpenVINOConfig (registered for qwen3_5), which splits the VLM into vision patch embedding, position embedding, merger, and language-model components
  • model_patcher.py: Qwen3_5ModelPatcher patches CausalConv1D via ov_causal_conv1d and GatedDeltaNet via RecurrentAttentionCell + ModuleExtension, handling both standalone text and VLM language-model contexts; Qwen3_5VisionEmbMergerPatcher replaces cu_seqlens with attention_mask so the vision blocks can be traced
  • utils.py: registers qwen3_5_text in SSM_MODELS and qwen3_5 in MULTI_MODAL_TEXT_GENERATION_MODELS
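Replacing cu_seqlens with an attention mask is a common way to make variable-length vision attention traceable: instead of slicing the packed patch sequence per image at run time, every patch attends over the whole sequence and a block-diagonal mask blocks attention across image boundaries. The NumPy sketch below illustrates only that idea; the function name and the additive-mask convention (0 = allowed, dtype minimum = blocked) are assumptions, not the actual code of Qwen3_5VisionEmbMergerPatcher.

```python
import numpy as np


def cu_seqlens_to_attention_mask(cu_seqlens, dtype=np.float32):
    """Hypothetical helper: additive block-diagonal mask from cumulative lengths.

    cu_seqlens = [0, l0, l0 + l1, ...] marks segment boundaries in the packed
    patch sequence; a patch may attend only to patches in its own segment.
    """
    total = int(cu_seqlens[-1])
    # Start with everything blocked (large negative value added to the scores).
    mask = np.full((total, total), np.finfo(dtype).min, dtype=dtype)
    # Unblock each diagonal block [start, end) x [start, end).
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        mask[start:end, start:end] = 0.0
    return mask
```

Because the mask is an ordinary tensor input, the traced graph contains no data-dependent loop over images, which is what makes it exportable.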

Inference pipeline (optimum/intel/openvino/)

  • modeling_visual_language.py: _OVQwen3_5ForCausalLM with vision embedding interpolation, rot_pos_emb, and rope index computation using mm_token_type_ids (Qwen3.5's token-type-based modality dispatch differs from Qwen2VL's token-ID-based approach)
  • modeling_decoder.py: qwen3_5_text added to full-context attention mask list (same requirement as LFM2/GraniteMoeHybrid)
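Because Qwen3.5 dispatches on token types rather than token IDs, the rope index computation needs a per-token modality tag. A minimal sketch of deriving such tags from image_token_id and video_token_id follows; the helper name and the 0/1/2 type codes are hypothetical and may not match the values Qwen3.5's processor actually uses.

```python
import numpy as np

# Hypothetical type codes; the real Qwen3.5 convention may differ.
TEXT, IMAGE, VIDEO = 0, 1, 2


def build_mm_token_type_ids(input_ids, image_token_id, video_token_id):
    """Hypothetical helper: tag every position as text, image, or video."""
    input_ids = np.asarray(input_ids)
    types = np.full(input_ids.shape, TEXT, dtype=np.int64)
    types[input_ids == image_token_id] = IMAGE  # image placeholder tokens
    types[input_ids == video_token_id] = VIDEO  # video placeholder tokens
    return types
```

In the Qwen2-VL family, positions belonging to visual spans receive multi-axis (temporal, height, width) rope indices while text positions advance linearly, so a tag array like this tells the rope logic which rule to apply at each position.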

Key architectural differences from Qwen3VL

  • No deepstack — vision merger output is a single tensor, not (hidden_states, deepstack_features)
  • Text model uses Qwen3_5DynamicCache (hybrid conv/recurrent/KV) instead of standard DynamicCache
  • get_rope_index requires mm_token_type_ids parameter (constructed from image_token_id/video_token_id)
  • Text embeddings accessed via model.model.language_model.embed_tokens (not model.model.embed_tokens)
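The conv and recurrent entries of the hybrid cache hold per-layer state that a GatedDeltaNet layer updates once per decoded token. The single-head NumPy sketch below shows one such step, assuming the formulation of the PyTorch fallback used for Qwen3-Next in transformers: a depthwise causal conv1d over a rolling window, followed by the gated delta rule on a (d_k, d_v) memory matrix. The function names are hypothetical, and the SiLU activation, q/k L2 normalization, and query scaling applied in the real model are omitted for brevity. The KV part of the cache, used by the full-attention layers, is not shown.

```python
import numpy as np


def causal_conv1d_step(conv_state, x_t, weight):
    """Hypothetical single decode step of a depthwise causal conv1d.

    conv_state: (channels, kernel_size) most recent inputs, oldest first.
    x_t: (channels,) new input; weight: (channels, kernel_size).
    """
    # Drop the oldest input and append the new one (rolling window).
    conv_state = np.concatenate([conv_state[:, 1:], x_t[:, None]], axis=1)
    # Each channel is convolved with its own kernel (depthwise).
    y_t = (conv_state * weight).sum(axis=1)
    return conv_state, y_t


def gated_delta_step(state, q_t, k_t, v_t, g_t, beta_t):
    """Hypothetical single step of the gated delta rule for one head.

    state: (d_k, d_v) recurrent memory; g_t: log-space decay (scalar);
    beta_t: write strength (scalar).
    """
    state = state * np.exp(g_t)           # decay the old memory
    kv_mem = k_t @ state                  # value currently stored under k_t, (d_v,)
    delta = (v_t - kv_mem) * beta_t       # delta-rule correction
    state = state + np.outer(k_t, delta)  # write the correction along k_t
    out = q_t @ state                     # read the memory with the query, (d_v,)
    return state, out
```

With g_t = 0, beta_t = 1, and a unit-norm key, writing the same value twice leaves the memory unchanged because the second correction is zero; this is what distinguishes the delta rule from plain additive linear attention.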

Warning

Firewall rules blocked me from connecting to the following address:

  • huggingface.co (DNS block)

Copilot AI and others added 4 commits March 8, 2026 18:03
Add conversion rule for the RecurrentAttentionCellOp operation used
for GatedDeltaNet patching in OpenVINO PyTorch frontend.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: rkazants <35459624+rkazants@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add support for Qwen 3.5 visual language model" to "Add Qwen3.5 model support (VLM + hybrid GatedDeltaNet text model)" on Mar 8, 2026