Add Qwen3.5 model support (VLM + hybrid GatedDeltaNet text model)#3
Draft
Add conversion rule for the RecurrentAttentionCellOp operation used for GatedDeltaNet patching in OpenVINO PyTorch frontend.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: rkazants <35459624+rkazants@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add support for Qwen 3.5 visual language model" to "Add Qwen3.5 model support (VLM + hybrid GatedDeltaNet text model)" on Mar 8, 2026.
Adds OpenVINO export and inference support for `Qwen/Qwen3.5-0.8B`, a visual language model with a hybrid text backbone combining GatedDeltaNet (linear attention) and standard full attention layers. Follows the patching patterns established in PR huggingface#1523 (Qwen3-Next CausalConv1D/GatedDeltaNet) and PR huggingface#1551 (Qwen3VL VLM structure).

Export pipeline (`optimum/exporters/openvino/`)

- `_ov_ops.py` (new): `convert_recurrent_attention_cell`, an OpenVINO conversion rule that lowers the GatedDeltaNet recurrent loop into an OV Loop op via `ModuleExtension`
- `model_configs.py`: `Qwen3_5TextOpenVINOConfig` (registered for `qwen3_5_text`) with hybrid cache layout (conv + recurrent + KV); `Qwen3_5OpenVINOConfig` (registered for `qwen3_5`) splitting the VLM into vision patch embed / pos embed / merger / language components
- `model_patcher.py`: `Qwen3_5ModelPatcher` patches CausalConv1D via `ov_causal_conv1d` and GatedDeltaNet via `RecurrentAttentionCell` + `ModuleExtension`; handles both standalone text and VLM language-model contexts. `Qwen3_5VisionEmbMergerPatcher` replaces `cu_seqlens` with `attention_mask` for traceable vision blocks
- `utils.py`: `qwen3_5_text` → `SSM_MODELS`, `qwen3_5` → `MULTI_MODAL_TEXT_GENERATION_MODELS`

Inference pipeline (`optimum/intel/openvino/`)

- `modeling_visual_language.py`: `_OVQwen3_5ForCausalLM` with vision embedding interpolation, `rot_pos_emb`, and rope index computation using `mm_token_type_ids` (Qwen3.5's token-type-based modality dispatch differs from Qwen2VL's token-ID-based approach)
- `modeling_decoder.py`: `qwen3_5_text` added to full-context attention mask list (same requirement as LFM2/GraniteMoeHybrid)

Key architectural differences from Qwen3VL

- `Qwen3_5DynamicCache` (hybrid conv/recurrent/KV) instead of standard `DynamicCache`
- `get_rope_index` requires `mm_token_type_ids` parameter (constructed from `image_token_id`/`video_token_id`)
- `model.model.language_model.embed_tokens` (not `model.model.embed_tokens`)

Warning

Firewall rules blocked me from connecting to one or more addresses.
I tried to connect to the following addresses, but was blocked by firewall rules:

- `huggingface.co` (dns block), from `/home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js`
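For readers unfamiliar with the "GatedDeltaNet recurrent loop" mentioned in the export pipeline, the following is a minimal single-head sketch of a gated delta rule recurrence in plain Python. It is not the code from this PR: the function names `gated_delta_step` and `gated_delta_recurrence` are hypothetical, the decay and write strength are passed in as precomputed scalars, and details that real implementations typically add (query/key normalization, query scaling, batching over heads) are omitted. It only illustrates the kind of per-token state update that a loop op would iterate.

```python
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def gated_delta_step(
    state: Matrix, q: Vector, k: Vector, v: Vector, decay: float, beta: float
) -> Tuple[Matrix, Vector]:
    """Apply one token of a gated delta rule update (illustrative sketch).

    state is a d_k x d_v matrix mapping keys to values.
    decay in (0, 1] shrinks the old memory; beta in [0, 1] is the write strength.
    """
    d_k, d_v = len(k), len(v)
    # 1. Gate: decay the previous memory.
    state = [[decay * s for s in row] for row in state]
    # 2. Read what the memory currently stores for this key.
    kv_mem = [sum(state[i][j] * k[i] for i in range(d_k)) for j in range(d_v)]
    # 3. Delta: only the part of v the memory does not already predict.
    delta = [beta * (v[j] - kv_mem[j]) for j in range(d_v)]
    # 4. Rank-1 write of the correction into the memory.
    state = [[state[i][j] + k[i] * delta[j] for j in range(d_v)] for i in range(d_k)]
    # 5. Query the updated memory.
    out = [sum(state[i][j] * q[i] for i in range(d_k)) for j in range(d_v)]
    return state, out


def gated_delta_recurrence(
    qs: Sequence[Vector],
    ks: Sequence[Vector],
    vs: Sequence[Vector],
    decays: Sequence[float],
    betas: Sequence[float],
    state: Optional[Matrix] = None,
) -> Tuple[List[Vector], Matrix]:
    """Run the step over a sequence, returning per-token outputs and the final state."""
    d_k, d_v = len(ks[0]), len(vs[0])
    if state is None:
        state = [[0.0] * d_v for _ in range(d_k)]
    outputs: List[Vector] = []
    for q, k, v, a, b in zip(qs, ks, vs, decays, betas):
        state, out = gated_delta_step(state, q, k, v, a, b)
        outputs.append(out)
    return outputs, state
```

The final state returned here plays the role of the "recurrent" part of the hybrid cache: during decoding it would be carried from one call to the next instead of a growing KV tensor.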
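The description says `Qwen3_5VisionEmbMergerPatcher` replaces `cu_seqlens` with `attention_mask` so the vision blocks can be traced. As a rough illustration of that idea, and not the patcher's actual code, the sketch below turns cumulative sequence lengths into an equivalent block-diagonal boolean mask. The helper name `block_diagonal_mask` is hypothetical, and whether the PR builds a boolean or an additive mask is not stated in the description.

```python
from typing import List, Sequence


def block_diagonal_mask(cu_seqlens: Sequence[int]) -> List[List[bool]]:
    """Build a (total, total) mask where True means "may attend".

    cu_seqlens holds cumulative sequence boundaries, e.g. [0, 2, 3] describes
    two packed sequences covering positions [0, 2) and [2, 3).
    Tokens may attend only to tokens inside their own sequence.
    """
    total = cu_seqlens[-1]
    mask = [[False] * total for _ in range(total)]
    # Walk consecutive boundary pairs and fill one square block per sequence.
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        for i in range(start, end):
            for j in range(start, end):
                mask[i][j] = True
    return mask
```

A dense mask like this has a shape that depends only on the total number of patches, which is generally easier to trace than Python-side slicing driven by the values in `cu_seqlens`.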