
[OpenVINO] Support Gemma3n#1613

Draft
rkazants wants to merge 6 commits into huggingface:main from rkazants:support_gemma3n

Conversation


@rkazants rkazants commented Feb 12, 2026

What does this PR do?

The command for exporting the model:

optimum-cli export openvino -m google/gemma-3n-E2B-it gemma-3n-E2B-it --task=image-text-to-text

The script for inference:

from transformers import AutoProcessor
from optimum.intel.openvino import OVModelForVisualCausalLM

model_id = "google/gemma-3n-E2B-it"
model = OVModelForVisualCausalLM.from_pretrained(model_id)

processor = AutoProcessor.from_pretrained(model_id, padding_side="left")

url = "https://media.istockphoto.com/id/1192867753/photo/cow-in-berchida-beach-siniscola.jpg?s=612x612&w=0&k=20&c=v0hjjniwsMNfJSuKWZuIn8pssmD5h5bSN1peBd1CmH4="
messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "You are a helpful assistant."}
        ]
    },
    {
        "role": "user", "content": [
            {"type": "image", "url": url},
            {"type": "text", "text": "What is shown in this image?"},
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
    add_generation_prompt=True,
)

output = model.generate(**inputs, max_new_tokens=50)
# Decode only the newly generated tokens, skipping the prompt portion of the output.
print(processor.decode(output[0, inputs.input_ids.shape[1]:], skip_special_tokens=True))
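The final `decode` call drops the prompt by slicing the generated ids from `inputs.input_ids.shape[1]` onward. A minimal stand-alone illustration of that slicing, with plain Python lists standing in for the tensors (all token-id values below are made up):

```python
# Simulate generate() output: the returned row contains the prompt token ids
# followed by the newly generated ids.
prompt_ids = [2, 106, 2364, 108]   # stand-in for inputs.input_ids[0]
new_ids = [651, 3320, 4, 1]        # stand-in for the model's continuation

output_row = prompt_ids + new_ids  # generate() returns prompt + continuation
prompt_len = len(prompt_ids)       # plays the role of inputs.input_ids.shape[1]

# Same slice as output[0, prompt_len:] in the script above.
generated_only = output_row[prompt_len:]
print(generated_only)
```

Passing only `generated_only` to the tokenizer's decode is what keeps the printed answer free of the system prompt and the user question.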

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


Copilot AI left a comment


Pull request overview

This PR adds support for the Gemma3n vision-language model to the OpenVINO integration in optimum-intel. Gemma3n appears to be a Gemma3 variant with a modified attention stack: sliding-window attention and KV caches shared across layers.

Changes:

  • Registers Gemma3n model type and configuration classes for OpenVINO export
  • Adds custom KV cache handling for Gemma3n's unique architecture (sliding attention, shared layers)
  • Maps Gemma3n to the same visual-language model class as Gemma3
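The "shared layers" point above can be pictured as a cache-index mapping: only some layers own a KV-cache slot, and later layers reuse an earlier layer's slot instead of allocating their own. A minimal dict-based sketch of that idea — the layer counts and the round-robin sharing rule here are illustrative assumptions, not Gemma3n's actual configuration:

```python
# Hypothetical sharing rule: the first `num_kv_owner_layers` layers each own a
# KV-cache slot; every later layer reuses one of those slots, chosen
# round-robin. Gemma3n's real rule may differ; this only illustrates why a
# shared-KV model needs fewer cache entries than it has layers.
def build_kv_sharing_map(num_layers: int, num_kv_owner_layers: int) -> dict[int, int]:
    sharing = {}
    for layer in range(num_layers):
        if layer < num_kv_owner_layers:
            sharing[layer] = layer  # this layer owns its own cache slot
        else:
            # reuse the slot of an earlier owner layer
            sharing[layer] = layer % num_kv_owner_layers
    return sharing

mapping = build_kv_sharing_map(num_layers=8, num_kv_owner_layers=4)
# 8 layers, but only 4 distinct cache slots are ever allocated.
print(mapping)
```

An exporter has to encode such a mapping explicitly, since the OpenVINO model's KV-cache inputs/outputs must match the number of distinct slots rather than the number of layers.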

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| optimum/intel/openvino/modeling_visual_language.py | Maps "gemma3n" to the _OVGemma3ForCausalLM class for inference |
| optimum/exporters/openvino/utils.py | Adds "gemma3n" to the list of supported vision-language models |
| optimum/exporters/openvino/model_configs.py | Defines Gemma3nTextOpenVINOConfig with custom KV-cache handling and Gemma3nOpenVINOConfig for vision-language export |

