
Introduce OVPipelineQuantizationConfig#1310

Merged
echarlaix merged 19 commits into huggingface:main from
nikita-savelyevv:ns/pipeline-quantization-config
May 28, 2025

Conversation

@nikita-savelyevv
Contributor

@nikita-savelyevv nikita-savelyevv commented May 15, 2025

What does this PR do?

Changes:

  • Introduced OVPipelineQuantizationConfig, which allows specifying quantization parameters per model component. Added corresponding tests.
  • Introduced more advanced logic for inferring the quantization config type from a dictionary, and moved it into a separate function: _quantization_config_from_dict().
  • Updated the default int4 config for phi4-multimodal. WWB similarity: 85.30%.

For example, the code below applies int8 PTQ to lm_model, int8 weight compression to text_embeddings_model, and no optimization to vision_embeddings_model.

from optimum.intel import OVModelForVisualCausalLM
from optimum.intel import OVPipelineQuantizationConfig, OVQuantizationConfig, OVWeightQuantizationConfig

model_id = "OpenGVLab/InternVL2-1B"
model = OVModelForVisualCausalLM.from_pretrained(
    model_id,
    export=True,
    trust_remote_code=True,
    quantization_config=OVPipelineQuantizationConfig(
        quantization_configs={
            "lm_model": OVQuantizationConfig(bits=8),
            "text_embeddings_model": OVWeightQuantizationConfig(bits=8),
        },
        dataset="contextual",
        trust_remote_code=True,
    )
)
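The change list above also mentions inferring the quantization config type from a plain dictionary via _quantization_config_from_dict(). A minimal sketch of what such dispatch logic could look like is shown below; the stand-in classes mimic the shape of the optimum-intel configs, and the `config_from_dict` helper and its `weight_only` heuristic are illustrative, not the PR's actual implementation:

```python
# Illustrative sketch of dispatching a dict to a quantization config type.
# These stand-in classes only mimic the real optimum-intel configs; the
# actual inference logic lives in _quantization_config_from_dict().

class OVWeightQuantizationConfig:
    def __init__(self, bits=8, **kwargs):
        self.bits = bits

class OVQuantizationConfig:
    def __init__(self, bits=8, **kwargs):
        self.bits = bits

class OVPipelineQuantizationConfig:
    def __init__(self, quantization_configs, **kwargs):
        self.quantization_configs = quantization_configs

def config_from_dict(d):
    """Hypothetical helper: pick a config class based on dict keys."""
    if "quantization_configs" in d:
        # Per-component sub-dicts are resolved recursively.
        sub = {k: config_from_dict(v) for k, v in d["quantization_configs"].items()}
        return OVPipelineQuantizationConfig(quantization_configs=sub)
    if d.get("weight_only", False):
        return OVWeightQuantizationConfig(**{k: v for k, v in d.items() if k != "weight_only"})
    return OVQuantizationConfig(**d)

cfg = config_from_dict({
    "quantization_configs": {
        "lm_model": {"bits": 8},
        "text_embeddings_model": {"bits": 8, "weight_only": True},
    }
})
print(type(cfg).__name__)  # OVPipelineQuantizationConfig
```

The key design point is that a pipeline-level dict nests one config dict per model component, so the inference has to recurse into each entry before choosing a leaf config type.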

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@nikita-savelyevv nikita-savelyevv marked this pull request as ready for review May 19, 2025 18:09
@nikita-savelyevv
Contributor Author

@l-bat Could you please review this PR?

@nikita-savelyevv nikita-savelyevv requested a review from eaidova May 20, 2025 13:55
     model,
     calibration_datasets["model"],
-    subset_size=quantization_config.num_samples,
+    subset_size=quantization_config.num_samples or 128,
Member

why is this value hidden here?

Contributor Author

128 was previously the default value for the num_samples argument in OVQuantizationConfig. In this PR I've removed it, so it is now None by default. That's why 128 appears here.

Ideally, we should transition to providing arguments via quantization_config.to_nncf_dict() here, as is done for the OV case. But I propose to do this in a separate PR.
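The default handling discussed in this thread can be sketched in isolation: num_samples previously defaulted to 128, now defaults to None, so the call site restores the old behavior with `num_samples or 128`. The to_nncf_dict() mapping below is a simplified, hypothetical version that centralizes the same fallback; the real method in optimum-intel covers more parameters:

```python
# Simplified stand-in for OVQuantizationConfig to illustrate the fallback.

class OVQuantizationConfig:
    def __init__(self, num_samples=None):
        # num_samples is None by default after this PR (was 128 before).
        self.num_samples = num_samples

    def to_nncf_dict(self):
        # Map config fields to nncf.quantize() keyword args, applying the
        # legacy default of 128 in one place instead of at each call site.
        return {"subset_size": self.num_samples or 128}

print(OVQuantizationConfig().to_nncf_dict())                # {'subset_size': 128}
print(OVQuantizationConfig(num_samples=32).to_nncf_dict())  # {'subset_size': 32}
```

Note that `x or 128` treats both None and 0 as "use the default", which is the usual intent for a sample-count parameter.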

Member

@IlyasMoutawwakil IlyasMoutawwakil left a comment

LGTM!

@nikita-savelyevv
Contributor Author

Hi @echarlaix! Do you mind us merging this, or would you like to take a look?

@echarlaix
Collaborator

apologies for the delay @nikita-savelyevv, taking a look right now!

Collaborator

@echarlaix echarlaix left a comment

Looks great, thanks @nikita-savelyevv!!

@echarlaix echarlaix merged commit 54b40e1 into huggingface:main May 28, 2025
18 checks passed