[OV] Fix data-free VLM compression via optimum-cli #1058

Merged


nikita-savelyevv
Collaborator

What does this PR do?

Changes
When exporting an image-text-to-text model with optimum-cli in int4, all model components were compressed to int4. However, only the language model should be compressed to int4; the other components should be compressed to int8_sym. The fix is to run data-free VLM compression inside the from_pretrained call, as is already done in the data-aware case for LMs.
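
The intended outcome for an int4 export can be sketched as a simple mapping from sub-model to weight format. This is only an illustration of the policy described above, not the code in this PR: the helper `plan_int4_export` and the sub-model names are assumptions made for the example.

```python
# Hypothetical helper: the names below are illustrative, not optimum-intel internals.
LANGUAGE_MODEL = "lm_model"  # assumed name of the language-model component


def plan_int4_export(submodel_names):
    """Map each pipeline sub-model to the weight format described in this PR."""
    return {
        # Only the language model gets int4; every other component gets int8_sym.
        name: "int4" if name == LANGUAGE_MODEL else "int8_sym"
        for name in submodel_names
    }


plan = plan_int4_export(["lm_model", "text_embeddings_model", "vision_embeddings_model"])
for name, weight_format in plan.items():
    print(f"{name}: {weight_format}")
```

Before the fix, the equivalent mapping would have assigned int4 to every entry.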

Tests
Introduced additional checks for low-precision weight nodes of pipeline sub-models. This should prevent similar issues in the future.
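
A rough sketch of what such a check might look like is given here. It is not the test code from this PR: the element-type labels, the sub-model names, and the expected counts are all assumptions, and a plain list of type names stands in for the weight constants of a real exported model.

```python
from collections import Counter

# Assumed element-type labels for low-precision weights; adjust to the runtime in use.
INT8_TYPES = {"i8", "u8"}
INT4_TYPES = {"i4", "u4"}


def count_low_precision_weights(weight_types):
    """Count int8 and int4 weight constants given their element-type names."""
    counts = Counter()
    for element_type in weight_types:
        if element_type in INT8_TYPES:
            counts["int8"] += 1
        elif element_type in INT4_TYPES:
            counts["int4"] += 1
    return {"int8": counts["int8"], "int4": counts["int4"]}


# Hypothetical per-sub-model weight types, standing in for an exported pipeline.
exported = {
    "lm_model": ["u4", "u4", "u8", "f32"],
    "vision_embeddings_model": ["i8", "i8", "f32"],
}
# Hypothetical reference counts; a real test would hard-code values per model.
expected = {
    "lm_model": {"int8": 1, "int4": 2},
    "vision_embeddings_model": {"int8": 2, "int4": 0},
}

for name, weight_types in exported.items():
    actual = count_low_precision_weights(weight_types)
    assert actual == expected[name], f"{name}: expected {expected[name]}, got {actual}"
```

Checking every sub-model separately is what would catch the original bug, since a non-language component compressed to int4 would show a non-zero int4 count.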

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@nikita-savelyevv nikita-savelyevv marked this pull request as ready for review December 10, 2024 15:04
@nikita-savelyevv nikita-savelyevv requested review from AlexKoff88 and helena-intel and removed request for AlexKoff88 December 10, 2024 15:04
@nikita-savelyevv
Collaborator Author

This fix relates to openvinotoolkit/openvino.genai#1348

@AlexKoff88
Collaborator

@nikita-savelyevv, thanks for the PR. Please make sure that the tests you added don't increase the overall validation time dramatically. If so, please use smaller models instead, e.g. some dummy decoder instead of opt-125m.

@nikita-savelyevv
Collaborator Author

nikita-savelyevv commented Dec 11, 2024

> @nikita-savelyevv, thanks for the PR. Please make sure that the tests you added don't increase the overall validation time dramatically. If so, please use smaller models instead, e.g. some dummy decoder instead of opt-125m.

The *export* testing time has indeed increased by 4 minutes with this PR (31min now). But overall OV testing time is still limited by *diffusion* tests which take 33 min. I suppose in the near future we should address this, but it can be done in a separate PR.

@helena-intel
Collaborator

helena-intel commented Dec 11, 2024

@nikita-savelyevv I tested the model exported with optimum-cli built from this branch with the code from the genai README, https://github.com/openvinotoolkit/openvino.genai?tab=readme-ov-file#run-generation-using-vlmpipeline-api-in-python . I exported with just --weight-format int4, no other compression settings. I get an empty response. Same as when I pass --dataset with optimum-intel release. With --group-size 16 I have always gotten a good result before. I just reexported a model with group size 16 too, with optimum-intel 4d73e51 and it's still good.

I tested on Xeon, also tried with f32 INFERENCE_PRECISION_HINT, which did not make a difference.

@nikita-savelyevv
Collaborator Author

nikita-savelyevv commented Dec 11, 2024

> @nikita-savelyevv I tested the model exported with optimum-cli built from this branch with the code from the genai README, https://github.com/openvinotoolkit/openvino.genai?tab=readme-ov-file#run-generation-using-vlmpipeline-api-in-python . I exported with just --weight-format int4, no other compression settings. I get an empty response. Same as when I pass --dataset with optimum-intel release. With --group-size 16 I have always gotten a good result before. I just reexported a model with group size 16 too, with optimum-intel 4d73e51 and it's still good.
>
> I tested on Xeon, also tried with f32 INFERENCE_PRECISION_HINT, which did not make a difference.

This is interesting. When running inference via optimum-intel I don't get an empty response. But when running inference via VLMPipeline from openvino.genai I also get an empty response.

My code:

import numpy as np
import openvino as ov
import openvino_genai as ov_genai
from PIL import Image
from transformers import AutoTokenizer, AutoProcessor

from optimum.intel import OVModelForVisualCausalLM

model_path = "/home/nsavel/workspace/optimum-intel/MiniCPM-V-2_6"
image_file = "dog.jpg"
prompt = "Can you describe the image?"

# optimum-intel inference
raw_image = Image.open(image_file)
model = OVModelForVisualCausalLM.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
inputs = model.preprocess_inputs(text=prompt, image=raw_image, processor=processor, tokenizer=tokenizer)
generation_kwargs = dict(max_new_tokens=100, do_sample=False)
output = model.generate(**inputs, **generation_kwargs)
print("optimum-intel:", processor.decode(output[0], skip_special_tokens=True))

# openvino.genai inference
pipe = ov_genai.VLMPipeline(model_path, "CPU")
image = Image.open(image_file)
image_data = np.array(image.getdata()).reshape(1, image.size[1], image.size[0], 3).astype(np.uint8)
image_data = ov.Tensor(image_data)
print("\nopenvino.genai:", pipe.generate(prompt, image=image_data, max_new_tokens=100))

Output:

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
optimum-intel: user
0
Can you describe the image?
assistant
Certainly The image shows a dog sitting on what appears to be a paved surface. The dog has a white and brown coat, with long fur, particularly around the ears and tail. It's wearing a green collar with a tag attached to it. The dog's mouth is open, and it seems to be panting or possibly looking up at something with its tongue out.

openvino.genai:

When the model is compressed with --group-size 16, both methods produce an adequate response:

optimum-intel: user
0
Can you describe the image?
assistant
Certainly The image features a dog sitting on what appears to be a paved surface. The dog has a white and brown coat and is wearing a green collar with a tag attached. The dog's tongue is out, and it seems to be looking upwards, possibly at something or someone. There's a leash attached to the collar, suggesting that the dog might be out for a walk.

openvino.genai:  The image shows a dog sitting on the ground. The dog is wearing a green collar and a pink tag. The dog is looking up, possibly at something or someone. The background is blurred, but it appears to be an outdoor setting with some greenery. The dog's fur is brown and white, and it has long ears. The dog's posture is relaxed, and it seems to be calm and content. The image is a close-up shot of the dog, focusing on its face and upper

This is also the case when compression with --group-size 16 is run on this branch. @helena-intel could you please also try it out on your side? If you see the same, it shows that the issue is not with this PR but with how group size affects inference in general.

@helena-intel
Collaborator

Thanks for the sample code @nikita-savelyevv , I re-exported the model with group size 16 with your PR and observe the same as you did. OpenVINO GenAI inference works fine with group size 16, but not without it, both with and without your PR. Tested on Xeon with nightly/dev versions of openvino-genai and nncf.
But without this PR, at least for the MiniCPM model, users got an error about the group size and would have chosen a group size that is a divisor of the channel size (in this case 16) and then everything works. And now it silently fails. So it would be great to understand this.
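
The divisibility constraint mentioned here can be illustrated with a small sketch. It assumes the only requirement is that the group size divides the channel size evenly; the helper name, the candidate list, and the channel size of 80 are hypothetical values chosen for the example, not figures taken from MiniCPM or from NNCF.

```python
def compatible_group_sizes(channel_size, candidates=(128, 64, 32, 16)):
    """Return the candidate group sizes that divide the channel size evenly."""
    return [group_size for group_size in candidates if channel_size % group_size == 0]


# Hypothetical channel size, chosen so that only 16 divides it evenly.
channel_size = 80
print(compatible_group_sizes(channel_size))
```

With a check like this, an incompatible group size surfaces as an explicit choice for the user rather than a silent change in behaviour.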

@AlexKoff88
Collaborator

@echarlaix, @IlyasMoutawwakil, PR is ready for your review.

@nikita-savelyevv
Collaborator Author

> Thanks for the sample code @nikita-savelyevv , I re-exported the model with group size 16 with your PR and observe the same as you did. OpenVINO GenAI inference works fine with group size 16, but not without it, both with and without your PR. Tested on Xeon with nightly/dev versions of openvino-genai and nncf. But without this PR, at least for the MiniCPM model, users got an error about the group size and would have chosen a group size that is a divisor of the channel size (in this case 16) and then everything works. And now it silently fails. So it would be great to understand this.

@helena-intel I've created ticket 159295 on OV GenAI to investigate the empty generation result.

@nikita-savelyevv
Collaborator Author

@echarlaix @IlyasMoutawwakil could you please review this PR some time this week? I'll be on vacation starting next week. Thanks!

@AlexKoff88
Collaborator

@IlyasMoutawwakil, @echarlaix, kindly take a look at the PR as @nikita-savelyevv will be out till new year.

@AlexKoff88 AlexKoff88 merged commit b17d1e0 into huggingface:main Dec 19, 2024
22 checks passed
@nikita-savelyevv nikita-savelyevv mentioned this pull request Dec 19, 2024
3 tasks
6 participants