Chat Template for Multi-Image Inference #248

Open
tsunghan-wu opened this issue Sep 24, 2024 · 1 comment

Comments

@tsunghan-wu

tsunghan-wu commented Sep 24, 2024

Hi,

Thanks for the great work on mPLUG-Owl3! I'd like to check whether the template below is the right chat format for multi-image inference, because the README doesn't mention it explicitly. With this code, the model appears to accept many images as input, but its answers are worse than I expected. Please let me know if my template is incorrect, specifically how I build `real_prompt` and `messages`.

Looking forward to hearing from you. Thanks!

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

huggingface_model_id = 'mPLUG/mPLUG-Owl3-7B-240728'
model = AutoModelForCausalLM.from_pretrained(
    huggingface_model_id,
    torch_dtype=torch.half,
    attn_implementation="flash_attention_2",
    trust_remote_code=True
).eval().to("cuda")
tokenizer = AutoTokenizer.from_pretrained(huggingface_model_id)
processor = model.init_processor(tokenizer)

# Given a list of image paths, e.g. image_paths = ['file1.png', 'file2.png', ...],
# and a text question `prompt`:

images = []
for image_path in image_paths:
    images.append(Image.open(image_path).convert("RGB"))
real_prompt = '<|image|>' * len(image_paths) + prompt
messages = [{"role": "user", "content": real_prompt}, {"role": "assistant", "content": ""}]
inputs = processor(messages, images=images, video=None).to("cuda")

generated_text = model.generate(
    **inputs, tokenizer=tokenizer, max_new_tokens=256, decode_text=True
)[0]
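A quick sanity check when debugging multi-image prompts is to confirm that the number of `<|image|>` placeholders in the messages equals the number of images passed to `processor`. Below is a minimal plain-Python sketch that assumes only the message format used above. `build_messages` and `check_placeholders` are hypothetical helpers, not part of the mPLUG-Owl3 API. Passing this check does not confirm that this is the template the model was trained with, and whether a separator such as a newline belongs between placeholders is not confirmed here either.

```python
# Hypothetical helpers (NOT part of the mPLUG-Owl3 API): build the message list
# used in the issue and check that the number of <|image|> placeholders matches
# the number of images handed to the processor.
IMAGE_TOKEN = "<|image|>"  # placeholder string used in the issue's prompt


def build_messages(prompt: str, num_images: int, sep: str = "") -> list:
    """Prefix `prompt` with one image placeholder per image.

    `sep` is appended after each placeholder. The issue concatenates them
    directly (sep=""); whether the model expects a separator is an assumption.
    """
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    content = (IMAGE_TOKEN + sep) * num_images + prompt
    # An empty assistant turn marks where the model's reply should begin.
    return [
        {"role": "user", "content": content},
        {"role": "assistant", "content": ""},
    ]


def check_placeholders(messages: list, images: list) -> None:
    """Raise ValueError if the placeholder count differs from len(images)."""
    n_tokens = sum(m["content"].count(IMAGE_TOKEN) for m in messages)
    if n_tokens != len(images):
        raise ValueError(
            f"{n_tokens} {IMAGE_TOKEN} placeholders but {len(images)} images"
        )


if __name__ == "__main__":
    msgs = build_messages("How many images can you see?", 3)
    # Stand-in objects; in the issue these would be PIL images.
    check_placeholders(msgs, images=["img_a", "img_b", "img_c"])
    print(msgs[0]["content"])
```

Running the script prints `<|image|><|image|><|image|>How many images can you see?`. If `check_placeholders` raises, the prompt and the image list are out of sync before the model is even involved.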
@hexmSeeU

I ran into the same problem. When I asked the model "How many images can you see?", it answered only "1". Have you solved this?
