Chat Template for Multi-Image Inference #248

tsunghan-wu · 2024-09-24T06:13:35Z

Hi,

Thanks for the great work (mPLUG-Owl3)! I was wondering whether the following template is the right chat format for multi-image inference, since the README doesn't explicitly mention it. With the code below, the model does accept many images as input, but its performance is below my expectations. Please let me know if my template is incorrect (specifically how real_prompt and messages are built).

Looking forward to hearing from you. Thanks!

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

huggingface_model_id = 'mPLUG/mPLUG-Owl3-7B-240728'
model = AutoModelForCausalLM.from_pretrained(
    huggingface_model_id,
    torch_dtype=torch.half,
    attn_implementation="flash_attention_2",
    trust_remote_code=True
).eval().to("cuda")
tokenizer = AutoTokenizer.from_pretrained(huggingface_model_id)
processor = model.init_processor(tokenizer)

# Assumes image_paths (e.g. ['file1.png', 'file2.png', ...]) and the text prompt are already defined.
images = [Image.open(image_path).convert("RGB") for image_path in image_paths]

# One <|image|> placeholder per image, followed directly by the text prompt.
real_prompt = '<|image|>' * len(image_paths) + prompt
messages = [{"role": "user", "content": real_prompt}, {"role": "assistant", "content": ""}]
inputs = processor(messages, images=images, video=None).to("cuda")

generated_text = model.generate(**inputs,
    tokenizer=tokenizer, max_new_tokens=256, decode_text=True)[0]
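The template question comes down to how the user turn is assembled. Below is a minimal sketch, assuming the <|image|> placeholder string and the two-turn message list from the code above; the helper name build_multi_image_messages and its separator parameter are hypothetical and not part of the mPLUG-Owl3 API. It makes the one-placeholder-per-image rule explicit and lets you try separating the placeholders from the question (for example with a newline) without touching the rest of the pipeline.

```python
from typing import Dict, List

IMAGE_TOKEN = "<|image|>"  # placeholder string used in the snippet above


def build_multi_image_messages(
    num_images: int, prompt: str, separator: str = ""
) -> List[Dict[str, str]]:
    """Hypothetical helper: build the user/assistant message pair for a
    multi-image query, with exactly one placeholder per image.

    `separator` is inserted between the run of placeholders and the prompt
    text; the default "" reproduces the template from the question.
    """
    if num_images < 1:
        raise ValueError("need at least one image")
    content = IMAGE_TOKEN * num_images + separator + prompt
    # An empty assistant turn follows the user turn, as in the question's code.
    return [
        {"role": "user", "content": content},
        {"role": "assistant", "content": ""},
    ]


if __name__ == "__main__":
    msgs = build_multi_image_messages(3, "How many images can you see?", separator="\n")
    print(msgs[0]["content"])
```

Passing separator="" gives exactly the original real_prompt; separator="\n" is just one variant to compare. Whether either choice changes answer quality is an open question here, not something this sketch demonstrates.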

hexmSeeU · 2024-10-30T05:28:55Z

I ran into the same problem. I asked the model "How many images can you see?", but it only answered "1". Have you solved this?
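When the model reports seeing only one image, a cheap first check is whether the prompt really carries one <|image|> placeholder per image passed to the processor. The sketch below is a hypothetical pre-flight check in plain Python (check_placeholders is not a library function); it only inspects the message text, so it cannot tell you whether the processor or the model later drops images.

```python
from typing import Dict, List, Sequence

IMAGE_TOKEN = "<|image|>"  # placeholder string used in the original snippet


def check_placeholders(messages: List[Dict[str, str]], images: Sequence[object]) -> int:
    """Hypothetical pre-flight check: count <|image|> placeholders across all
    message contents and raise if the count differs from the number of images."""
    n_tokens = sum(m["content"].count(IMAGE_TOKEN) for m in messages)
    if n_tokens != len(images):
        raise ValueError(
            f"{n_tokens} placeholder(s) in the prompt but {len(images)} image(s) supplied"
        )
    return n_tokens
```

Calling check_placeholders(messages, images) right before processor(...) rules out a prompt/image mismatch; if it passes and the model still answers "1", the cause lies elsewhere.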
