Tuo-Liang/YESBUT_V2

When ‘YES’ Meets ‘BUT’: Can Large Models Comprehend Contradictory Humor Through Comparative Reasoning?

Tuo Liang1,*  Zhe Hu2,*  Jing Li2,  Hao Zhang1,  Yiren Lu1,  Yunlai Zhou1,  Yiran Qiao1,  Disheng Liu1,  Jeirui Peng1,  Jing Ma1,  Yu Yin1
1Case Western Reserve University, 2The Hong Kong Polytechnic University 
* equal contribution

[Arxiv] [Website] [🤗 Dataset]


🌟 YesBut V2 Dataset for Comparative Reasoning on Contradictory Comics

🚩 Previous Work: NeurIPS 2024 Oral Paper: YESBUT_v1

Overview

Understanding humor—particularly when it involves complex, contradictory narratives that require comparative reasoning—remains a significant challenge for large vision-language models (VLMs). We introduce YESBUT (V2), a comprehensive benchmark designed to evaluate how well VLMs can understand and interpret complex, contradictory humor in comics.

Building upon our previous work YESBUT_v1, this expanded dataset includes 1,262 comic images featuring juxtaposed panels that create humor through contradictions. We design multi-tiered tasks—ranging from basic content recognition to deep narrative comprehension—ensuring a comprehensive assessment of AI’s interpretative abilities.


What's new compared to [YESBUT_v1]?

  1. Expanded Dataset: YESBUT grows from 349 to 1,262 images, enhancing diversity and robustness for better VLM evaluation;
  2. Comprehensive Evaluation: We assess various VLMs and LLMs, comparing general-purpose, reasoning-enhanced, and multi-image models;
  3. Fine-grained Analysis: Statistical and ablation studies reveal key factors affecting humor comprehension and model failures;
  4. Practical Improvements: We propose simple yet effective strategies to enhance VLMs' understanding of juxtaposition-based humor.

Dataset

Download

  • Annotation File: The annotated data is available at data/yesbut_v2.json.

  • Image Download: Download the associated images by running the following command:

python download_images.py --json_file='data/yesbut_v2.json' --save_folder='data/YesBut_images'

This will save the images to the specified data/YesBut_images folder.

Annotated data format

  • The file is located at data/yesbut_v2.json.
  • Each entry has the following format:
    {
        "image_file": "00001.jpg",
        "description": "The comic is divided into two panels, each presenting a contradictory perspective of the same object—a mug. In the first panel, the mug is illustrated as an adorable fox with closed eyes, giving off a serene and cute vibe. It's an object that one would admire or find endearing. However, in the second panel, we see a person drinking from this fox-shaped mug. The contradiction lies in the mug's impracticality: its ears and head protrude awkwardly, obstructing the person's ability to sip comfortably. Despite its endearing appearance, the mug fails its primary function as a practical vessel for beverages.",
        "caption": "The comic is divided into two panels, each presenting a contradictory perspective of the same object—a mug. In the first panel, the mug is illustrated as an adorable fox with closed eyes, giving off a serene and cute vibe. It's an object that one would admire or find endearing. However, the second panel reveals a practical issue: a person attempts to drink from the fox-shaped mug, but its design—featuring protruding ears and head—awkwardly interferes, complicating the act of sipping comfortably.",
        "contradiction": "The comic illustrates a contradiction where a mug designed as an adorable fox is charming to look at but proves impractical to use due to its awkwardly protruding ears and head that hinder drinking.",
        "moral_mcq": "A. The comic implies that adding decorative elements enhances the aesthetic appeal, yet overlooks how they can detract from practicality and user experience.\n\nB. This illustration critiques the conflict between the aesthetics and utility of an object, emphasizing that a good object design needs to balance both to ensure a harmonious and practical experience in any aspect of life.\n\nC. The illustration implies that an object’s initial appeal guarantees satisfaction, despite possible functional drawbacks or discomfort encountered during its use.\n\nD. The image suggests enduring inconvenience is justified for owning something visually unique, emphasizing aesthetics over practicality and ease of use.",
        "moral_mcq_answer": "B",
        "title_mcq": "A. A Toast to Vulpine Grace\nB. Charming Design, Prickly Reality\nC. Enchanting Elixir: The Fox's Secret Brew\nD. Harmony in a Sip",
        "title_mcq_answer": "B",
        "social_info": "1. Aesthetic appeal can sometimes outweigh practicality in consumer choices.\n  2. Functional design is important for everyday usability.\n  3. The contrast between appearance and functionality can lead to humorous or frustrating situations.\n  4. Novelty items are often bought for their visual appeal rather than their practicality.+C2:C540",
        "Linguistic_context": "None",
        "Panel_Bounding_Boxes": "[[[32, 667], [1321, 2287]], [[1371, 674], [2663, 2287]]]",
        "Context_Bounding_Boxes": "[]",
        "contain_text": "no",
        "category": "This comic belongs to the category of daily life jokes where the humor lies in the functionality versus aesthetic dilemma of everyday objects. ### Daily Life joke",
        "link": "https://drive.google.com/file/d/1I1TSrHLoZNtK9Q-T6zIQBpT8kJHpI-1f/view?usp=drivesdk"
    }
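
Several fields in the sample above are stored as strings rather than structured values: the bounding boxes are JSON-encoded strings, and the MCQ options are packed into one newline-separated string. The following is a minimal sketch of how an entry might be decoded. The helper names `parse_boxes` and `parse_options` are hypothetical and not part of this repository, and reading each box as two [x, y] corner points is an assumption based on the sample values.

```python
import json
import re

# A trimmed entry that reuses field names and values from the sample above.
entry = {
    "image_file": "00001.jpg",
    "title_mcq": "A. A Toast to Vulpine Grace\nB. Charming Design, Prickly Reality\n"
                 "C. Enchanting Elixir: The Fox's Secret Brew\nD. Harmony in a Sip",
    "title_mcq_answer": "B",
    "Panel_Bounding_Boxes": "[[[32, 667], [1321, 2287]], [[1371, 674], [2663, 2287]]]",
    "Context_Bounding_Boxes": "[]",
}


def parse_boxes(raw):
    """Decode a bounding-box string into a list of (point, point) tuples.

    Assumption: each box is two [x, y] points (presumably opposite corners).
    """
    return [tuple(map(tuple, box)) for box in json.loads(raw)]


def parse_options(raw):
    """Split an MCQ string into {letter: text}.

    Works for options separated by one or more newlines.
    """
    options = {}
    for line in raw.splitlines():
        match = re.match(r"\s*([A-D])\.\s*(.+)", line)
        if match:
            options[match.group(1)] = match.group(2).strip()
    return options


panels = parse_boxes(entry["Panel_Bounding_Boxes"])
titles = parse_options(entry["title_mcq"])
print(len(panels), "panels; gold title:", titles[entry["title_mcq_answer"]])
```

The same `parse_options` helper should also apply to `moral_mcq`, whose options are separated by blank lines in the sample.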

Experimental Design

Experimental Setting

  • Sample components: (image, caption, contradiction, symbolism, title)

Task 1: Description Generation

  • Image Setting: p(description|image)

Task 2: Contradiction Generation

  • Image Setting: p(contradiction|image)
  • Full Setting: p(contradiction|image, caption)
    • oracle caption: written by annotators (upper bound)
    • system caption: generated by VLM itself

Task 3: Title MCQ

  • Image Setting: p(title_option|image)
  • Full Setting: p(title_option|image, caption)
    • oracle caption: written by annotators (upper bound)
    • system caption: generated by VLM itself

Task 4: Deep Philosophy MCQ

  • Image Setting: p(Symbolism_option|image)
  • Full Setting: p(Symbolism_option|image, caption)
    • oracle caption: written by annotators (upper bound)
    • system caption: generated by VLM itself
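
The image and full settings above differ only in whether a caption is supplied alongside the image. A minimal sketch of how the text part of a query could be assembled for each setting is given below. The function `build_prompt` and the instruction wording in `INSTRUCTIONS` are hypothetical illustrations, not the prompts used by this repository; the task names follow the `task` options listed in the evaluation script.

```python
# Hypothetical instruction text per task; the repository's actual prompts may differ.
INSTRUCTIONS = {
    "contradiction": "Describe the contradiction between the two panels of this comic.",
    "title_mcq": "Choose the title that best fits this comic. Answer with a single letter.",
    "moral_mcq": "Choose the option that best captures the deeper meaning of this comic. "
                 "Answer with a single letter.",
}


def build_prompt(entry, task, caption=None):
    """Return the text part of a query; the image itself is sent separately.

    caption=None -> image setting, p(target | image)
    caption=str  -> full setting,  p(target | image, caption)
    """
    parts = []
    if caption is not None:
        parts.append(f"Caption: {caption}")
    parts.append(INSTRUCTIONS[task])
    if task.endswith("_mcq"):
        parts.append(entry[task])  # the lettered options, e.g. entry["title_mcq"]
    return "\n\n".join(parts)


entry = {"title_mcq": "A. Harmony in a Sip\nB. Charming Design, Prickly Reality"}
image_only = build_prompt(entry, "title_mcq")
full = build_prompt(entry, "title_mcq", caption="A fox-shaped mug that is hard to drink from.")
print(full)
```

Under this sketch, the oracle variant would pass the annotator-written caption as `caption`, while the system variant would pass the model's own output from Task 1.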

Evaluation

Modify "predict_model_name.sh" to set the data path, image folder, task, and caption setting for the model you want to evaluate.

# Take claude3 as an example
data="annotated_data/data_annotation.json"
image_folder="YESBUT_cropped_yesbut"
write_path_surffix=".json"
#task options: contradiction | moral_mcq | title_mcq

use_caption=False

task="contradiction"
echo "==============================="
echo "claude3 eval"
echo "==============================="
python3 -u predict_claude_opus.py \
    --read_path ${data} \
    --write_path "results/results_claude3_"${task}"_"${write_path_surffix} \
    --task ${task} \
    --use_caption ${use_caption} \
    --image_folder ${image_folder}

Then run the command:

bash predict_model_name.sh
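
For the two MCQ tasks, the model's free-text responses need to be mapped to an option letter before they can be compared with `title_mcq_answer` or `moral_mcq_answer`. The sketch below shows one simple way to score them. It is not the repository's scoring code: `extract_choice` and `mcq_accuracy` are hypothetical helpers, the layout of the written results file is not assumed, and the letter extraction is deliberately conservative (responses it cannot parse count as wrong).

```python
import re


def extract_choice(response):
    """Return the option letter (A-D) stated in a response, or None.

    Accepts a leading letter such as "B", "B.", "(B)" or "B) ...",
    or a phrase such as "The answer is C".
    """
    text = response.strip()
    match = re.match(r"\(?([A-D])\)?(?:[.:)]|\s*$)", text)
    if match is None:
        match = re.search(r"[Aa]nswer(?:\s+is)?\s*:?\s*\(?([A-D])\b", text)
    return match.group(1) if match else None


def mcq_accuracy(responses, gold_answers):
    """Fraction of responses whose extracted letter equals the gold letter."""
    if not gold_answers:
        return 0.0
    correct = sum(
        extract_choice(resp) == gold
        for resp, gold in zip(responses, gold_answers)
    )
    return correct / len(gold_answers)


responses = ["B. Charming Design, Prickly Reality", "The answer is C.", "A comic about mugs"]
gold = ["B", "B", "A"]
print(f"accuracy = {mcq_accuracy(responses, gold):.2f}")
```

The third response starts with the article "A" rather than an option letter, so it is left unparsed instead of being counted as choice A.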

LoRA Finetuning

We provide LoRA finetuning examples for three models (LLaVA-Next-7B, LLaVA-Next-13B, and Qwen2-VL-7B) in the finetune folder.

For example, run the command:

bash llava13b.sh

All parameters can be modified in the corresponding .sh file.

Citation

@article{liang2025yes,
  title={When 'YES' Meets 'BUT': Can Large Models Comprehend Contradictory Humor Through Comparative Reasoning?},
  author={Liang, Tuo and Hu, Zhe and Li, Jing and Zhang, Hao and Lu, Yiren and Zhou, Yunlai and Qiao, Yiran and Liu, Disheng and Peng, Jeirui and Ma, Jing and others},
  journal={arXiv preprint arXiv:2503.23137},
  year={2025}
}

@inproceedings{
hu2024cracking,
title={Cracking the Code of Juxtaposition: Can {AI} Models Understand the Humorous Contradictions},
author={Zhe Hu and Tuo Liang and Jing Li and Yiren Lu and Yunlai Zhou and Yiran Qiao and Jing Ma and Yu Yin},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=bCMpdaQCNW}
}
