Pum Jun Kim, Seung-Ah Lee, Seongho Park, Dongyoon Han, Jaejun Yoo
[Paper] | Project Page | Quick Start: REFINED-BIAS
Understanding how neural networks utilize visual cues provides a human-interpretable perspective on their internal decision processes. Building on this motivation, the cue-conflict benchmark has initiated important progress in bridging human and model perception. However, despite its value, it falls short of meeting the necessary conditions for a precise bias analysis: (1) it relies on stylized images that blend shape and texture cues, blurring their distinction and offering no control over the relative contribution of each cue; (2) it limits evaluation to preselected classes, which distorts cue-based model predictions; and (3) the cue-conflict metric fails to distinguish models that genuinely utilize the cues. Collectively, these limitations hinder an accurate interpretation of model bias. To address this, we introduce REFINED-BIAS, a diagnostic benchmark that provides refined and more accurate measurements. REFINED-BIAS generates artifact-free samples while preserving human-defined shape and texture cues as faithfully as possible, and quantifies cue sensitivity across the full label space using Mean Reciprocal Rank, enabling fairer cross-model comparisons. Extensive evaluations across diverse training regimes and architectures demonstrate that REFINED-BIAS not only provides a more accurate assessment of shape and texture biases than the prior benchmark, but also reveals new insights into how models utilize cues, clarifying previously inconsistent findings.
We (1) define disentangled stimuli based on human perception rather than model-derived heuristics, ensuring that each cue carries pure and interpretable information, and (2) select classes suited for bias evaluation and generate data to maximize the predictive strength of both shape and texture cues, thereby balancing cue informativeness.
Our metric computes the reciprocal ranks of the correct shape and texture labels within the model's full prediction ranking. We refer to these two components as the shape MRR and the texture MRR, which quantify the model's sensitivity to each cue across the full label space.
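A minimal sketch of this scoring, assuming per-image logits over the full label space (illustrative only; the exact aggregation in eval_refined_bias.py may differ):

```python
# Illustrative MRR computation (a sketch, not the repository's exact code).
import torch

def reciprocal_rank(logits: torch.Tensor, target: int) -> float:
    """Reciprocal rank of `target` in the descending ranking of `logits`."""
    ranking = torch.argsort(logits, descending=True)                 # full label ranking
    rank = (ranking == target).nonzero(as_tuple=True)[0].item() + 1  # 1-based rank
    return 1.0 / rank

def mean_reciprocal_rank(all_logits: torch.Tensor, targets: torch.Tensor) -> float:
    """Average reciprocal rank of the correct label over all samples."""
    rr = [reciprocal_rank(logits, int(t)) for logits, t in zip(all_logits, targets)]
    return sum(rr) / len(rr)

# Hypothetical usage: `shape_targets` / `texture_targets` hold the human-defined
# shape and texture labels of each stimulus.
# shape_mrr   = mean_reciprocal_rank(model_logits, shape_targets)
# texture_mrr = mean_reciprocal_rank(model_logits, texture_targets)
```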
To evaluate REFINED-BIAS across different model architectures or learning strategies, run the commands below.
All required checkpoints will be downloaded automatically.
conda create --name refined python=3.8.20 -y
conda activate refined
pip install -r ./requirements.txt

Dataset Structure
datasets/
├── refined_bias_shape/
│ ├── balloon/
│ │ ├── balloon_0.png
│ │ ├── balloon_1.png
│ │ └── ...
│ ├── book/
│ └── ...
└── refined_bias_texture/
├── brain_coral/
│ ├── 4x4_brain_coral_0.png
│ ├── 4x4_brain_coral_1.png
│ └── ...
├── texture/
└── ...
Image size: (3, 224, 224)
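The class-per-folder layout above matches torchvision's ImageFolder convention, so a minimal loading sketch could look like the following (for illustration only; the loader used by eval_refined_bias.py may differ):

```python
# Minimal loading sketch, assuming the datasets/ layout above (one folder per class).
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),  # samples are already 224x224 RGB
    transforms.ToTensor(),
])

shape_set = datasets.ImageFolder("datasets/refined_bias_shape", transform=transform)
texture_set = datasets.ImageFolder("datasets/refined_bias_texture", transform=transform)

shape_loader = DataLoader(shape_set, batch_size=64, shuffle=False)
texture_loader = DataLoader(texture_set, batch_size=64, shuffle=False)
```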
# REFINED-BIAS Shape Cue
python eval_refined_bias.py --dataset refined_bias_shape --across arch
# REFINED-BIAS Texture Cue
python eval_refined_bias.py --dataset refined_bias_texture --across arch

# REFINED-BIAS Shape Cue
python eval_refined_bias.py --dataset refined_bias_shape --across strategy
# REFINED-BIAS Texture Cue
python eval_refined_bias.py --dataset refined_bias_texture --across strategy

REFINED-BIAS Shape Bias (across: arch)
• bagnet9 : 0.0518
• bagnet17 : 0.0988
• bagnet33 : 0.2438
• ...
Detailed per-class scores for each model and learning strategy can be found in the .json files located under:
./results/across_model_architecture
./results/across_learning_strategy
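Since the schema of these .json files is not described here, a simple hypothetical way to inspect them is to load and print each file:

```python
# Hypothetical inspection script: the key layout of the result files is not
# specified above, so we just load and print each one to discover its schema.
import json
from pathlib import Path

for results_dir in ("./results/across_model_architecture",
                    "./results/across_learning_strategy"):
    for path in Path(results_dir).glob("*.json"):
        with open(path) as f:
            scores = json.load(f)
        print(path.name, scores)
```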
