
⚖️ F²Bench: An Open-ended Fairness Evaluation Benchmark for LLMs with Factuality Considerations 📖


Tian Lan¹, Jiang Li¹, Yemin Wang¹, Xu Liu², Xiangdong Su*¹, Guanglai Gao¹

¹College of Computer Science, Inner Mongolia University, China
²School of Informatics, Xiamen University, China

\* Corresponding author


Paper: https://aclanthology.org/2025.emnlp-main.105.pdf

📜Abstract

🚀Core Contributions

**Evaluation Benchmark** We designed and released F²Bench, which covers 10 demographic group categories, including a range of intersectional combinations, to comprehensively evaluate the fairness of LLMs across diverse population groups.

**Open-ended Tasks** In F²Bench, we propose two open-ended tasks, based on text generation and reasoning, that take factuality into consideration. These tasks reflect real-world usage better than traditional closed-ended evaluation.

**Experimental Analysis** Using F²Bench, we evaluated several popular LLMs, compared their performance and analyzed its underlying causes, discussed the differences between closed-ended and open-ended evaluation, and offered new insights for future LLM training strategies.

🔬Dependencies

tqdm
zhipuai
openai
transformers
pandas
itertools (Python standard library)
torch
modelscope
openpyxl
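The third-party dependencies above can be installed with pip; a minimal sketch, assuming the listed names match their PyPI package names (itertools ships with the Python standard library and needs no install):

```shell
pip install tqdm zhipuai openai transformers pandas torch modelscope openpyxl
```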

💯How to Run an Evaluation?

The evaluation procedure is the same as in McBE.
