
⚖️ F²Bench: An Open-ended Fairness Evaluation Benchmark for LLMs with Factuality Considerations 📖


Tian Lan¹, Jiang Li¹, Yemin Wang¹, Xu Liu², Xiangdong Su*¹, Guanglai Gao¹

¹College of Computer Science, Inner Mongolia University, China
²School of Informatics, Xiamen University, China

\* Corresponding author


Paper: https://aclanthology.org/2025.emnlp-main.105.pdf

📜Abstract

🚀Core Contributions

**Evaluation Benchmark** We designed and released F²Bench, which covers 10 demographic group categories, including a range of intersectional combinations, to comprehensively evaluate the fairness of LLMs across diverse population groups.

**Open-ended Tasks** In F²Bench, we propose two open-ended tasks, based on text generation and reasoning, that take factuality into consideration. These tasks reflect real-world usage better than traditional closed-ended evaluation.

**Experimental Analysis** Using F²Bench, we evaluated several popular LLMs, compared their performance and analyzed its underlying causes, discussed the differences between closed-ended and open-ended evaluation, and offered new insights for future LLM training strategies.

🔬Dependencies

tqdm
zhipuai
openai
transformers
pandas
itertools (Python standard library)
torch
modelscope
openpyxl
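The third-party dependencies above can be installed with pip; a minimal sketch, assuming the listed names match their PyPI package names (itertools ships with the Python standard library and needs no install):

```shell
pip install tqdm zhipuai openai transformers pandas torch modelscope openpyxl
```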

💯How to Run an Evaluation?

The evaluation procedure is the same as in McBE.
