
CharacterBench: Benchmarking Character Customization of Large Language Models

🤗 Hugging Face • ⏬ Data • 📃 Paper

Data Preparation

  • Using the provided test set, instruct the evaluated large language model to role-play the specified characters and generate responses (a minimal generation sketch follows the command below).

  • These generated responses are then scored by CharacterJudge in the Evaluation step below.

  • Ensure that you update the model name (YOUR_MODEL_NAME) and the paths (data_path and output_path) as necessary.

python process.py --data_path eval_data/raw_data --output_path eval_data/response_data --model_name YOUR_MODEL_NAME
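
If process.py does not already support your model, the generation step boils down to prompting the model to answer in character for each dialogue. A minimal sketch follows, assuming an OpenAI-compatible chat endpoint; the function name, prompt wording, and data fields are illustrative assumptions, not the repository's actual interface.

# Minimal sketch of in-character response generation (illustrative only).
# Assumption: an OpenAI-compatible endpoint; the profile/dialogue fields below
# are not CharacterBench's actual data format.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY / OPENAI_BASE_URL from the environment

def generate_in_character(character_profile, dialogue, model):
    """Ask `model` to reply as the character described by `character_profile`.
    `dialogue` is a list of {"role": "user"/"assistant", "content": ...} turns."""
    messages = [{
        "role": "system",
        "content": "Role-play strictly as the following character and never break character:\n"
                   + character_profile,
    }]
    messages.extend(dialogue)
    completion = client.chat.completions.create(model=model, messages=messages)
    return completion.choices[0].message.content

# Example with a hypothetical character profile and a single user turn.
profile = "Name: Lin Mei. A calm, bookish detective who speaks concisely."
turns = [{"role": "user", "content": "Who do you suspect, and why?"}]
print(generate_in_character(profile, turns, model="YOUR_MODEL_NAME"))
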
  • Convert the generated responses into the input format expected by CharacterJudge.

cd construct_prompts
python process_wo_context_zh_all.py --data_path ../eval_data/response_data --output_path ../eval_data/evaluation_data_zh --model_name YOUR_MODEL_NAME
python process_wo_context_en_all.py --data_path ../eval_data/response_data --output_path ../eval_data/evaluation_data_en --model_name YOUR_MODEL_NAME
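
After both stages, the directories referenced by the commands above should look roughly as follows; the per-model file naming inside each folder is an assumption, so check the scripts for the exact names.

eval_data/
├── raw_data/              provided test set
├── response_data/         responses generated by YOUR_MODEL_NAME
├── evaluation_data_zh/    CharacterJudge inputs (Chinese)
└── evaluation_data_en/    CharacterJudge inputs (English)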

Evaluation

  • Run CharacterJudge to generate evaluation results.

bash run_zh.sh YOUR_MODEL_NAME
bash run_en.sh YOUR_MODEL_NAME

Citation

If you find our work useful for your research, please cite our paper as follows:

@article{characterbench,
  title={CharacterBench: Benchmarking Character Customization of Large Language Models},
  author={Jinfeng Zhou and Yongkang Huang and Bosi Wen and Guanqun Bi and Yuxuan Chen and Pei Ke and Zhuang Chen and Xiyao Xiao and Libiao Peng and Kuntian Tang and Rongsheng Zhang and Le Zhang and Tangjie Lv and Zhipeng Hu and Hongning Wang and Minlie Huang},
  journal={AAAI},
  year={2025}
}

Contact Us

If you have any feedback on our work, please feel free to contact us at ✉️ zjf23@mails.tsinghua.edu.cn.
