UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

Overview

The UCFE Benchmark provides a user-centric framework for evaluating how large language models (LLMs) perform on complex financial tasks. The complete benchmark dataset is in UCFE_bench.json.
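Before running anything, it can help to check what UCFE_bench.json actually contains. The sketch below only assumes the file is valid JSON; the schema is not described in this README, so the helper (summarize_benchmark, a hypothetical name) reports the top-level shape rather than relying on specific field names.

```python
import json
from pathlib import Path


def summarize_benchmark(path="UCFE_bench.json"):
    """Load a JSON file and describe its top-level structure.

    Assumption: nothing is known about the schema beyond "valid JSON",
    so this only reports the container type, its size, and sample keys.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)

    summary = {
        "type": type(data).__name__,
        "size": len(data) if isinstance(data, (list, dict)) else None,
    }
    if isinstance(data, dict):
        # Top-level object: show up to ten of its keys.
        summary["keys"] = sorted(data)[:10]
    elif isinstance(data, list) and data and isinstance(data[0], dict):
        # List of records: show up to ten keys of the first record.
        summary["keys"] = sorted(data[0])[:10]
    return summary
```

Calling summarize_benchmark() from the repository root returns a small dictionary that can be printed to see whether the file is a list of records or a keyed object.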

How to Run the Simulator

Follow these steps to set up and run the simulator:

1. Set your API key in the config folder.
2. Run the simulator with the following command: python run_ckpt.py
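The two steps above can be wrapped in a small preflight check. This is a sketch under stated assumptions: the README does not name the file that holds the API key, so the check only verifies that the config folder exists and is not empty. The helper names (simulator_command, run_simulator) are hypothetical and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def simulator_command(repo_root=".", config_dir="config"):
    """Return the command that launches run_ckpt.py after basic checks.

    Assumption: the API key is stored in some file under config/.
    The file name is not documented, so only the folder is checked.
    """
    root = Path(repo_root)
    cfg = root / config_dir
    if not cfg.is_dir() or not any(cfg.iterdir()):
        raise FileNotFoundError(f"expected a non-empty config folder at {cfg}")
    script = root / "run_ckpt.py"
    if not script.is_file():
        raise FileNotFoundError(f"run_ckpt.py not found in {root}")
    # Use the current interpreter so the active environment is reused.
    return [sys.executable, script.name]


def run_simulator(repo_root="."):
    """Run the simulator from the repository root; raise if it exits non-zero."""
    cmd = simulator_command(repo_root)
    subprocess.run(cmd, cwd=repo_root, check=True)
```

Calling run_simulator() from the repository root is equivalent to running python run_ckpt.py by hand, with an earlier and clearer error if the config folder has not been set up.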

How to Evaluate the Model

You can evaluate individual models or run evaluations for all models:

Evaluate a single model: bash scripts/eval_model.sh
Evaluate all models: bash scripts/eval_all.sh
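The repository also ships eval_elo.py, which suggests that models are compared with Elo-style ratings. The README does not describe that script, so the following is only the standard Elo update, shown for orientation. The starting rating, the K-factor, and the model names in the usage note are assumptions, and the actual script may differ.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return updated ratings after one comparison.

    score_a is 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    k=32 is a common default, not a value taken from the repository.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


def rate_models(comparisons, initial=1000.0, k=32.0):
    """Fold a sequence of (model_a, model_b, score_a) tuples into ratings."""
    ratings = {}
    for a, b, score_a in comparisons:
        ra = ratings.get(a, initial)
        rb = ratings.get(b, initial)
        ratings[a], ratings[b] = update_elo(ra, rb, score_a, k)
    return ratings
```

For example, rate_models([("model_x", "model_y", 1.0)]) moves the two hypothetical models from 1000 each to 1016 and 984, since the expected score between equally rated models is 0.5.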

Repository contents:

assets/readme
config
log
res
script
utils
LICENSE.txt
UCFE_bench.json
eval_elo.py
readme.md
run_ckpt.py

Repository: TobyYang7/UCFE-Benchmark

Folders and files

Latest commit

History

Repository files navigation

UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

Overview

How to Run the Simulator

How to Evaluate the Model

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages