The UCFE Benchmark provides a user-centric framework for evaluating the performance of large language models (LLMs) in complex financial tasks. The complete benchmark dataset is available in UCFE_bench.json.
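Before running anything, it can help to check what the dataset file contains. The sketch below is a minimal, schema-agnostic inspection: it only assumes that UCFE_bench.json is valid JSON, and it makes no claim about the actual field names. The helper name `summarize_json` is hypothetical and not part of the repository.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Load a JSON file and describe its top-level structure.

    Hypothetical helper: it does not assume any particular UCFE schema.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)

    summary = {"type": type(data).__name__}
    if isinstance(data, (list, dict)):
        summary["size"] = len(data)
    if isinstance(data, dict):
        # Show only the first few top-level keys to keep the output short.
        summary["sample_keys"] = list(data)[:5]
    elif isinstance(data, list) and data and isinstance(data[0], dict):
        # For a list of records, report the fields of the first record.
        summary["first_record_keys"] = list(data[0])
    return summary
```

Calling `summarize_json("UCFE_bench.json")` from the directory that holds the file returns a small dictionary describing the top-level type and size, which is usually enough to decide how to iterate over the entries.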
Follow these steps to set up and run the simulator:
- Set your API key in the config folder.
- Run the simulator with the following command:
python run_ckpt.py
You can evaluate individual models or run evaluations for all models:
- Evaluate a single model:
bash scripts/eval_model.sh
- Evaluate all models:
bash scripts/eval_all.sh