HWTCMBench

A Comprehensive Benchmark for Evaluating Large Language Models in Traditional Chinese Medicine.

Changelog

  • 2024-08-28: Added 7226 questions.
  • 2024-07-20: Initial release.

Dataset

The dataset is available at https://huggingface.co/datasets/Monor/hwtcm
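The dataset can be loaded with the Hugging Face `datasets` library. The snippet below is a minimal sketch; the available configurations, splits, and field names are assumptions, so check the dataset card at the link above for the actual layout.

```python
# Minimal loading sketch (assumptions: default config; split/field names unverified).
from datasets import load_dataset

dataset = load_dataset("Monor/hwtcm")     # dataset id from the link above
print(dataset)                            # inspect the available splits
first_split = list(dataset)[0]
print(dataset[first_split][0])            # peek at the first example
```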

Benchmarking model accuracy

| Model | Multiple-choice questions (单选题) | Multiple-answer questions (多选题) | True/False questions (判断题) |
| --- | --- | --- | --- |
| llama3:8b | 21.94% | 17.71% | 46.56% |
| phi3:14b-instruct | 26.93% | 1.04% | 38.93% |
| aya:8b | 17.85% | 1.04% | 34.35% |
| mistral:7b-instruct | 21.76% | 2.08% | 48.09% |
| qwen1.5-7b-chat | 51.35% | 13.54% | 46.56% |
| qwen1.5-14b-chat | 69.94% | 78.12% | 31.30% |
| huangdi-13b-chat | 21.73% | 45.83% | 0.00% |
| canggong-14b-chat (SFT, ours) | 55.98% | 4.17% | 23.66% |
| canggong-14b-chat (DPO, ours) | 72.33% | 2.08% | 45.80% |

canggong-14b-chat is a Traditional Chinese Medicine LLM that is still in training.
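Accuracy here is the fraction of questions a model answers correctly within each question type. The snippet below is an illustrative sketch only, not the evaluation script behind the table above; the record fields `question_type`, `prediction`, and `answer` are hypothetical names, and multiple-answer questions are scored as exact set matches.

```python
# Illustrative per-question-type accuracy scorer (hypothetical field names).
from collections import defaultdict

def accuracy_by_type(records):
    """Return {question_type: accuracy} for a list of prediction records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        qtype = r["question_type"]
        total[qtype] += 1
        # Exact match of the answer set, e.g. "ABD" vs. "ABD" for multi-answer items.
        if set(r["prediction"]) == set(r["answer"]):
            correct[qtype] += 1
    return {t: correct[t] / total[t] for t in total}

# Toy usage example:
records = [
    {"question_type": "single", "prediction": "A", "answer": "A"},
    {"question_type": "multi", "prediction": "AB", "answer": "ABD"},
    {"question_type": "true_false", "prediction": "T", "answer": "T"},
]
print(accuracy_by_type(records))
# {'single': 1.0, 'multi': 0.0, 'true_false': 1.0}
```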
