✨ Evaluating Large Language Models with Educational Knowledge Graphs: Challenges with Prerequisite Relationships and Multi-Hop Reasoning ✨
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
This repo maintains and updates a benchmark for evaluating LLMs with Educational KGs, with a focus on prerequisite relationships. 😄
Download the whole repository, or clone it:
$> git clone https://github.com/ai-for-edu/Evaluating-Large-Language-Models-with-Educational-Knowledge-Graphs-on-Prerequisite-Relationships
After cloning or downloading the repository, change into the /benchmark/ folder:
$> cd benchmark/
$> pip install -r requirements.txt
To generate questions for all tasks on all datasets:
$> python generate_question_query.py
Feel free to play around with the code to customize the query generation.
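For a sense of what a generated query might look like, here is an illustrative sketch (not the repo's actual code; the function and concept names are made up) of templating a prerequisite question from a KG edge:

```python
# Illustrative sketch only -- see generate_question_query.py for the real logic.
def make_prerequisite_question(concept_a: str, concept_b: str) -> str:
    """Template a yes/no prerequisite question from a pair of knowledge components."""
    return (f"Is understanding '{concept_a}' a prerequisite for learning "
            f"'{concept_b}'? Answer Yes or No.")

print(make_prerequisite_question("linear equations", "quadratic equations"))
```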
Please fill in the 'API_KEY' in /benchmark/Edu_KG_Eval/global_config.py. Besides that, also modify the following connection details in the generate_answer function of the ApiFoxAnswer class (see the sketch after this list):
- the HTTPS path in 'conn = http.client.HTTPSConnection()'
- the 'User-Agent' entry in the 'headers' dictionary
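As a rough guide, the connection setup might look like the following sketch. The host, endpoint path, payload format, and header values are placeholders for your own LLM API provider, not the repo's actual configuration:

```python
# Hedged sketch of the connection setup inside ApiFoxAnswer.generate_answer.
# Host, endpoint, payload schema, and User-Agent below are all placeholders.
import http.client
import json

API_KEY = "sk-..."  # set in /benchmark/Edu_KG_Eval/global_config.py

def generate_answer(prompt: str) -> str:
    conn = http.client.HTTPSConnection("api.your-llm-provider.com")  # <- your HTTPS path
    payload = json.dumps({
        "model": "your-model-name",
        "messages": [{"role": "user", "content": prompt}],
    })
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
        "User-Agent": "your-user-agent",  # <- your 'User-Agent' value
    }
    conn.request("POST", "/v1/chat/completions", payload, headers)
    response = conn.getresponse()
    data = json.loads(response.read())
    conn.close()
    return data["choices"][0]["message"]["content"]
```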
To get the answers to all of the queries generated in the previous step:
$> python obtain_llm_answers.py
As this step may require manual checking, we provide methods that may be helpful for calculating accuracy, precision, recall, AUROC, and AUPRC in the script 'auto_eval_test.py'.
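For reference, here is a minimal sketch of how such metrics can be computed with scikit-learn. The labels and scores are made-up examples, and the variable names are illustrative rather than taken from auto_eval_test.py:

```python
# Minimal sketch, assuming binary gold labels and model confidence scores.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             roc_auc_score, average_precision_score)

y_true = [1, 0, 1, 1, 0]             # gold labels (e.g., prerequisite vs. not)
y_score = [0.9, 0.2, 0.7, 0.4, 0.1]  # model confidence scores
y_pred = [int(s >= 0.5) for s in y_score]  # thresholded predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("AUROC    :", roc_auc_score(y_true, y_score))
print("AUPRC    :", average_precision_score(y_true, y_score))
```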
The KG datasets with KCs and prerequisite relationships are in the /data folder, with each subfolder holding one GraphML file for one KG. You can also download all of them at once from the /data/wrapup/ folder, which contains all GraphML files and corresponding JSON files.
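Since each KG ships as GraphML, one convenient way to load it is with networkx; the file path below is hypothetical, so point it at any subfolder under /data/:

```python
# Hedged sketch: loading one KG from its GraphML file with networkx.
import networkx as nx

G = nx.read_graphml("data/example_kg/example_kg.graphml")  # hypothetical path
print(f"{G.number_of_nodes()} nodes, {G.number_of_edges()} edges")

# Inspect a few edges (prerequisite relationships between KCs).
for u, v in list(G.edges())[:5]:
    print(u, "->", v)
```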
The Croissant Metadata is at Link to File.
A duplicate of the GraphML dataset can also be found on Hugging Face: Link to Data.
TBA
Authors:
- Aoran Wang: aoran.wang@uni.lu
- Chaoli Zhang: chaolizcl@zjnu.edu.cn
- Jun Pang: jun.pang@uni.lu
- Qingsong Wen: qingsongedu@gmail.com