✨ Evaluating Large Language Models with Educational Knowledge Graphs: Challenges with Prerequisite Relationships and Multi-Hop Reasoning ✨
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
This repo maintains and updates a benchmark for evaluating LLMs with Educational KGs, with a focus on prerequisite relationships. 😄
Download the whole repository, or clone it:
$> git clone https://github.com/ai-for-edu/Evaluating-Large-Language-Models-with-Educational-Knowledge-Graphs-on-Prerequisite-Relationships
After cloning or downloading the repository, change into the /benchmark/ folder:
$> cd benchmark/
$> pip install -r requirements.txt
To generate questions for all tasks on all datasets:
$> python generate_question_query.py
Feel free to play around with the code to customize the query generation.
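For a sense of what a generated query might look like, here is an illustrative sketch (not the repo's actual code; the function and concept names are made up) of templating a prerequisite question from a KG edge:

```python
# Illustrative sketch only -- see generate_question_query.py for the real logic.
def make_prerequisite_question(concept_a: str, concept_b: str) -> str:
    """Template a yes/no prerequisite question from a pair of knowledge components."""
    return (f"Is understanding '{concept_a}' a prerequisite for learning "
            f"'{concept_b}'? Answer Yes or No.")

print(make_prerequisite_question("linear equations", "quadratic equations"))
```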
Please fill in the 'API_KEY' in /benchmark/Edu_KG_Eval/global_config.py. Besides that, also modify the following connection details in the generate_answer function of the ApiFoxAnswer class (see the sketch after this list):
- the HTTPS path in 'conn = http.client.HTTPSConnection()'
- the 'User-Agent' entry in the 'headers' dictionary
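As a rough guide, the connection setup might look like the following sketch. The host, endpoint path, payload format, and header values are placeholders for your own LLM API provider, not the repo's actual configuration:

```python
# Hedged sketch of the connection setup inside ApiFoxAnswer.generate_answer.
# Host, endpoint, payload schema, and User-Agent below are all placeholders.
import http.client
import json

API_KEY = "sk-..."  # set in /benchmark/Edu_KG_Eval/global_config.py

def generate_answer(prompt: str) -> str:
    conn = http.client.HTTPSConnection("api.your-llm-provider.com")  # <- your HTTPS path
    payload = json.dumps({
        "model": "your-model-name",
        "messages": [{"role": "user", "content": prompt}],
    })
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
        "User-Agent": "your-user-agent",  # <- your 'User-Agent' value
    }
    conn.request("POST", "/v1/chat/completions", payload, headers)
    response = conn.getresponse()
    data = json.loads(response.read())
    conn.close()
    return data["choices"][0]["message"]["content"]
```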
To get the answers to all of the queries generated in the previous step:
$> python obtain_llm_answers.py
As this step may require manual checking, we provide methods that may be helpful for calculating accuracy, precision, recall, AUROC, and AUPRC in the script 'auto_eval_test.py'.
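For reference, here is a minimal sketch of how such metrics can be computed with scikit-learn. The labels and scores are made-up examples, and the variable names are illustrative rather than taken from auto_eval_test.py:

```python
# Minimal sketch, assuming binary gold labels and model confidence scores.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             roc_auc_score, average_precision_score)

y_true = [1, 0, 1, 1, 0]             # gold labels (e.g., prerequisite vs. not)
y_score = [0.9, 0.2, 0.7, 0.4, 0.1]  # model confidence scores
y_pred = [int(s >= 0.5) for s in y_score]  # thresholded predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("AUROC    :", roc_auc_score(y_true, y_score))
print("AUPRC    :", average_precision_score(y_true, y_score))
```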
The KG datasets with KCs and prerequisite relationships are in the /data folder, with each subfolder holding one GraphML file for one KG. You can also download all of them at once from the /data/wrapup/ folder, which contains all GraphML files and corresponding JSON files.
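Since each KG ships as GraphML, one convenient way to load it is with networkx; the file path below is hypothetical, so point it at any subfolder under /data/:

```python
# Hedged sketch: loading one KG from its GraphML file with networkx.
import networkx as nx

G = nx.read_graphml("data/example_kg/example_kg.graphml")  # hypothetical path
print(f"{G.number_of_nodes()} nodes, {G.number_of_edges()} edges")

# Inspect a few edges (prerequisite relationships between KCs).
for u, v in list(G.edges())[:5]:
    print(u, "->", v)
```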
The Croissant Metadata is at Link to File.
A duplicate of the GraphML dataset can also be found on Hugging Face: Link to Data.
TBA
Authors:
- Aoran Wang: aoran.wang@uni.lu
- Chaoli Zhang: chaolizcl@zjnu.edu.cn
- Jun Pang: jun.pang@uni.lu
- Qingsong Wen: qingsongedu@gmail.com