MAGIC: A Multi-Hop and Graph-Based Benchmark for Inter-Context Conflicts in Retrieval-Augmented Generation
Official Repository for "MAGIC: A Multi-Hop and Graph-Based Benchmark for Inter-Context Conflicts in Retrieval-Augmented Generation". [Paper (arXiv)]
Jungyeon Lee, Kangmin Lee and Taeuk Kim. Accepted to EMNLP 2025 Findings.
MAGIC is a large-scale benchmark designed to evaluate knowledge conflict detection and localization in Retrieval-Augmented Generation (RAG) systems. It focuses on multi-hop reasoning and graph-structured contexts, where inter-context knowledge conflicts emerge between retrieved passages.
.
├── dataset
│   ├── multi-hop/
│   │   ├── 1-multi-hop_conflict.json/
│   │   ├── 2-multi-hop_conflict.json/
│   │   ├── 3-multi-hop_conflict.json/
│   │   └── 4-multi-hop_conflict.json/
│   ├── single-hop/
│   │   ├── 1-single-hop_conflict.json/
│   │   ├── 2-single-hop_conflict.json/
│   │   ├── 3-single-hop_conflict.json/
│   │   └── 4-single-hop_conflict.json/
- `ID`: Unique identifier for each sample.
- `rel_id`: Relation ID corresponding to the target knowledge relation (e.g., `P150` from Wikidata).
- `subgraph`: A set of surrounding triplets retrieved via DFS traversal from the source knowledge graph around the `original_triplet`.
- `original_triplet`: Randomly sampled target triplet from the source graph; serves as the anchor for conflict formation.
- `perturb_triplet`: Modified triplet(s) intentionally constructed to introduce a knowledge conflict with the `original_triplet`.
- `context1`, `context2`: Textual representations of the `original_triplet` and `perturb_triplet`, respectively.
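
The fields above can be read with a few lines of Python. The following is a minimal sketch, not part of the repository: it assumes each dataset file is a JSON array of sample objects, and the helper names (`parse_samples`, `missing_fields`, `load_samples`) are hypothetical. The on-disk layout and the exact shape of the triplet fields are not specified here, so adjust as needed.

```python
import json
from typing import Any, Dict, List

# Field names as listed in this README.
EXPECTED_FIELDS = (
    "ID", "rel_id", "subgraph", "original_triplet",
    "perturb_triplet", "context1", "context2",
)


def parse_samples(text: str) -> List[Dict[str, Any]]:
    """Parse the text of one dataset file.

    Assumption: the file holds a JSON array of sample objects.
    """
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of samples")
    return data


def missing_fields(sample: Dict[str, Any]) -> List[str]:
    """Return the expected field names that are absent from a sample."""
    return [name for name in EXPECTED_FIELDS if name not in sample]


def load_samples(path: str) -> List[Dict[str, Any]]:
    """Read and parse one dataset file from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_samples(f.read())
```

For instance, calling `load_samples` on one of the files under `dataset/multi-hop/` and then running `missing_fields` on the first sample gives a quick check that the file matches the field list; an empty result means every listed field is present.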