Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks, achieving significant advances in text comprehension, question answering, and content generation. However, their performance on knowledge-intensive tasks, particularly those requiring domain expertise, remains suboptimal due to the following limitations: their pretrained knowledge, though broad, lacks depth in specialized areas because it relies on general-domain data, leading to inconsistencies with current domain standards; they struggle with the precise, multi-step reasoning required by domain-specific rules, often failing to maintain logical consistency and accuracy across complex reasoning chains; and they exhibit limited context sensitivity, frequently misinterpreting or oversimplifying domain-specific terms and concepts whose meaning varies with context.
Retrieval-augmented generation (RAG) offers a promising way to customize LLMs for specific domains. Rather than retraining LLMs to incorporate updates, RAG enhances them with external knowledge from text corpora without modifying their architecture or parameters. This allows an LLM to generate responses grounded not only in its pre-trained knowledge but also in domain-specific information retrieved at query time, yielding more accurate and reliable answers. However, the practical effectiveness of RAG systems in real-world applications is limited by weaknesses in complex query comprehension, difficulties in synthesizing distributed domain knowledge, the inherent constraints of LLMs, and issues with system efficiency and scalability [1].
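To make the retrieve-then-generate loop concrete, here is a minimal sketch of vanilla RAG. It is illustrative only: the toy bag-of-words `embed` and the placeholder `generate` are hypothetical stand-ins for a real dense retriever and LLM call.

```python
# Minimal retrieve-then-generate sketch of vanilla RAG (illustrative only).
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a dense encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, corpus: list[str], k: int = 3) -> list[str]:
    # Rank chunks by similarity to the question and keep the top k.
    q = embed(question)
    return sorted(corpus, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def generate(question: str, contexts: list[str]) -> str:
    # Placeholder for an LLM call: the prompt concatenates retrieved chunks.
    prompt = "Answer using the context.\n" + "\n".join(contexts) + f"\nQ: {question}\nA:"
    return prompt  # a real system would send `prompt` to an LLM

corpus = ["GraphRAG indexes a corpus into a graph.", "RAG retrieves chunks by similarity."]
print(generate("What does RAG retrieve?", retrieve("What does RAG retrieve?", corpus)))
```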
To address these limitations, graph retrieval-augmented generation (GraphRAG) has recently emerged as a paradigm for customizing LLMs with well-organized background knowledge and improved contextual reasoning. GraphRAG, formally a specialized subclass of the RAG framework, uses graph structures to systematically organize and retrieve domain-specific knowledge. Its workflow comprises two key stages, offline indexing and online retrieval; a minimal sketch of this two-stage pipeline follows.
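The skeleton below is a sketch under simplifying assumptions: `extract_triples` is a stub standing in for the LLM-based entity/relation extraction that real GraphRAG systems use.

```python
# Sketch of the two GraphRAG stages: offline indexing and online retrieval.
import networkx as nx

def extract_triples(chunk: str) -> list[tuple[str, str, str]]:
    return []  # stub: an LLM-based extractor would return (head, relation, tail) triples

def build_index(chunks: list[str]) -> nx.Graph:
    """Offline indexing: organize extracted triples into a knowledge graph,
    remembering which chunks each entity came from."""
    graph = nx.Graph()
    for i, chunk in enumerate(chunks):
        for head, relation, tail in extract_triples(chunk):
            graph.add_edge(head, tail, relation=relation)
            graph.nodes[head].setdefault("chunks", set()).add(i)
            graph.nodes[tail].setdefault("chunks", set()).add(i)
    return graph

def retrieve(graph: nx.Graph, question_entities: list[str], chunks: list[str]) -> list[str]:
    """Online retrieval: expand question entities to their 1-hop neighborhood
    and return the chunks linked to that subgraph."""
    chunk_ids: set[int] = set()
    for entity in question_entities:
        if entity in graph:
            for node in [entity, *graph.neighbors(entity)]:
                chunk_ids |= graph.nodes[node].get("chunks", set())
    return [chunks[i] for i in sorted(chunk_ids)]
```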
We collect recent influential papers and benchmarks on GraphRAG. The following contents are listed in reverse chronological order of publication.
Date | Venue | Title | Code | Notes |
---|---|---|---|---|
2025-03-18 | Arxiv | KG-IRAG: A Knowledge Graph-Based Iterative Retrieval-Augmented Generation Framework for Temporal Reasoning | No | Graphs for Knowledge Indexing & Graphs as Knowledge Carrier (KG construction from corpus) |
2025-03-14 | ICML 2025 Spotlight | Taming Knowledge Conflicts in Language Models | Yes | Mechanisms Controlling LLMs' Behavioral Preferences |
2025-03-13 | Arxiv | Retrieval-Augmented Generation with Hierarchical Knowledge | Yes | Graphs for Knowledge Indexing & Graphs as Knowledge Carrier (KG construction from corpus) |
2025-03-06 | Arxiv | In-depth Analysis of Graph-based RAG in a Unified Framework | Yes | Modularizes and decouples existing graph-based RAG methods within a unified framework to reveal how they work and share valuable insights |
2025-02-20 | Arxiv | (HippoRAG 2) From RAG to Memory: Non-Parametric Continual Learning for Large Language Models | Yes | Graphs for Knowledge Indexing & Graphs as Knowledge Carrier (KG construction from corpus) |
2025-02-14 | Arxiv | ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation | No | Graphs for Knowledge Indexing & Graphs as Knowledge Carrier (KG construction from corpus) |
2025-02-08 | NAACL 2025 | (KG2RAG) Knowledge Graph-Guided Retrieval Augmented Generation | Yes | Graphs for Knowledge Indexing & Graphs as Knowledge Carrier (KG construction from corpus) |
2025-02-06 | The ACM Web Conference 2025 | MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot | Yes | Graphs as Knowledge Carrier (KG construction from corpus) |
2024-12-17 | Arxiv | SimGRAG: Leveraging Similar Subgraphs for Knowledge Graphs Driven Retrieval-Augmented Generation | Yes | Graphs as Knowledge Carrier (KG construction from corpus & with existing KGs) |
2024-10-28 | ICLR 2025 | Simple Is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation | Yes | Graphs as Knowledge Carrier (KG construction from corpus) |
2024-10-08 | Arxiv | LightRAG: Simple and Fast Retrieval-Augmented Generation | Yes | Graphs as Knowledge Carrier (KG construction from corpus) |
2024-05-23 | NeurIPS 2024 | HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models | Yes | Graphs as Knowledge Carrier (KG construction from corpus) |
2024-04-24 | Arxiv | From Local to Global: A Graph RAG Approach to Query-Focused Summarization | Yes | Graphs as Knowledge Carrier (KG construction from corpus) |
2024-02-28 | ACL 2024 Findings | Cutting Off the Head Ends the Conflict: A Mechanism for Interpreting and Mitigating Knowledge Conflicts in Language Models | No | Mechanisms Controlling LLMs' Behavioral Preferences |
2024-01-22 | ACL 2024 Long Papers | Blinded by Generated Contexts: How Language Models Merge Generated and Retrieved Contexts When Knowledge Conflicts? | Yes | LLMs' Behavioral Preferences Under Knowledge Conflicts |
2023-10-24 | EMNLP 2023 | Characterizing Mechanisms for Factual Recall in Language Models | No | Mechanisms Controlling LLMs' Behavioral Preferences |
2023-09-29 | ACL 2024 Long Papers | Intuitive or Dependent? Investigating LLMs' Behavior Style to Conflicting Prompts | No | LLMs' Behavioral Preferences Under Knowledge Conflicts |
2023-05-22 | ICLR 2024 Spotlight | Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts | Yes | LLMs' Behavioral Preferences Under Knowledge Conflicts |
2022-10-25 | EMNLP 2022 | Rich Knowledge Sources Bring Complex Knowledge Conflicts: Recalibrating Models to Reflect Conflicting Evidence | No | LLMs' Behavioral Preferences Under Knowledge Conflicts |
2021-09-10 | EMNLP 2021 | Entity-Based Knowledge Conflicts in Question Answering | Yes | LLMs' Behavioral Preferences Under Knowledge Conflicts |
Date | Method | Graph Type | Index Component | Retrieval Primitive | Retrieval Granularity | Code |
---|---|---|---|---|---|---|
2024-10-31 | fast-graphrag | Textual Knowledge Graph | Entity | Entities in question | Entity, Relationship, Chunk | Yes |
2024-10-08 | LightRAG_Local | Rich Knowledge Graph | Entity, Relationship | Low-level keywords in question | Entity, Relationship, Chunk | Yes |
2024-10-08 | LightRAG_Global | Rich Knowledge Graph | Entity, Relationship | High-level keywords in question | Entity, Relationship, Chunk | Yes |
2024-10-08 | LightRAG_Hybrid | Rich Knowledge Graph | Entity, Relationship | Both high- and low-level keywords | Entity, Relationship, Chunk | Yes |
2024-05-23 | HippoRAG | Knowledge Graph | Entity | Entities in question | Chunk | Yes |
2024-04-24 | GraphRAG_Local | Textual Knowledge Graph | Entity, Community | Question vector | Entity, Relationship, Chunk, Community | Yes |
2024-04-24 | GraphRAG_Global | Textual Knowledge Graph | Community | Question vector | Community | Yes |
2024-01-31 | RAPTOR | Tree | Tree node | Question vector | Tree node | Yes |
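As one concrete instance of the "Entities in question → Chunk" row above (HippoRAG-style retrieval), here is a hedged sketch that seeds Personalized PageRank at the question entities and ranks chunks by the scores of the entities they mention. The `entity_to_chunks` index is an assumed output of the offline stage, not the paper's actual interface.

```python
# Sketch of entity-seeded graph retrieval in the spirit of HippoRAG:
# run Personalized PageRank from question entities over the KG, then score
# chunks by the PPR mass of the entities they contain. Illustrative only.
import networkx as nx

def ppr_retrieve(graph: nx.Graph,
                 question_entities: list[str],
                 entity_to_chunks: dict[str, set[int]],
                 top_k: int = 5) -> list[int]:
    seeds = {e: 1.0 for e in question_entities if e in graph}
    if not seeds:
        return []
    scores = nx.pagerank(graph, alpha=0.85, personalization=seeds)
    chunk_scores: dict[int, float] = {}
    for entity, score in scores.items():
        for chunk_id in entity_to_chunks.get(entity, ()):
            chunk_scores[chunk_id] = chunk_scores.get(chunk_id, 0.0) + score
    # Return the ids of the highest-scoring chunks.
    return sorted(chunk_scores, key=chunk_scores.get, reverse=True)[:top_k]
```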
Note: following [2], this repo groups the graphs constructed from the corpus into the following categories:
- Passage Graph: each chunk is a node; if two chunks share more entities than a threshold, an edge is added between the two nodes (see the sketch after this list).
- Tree: a tree is built progressively by clustering the nodes at each layer and using an LLM to generate higher-level summary nodes for clusters with multiple children.
- Knowledge Graph (KG): a graph built by extracting entities and relationships from each chunk.
- Textual Knowledge Graph (TKG): a specialized KG in which each entity and relationship is additionally assigned a brief textual description.
- Rich Knowledge Graph (RKG): an extended TKG containing more information, including textual descriptions for entities and relationships as well as keywords for relationships.
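A minimal sketch of the Passage Graph construction above, assuming per-chunk entity sets `chunk_entities` produced by an upstream extractor (e.g., NER or an LLM):

```python
# Passage Graph sketch: chunks are nodes; two chunks are linked when they
# share more than `threshold` entities.
from itertools import combinations
import networkx as nx

def build_passage_graph(chunk_entities: dict[int, set[str]], threshold: int = 1) -> nx.Graph:
    graph = nx.Graph()
    graph.add_nodes_from(chunk_entities)
    for (a, ents_a), (b, ents_b) in combinations(chunk_entities.items(), 2):
        shared = len(ents_a & ents_b)
        if shared > threshold:
            graph.add_edge(a, b, shared=shared)
    return graph

# Example: chunks 0 and 1 share two entities, so they get an edge.
graph = build_passage_graph({0: {"Paris", "France"}, 1: {"Paris", "France", "Seine"}, 2: {"Tokyo"}})
print(list(graph.edges(data=True)))  # [(0, 1, {'shared': 2})]
```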
Date | Venue | Title | Repo |
---|---|---|---|
2025-01-21 | Arxiv | A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models | Yes |
2024-12-31 | Arxiv | Retrieval-Augmented Generation with Graphs (GraphRAG) | Yes |
2024-08-15 | Arxiv | Graph Retrieval-Augmented Generation: A Survey | Yes |
2024-05-10 | KDD 2024 | A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models | No |
2024-03-13 | Arxiv | Knowledge Conflicts for LLMs: A Survey | Yes |
Dataset | Task | Info | Metrics |
---|---|---|---|
NaturalQuestions | Simple QA | A general-domain simple QA dataset based on the Wikipedia dump knowledge base, testing the model’s ability to extract answers from encyclopedia content. | EM, F1 Score |
PopQA | Simple QA | PopQA is a large-scale open-domain question answering (QA) dataset consisting of 14k entity-centric QA pairs. Each question is created by converting a knowledge tuple retrieved from Wikidata using a template, and comes with the original subject_entity, object_entity, and relationship_type annotations, as well as Wikipedia monthly page views. | Accuracy, F1 Score
SimpleQuestions | Simple QA | SimpleQuestions is a large-scale factoid question answering dataset. It consists of 108,442 natural language questions, each paired with a corresponding fact from the Freebase knowledge base. Each fact is a triple (subject, relation, object), and the answer to the question is always the object. | EM, F1 Score
WebQ | Simple QA | This dataset consists of 6,642 question/answer pairs. The questions are intended to be answerable using Freebase, a large knowledge graph, and are mostly centered around a single named entity. | Precision, Recall
WebQSP | Simple QA | A general simple QA dataset covering semantic parsing and QA tasks over Freebase. | EM
MuSiQue | Multi-hop QA | A general multi-hop QA dataset with implicit in-data knowledge, testing multi-step reasoning ability. | answer_f1, support_f1 |
2WikiMultihopQA | Multi-hop QA | A general multi-hop QA dataset with in-data implicit knowledge, emphasizing cross-Wikipedia paragraph reasoning. | EM, F1 Score |
HotpotQA | Multi-hop QA | A question answering dataset featuring natural, multi-hop questions, with strong supervision for supporting facts to enable more explainable question answering systems based on Wikipedia dump. | EM, F1 Score, Accuracy, Recall |
CWQ | Multi-hop QA | A dataset for answering complex questions that require reasoning over multiple web snippets via Freebase. | Accuracy |
MultiHop-RAG | Multi-hop QA | A Dataset for Evaluating Retrieval-Augmented Generation Across Documents. | Hits@10, Hits@4, MAP@10, MRR@10, Accuracy, Recall |
MetaQA | Multi-hop QA | A movie-domain multi-hop QA dataset relying on the in-data movie knowledge base for reasoning. | Accuracy |
Mintaka | Complex QA | Mintaka is a complex, natural, and multilingual question answering (QA) dataset composed of 20,000 question-answer pairs elicited from MTurk workers and annotated with Wikidata question and answer entities. | EM, F1 Score, Hits@1 |
GrailQA | Complex QA | A new large-scale, high-quality dataset for question answering on knowledge bases (KBQA) over Freebase, with 64,331 questions annotated with both answers and corresponding logical forms in different syntaxes. | Accuracy
UltraDomain | Complex QA | An 18-domain complex QA dataset testing cross-domain complex question-handling ability. | LLM-as-judge, as in GraphRAG & LightRAG
TriviaQA | Complex QA | TriviaQA is a reading comprehension dataset containing over 650K question-answer-evidence triples. It includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high-quality distant supervision for answering the questions. | Accuracy
LC-QuAD v2 | Complex QA | LC-QuAD 2.0 is a large question answering dataset with 30,000 pairs of questions and their corresponding SPARQL queries. The target knowledge bases are Wikidata and DBpedia, specifically the 2018 version. | Accuracy
KQAPro | Large-scale Complex QA | A large-scale dataset of complex question answering over Wikidata. The questions are very diverse and challenging, requiring multiple reasoning capabilities including compositional reasoning, multi-hop reasoning, quantitative comparison, and set operations. | Accuracy
FACTKG | Fact Verification | Fact Verification via Reasoning on Knowledge Graphs. It consists of 108k natural language claims with five types of reasoning: One-hop, Conjunction, Existence, Multi-hop, and Negation based on DBpedia. | Accuracy |
DDXPlus | Medical QA & Diagnostic support | A new large-scale dataset for Automatic Symptom Detection (ASD) and Automatic Diagnosis (AD) systems in the medical domain. | IL (Interaction Length), GTPA (Ground Truth Probability Above Threshold), DDR/DDP/DDF1 (Differential Diagnosis Recall/Precision/F1), DSP/DSR/DSF1 (Severity Precision/Recall/F1) |
NarrativeQA | QA & Discourse Understanding | An English-language dataset of stories and corresponding questions designed to test reading comprehension, especially on long documents. | BLEU, METEOR, ROUGE
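Since most rows above report EM and F1 Score, here is a minimal sketch of the SQuAD-style exact match and token-level F1 computation, using a simplified normalization (lowercasing and whitespace splitting only):

```python
# Exact match and token-level F1 as commonly used for extractive QA.
from collections import Counter

def normalize(text: str) -> list[str]:
    # Simplified normalization; standard scripts also strip punctuation and articles.
    return text.lower().split()

def exact_match(prediction: str, gold: str) -> float:
    return float(normalize(prediction) == normalize(gold))

def token_f1(prediction: str, gold: str) -> float:
    pred, ref = normalize(prediction), normalize(gold)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Barack Obama", "barack obama"), token_f1("Obama", "Barack Obama"))
```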
Date | Venue | Title | Homepage | Domain |
---|---|---|---|---|
2023-08-23 | SIGIR 2024 | YAGO 4.5: A Large and Clean Knowledge Base with a Rich Taxonomy | Yes | General |
2023-02-09 | Bioinformatics | The scalable precision medicine open knowledge engine (SPOKE): a massive knowledge graph of biomedical information | Yes | Biomedical
2018-11-22 | Nucleic acids research | STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets | Yes | Protein-protein interaction prediction
2018-05-12 | LREC workshop | Lynx: building the legal knowledge graph for smart compliance services in multilingual Europe | Yes | Legal
- [1] A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models.
- [2] In-depth Analysis of Graph-based RAG in a Unified Framework.