SJTU-DMTai/awesome-rag

Awesome RAG Papers

This repo contains a list of papers about RAG, especially RAG with Knowledge Graphs.

Introduction

Large language models (LLMs) have demonstrated impressive reasoning abilities in complex tasks. However, they lack up-to-date knowledge and are prone to hallucination during reasoning, which can lead to incorrect reasoning processes and diminish their performance and trustworthiness.

Recently, Retrieval-Augmented Generation (RAG) has achieved remarkable success in addressing these challenges without requiring retraining. By referencing an external knowledge base, RAG refines LLM outputs, effectively mitigating issues such as hallucination, lack of domain-specific knowledge, and outdated information. In some practical scenarios, however, traditional RAG falls short: it fails to capture important structured relational knowledge, it often restates retrieved content as plain text when that content is concatenated into prompts, and it fails to grasp global information comprehensively.

Combining RAG with Knowledge Graphs (KGs) is a promising way to address these challenges. KGs offer a structured, explicit representation of entities and relationships, which can be more accurate than retrieving information through vector similarity. Leveraging external structured knowledge graphs can improve the contextual understanding of LLMs and help them generate more informed responses. The entire process typically contains three stages: Indexing, Retrieval, and Generation. The overall pipeline is shown in the figure pipeline.png.

We collect recent influential papers about RAG, especially RAG with KGs. Within each section, papers are listed by publication date, newest first.
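The three stages named in the introduction (Indexing, Retrieval, Generation) can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the method of any paper listed here: the triples are illustrative data taken from the descriptions in this list, the function names `build_index`, `retrieve`, and `build_prompt` are hypothetical, retrieval is plain substring matching on entity names, and the generation stage stops at building the prompt rather than calling an LLM.

```python
from collections import defaultdict

# Toy knowledge graph as (head, relation, tail) triples; illustrative data only.
TRIPLES = [
    ("HippoRAG", "uses", "Personalized PageRank"),
    ("HippoRAG", "is_a", "retrieval framework"),
    ("LightRAG", "uses", "graph-based text indexing"),
]


def build_index(triples):
    """Indexing: map each lower-cased entity name to the triples that mention it."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[head.lower()].append((head, relation, tail))
        index[tail.lower()].append((head, relation, tail))
    return index


def retrieve(index, question):
    """Retrieval: collect triples whose entity name appears in the question."""
    q = question.lower()
    hits = []
    for entity, triples in index.items():
        if entity in q:
            for triple in triples:
                if triple not in hits:  # avoid duplicates
                    hits.append(triple)
    return hits


def build_prompt(question, triples):
    """Generation (input side): serialize retrieved triples into an LLM prompt."""
    facts = "\n".join(f"- {h} {r} {t}" for h, r, t in triples)
    return f"Answer using only these facts:\n{facts}\n\nQuestion: {question}"


index = build_index(TRIPLES)
question = "What does HippoRAG use?"
facts = retrieve(index, question)
prompt = build_prompt(question, facts)
print(prompt)
```

In a real system, the substring match would typically be replaced by entity linking or learned retrieval over the graph, and the prompt would be passed to an LLM to produce the final answer.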

Paper List

2024

| Date | Venue | Title | Code | Description |
| --- | --- | --- | --- | --- |
| 2024-10-28 | arXiv | Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation | Yes | This paper introduces SubgraphRAG, which extends the KG-based RAG framework by retrieving subgraphs and leveraging LLMs for reasoning and answer prediction. It integrates a lightweight multilayer perceptron with a parallel triple-scoring mechanism for efficient and flexible subgraph retrieval, and encodes directional structural distances to enhance retrieval effectiveness. |
| 2024-10-23 | arXiv | Graphusion: A RAG Framework for Knowledge Graph Construction with a Global Perspective | No | This work introduces Graphusion, a zero-shot KGC framework that builds KGs from free text. It has three steps: Step 1 extracts a list of seed entities using topic modeling, so that the final KG includes the most relevant entities; Step 2 extracts candidate triplets using LLMs; Step 3 applies a novel fusion module that provides a global view of the extracted knowledge, incorporating entity merging, conflict resolution, and novel triplet discovery. |
| 2024-10-08 | arXiv | LightRAG: Simple and Fast Retrieval-Augmented Generation | Yes | This paper proposes a system that integrates graph structures into text indexing and retrieval to address the limitations of existing RAG systems in handling complex interdependencies between entities. The system includes an incremental update algorithm to ensure timely integration of new data, keeping the system effective in rapidly changing data environments. |
| 2024-08-15 | arXiv | Graph Retrieval-Augmented Generation: A Survey | No | This paper provides an overview of GraphRAG, a methodology that enhances language models by integrating knowledge graphs to improve retrieval accuracy and contextual responses. It details the GraphRAG workflow, from indexing to retrieval and generation, and discusses its applications and future research directions, highlighting its potential across various domains. |
| 2024-08-09 | arXiv | HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction | No | HybridRAG combines KG-based RAG techniques (called GraphRAG) with VectorRAG techniques to enhance question-answering (Q&A) systems for information extraction from financial documents. It is shown to be capable of generating accurate and contextually relevant answers. |
| 2024-07-20 | arXiv | Golden-Retriever: High-Fidelity Agentic Retrieval Augmented Generation for Industrial Knowledge Base | No | Golden-Retriever incorporates a reflection-based question augmentation step before document retrieval, which involves identifying jargon, clarifying its meaning based on context, and augmenting the question accordingly. |
| 2024-05-26 | arXiv | GRAG: Graph Retrieval-Augmented Generation | No | GRAG tackles the fundamental challenges of retrieving textual subgraphs and integrating the joint textual and topological information into Large Language Models (LLMs) to enhance their generation. |
| 2024-05-23 | arXiv | HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models | Yes | HippoRAG is a novel retrieval framework inspired by the hippocampal indexing theory of human long-term memory to enable deeper and more efficient knowledge integration over new experiences. HippoRAG synergistically orchestrates LLMs, knowledge graphs, and the Personalized PageRank algorithm to mimic the different roles of the neocortex and hippocampus in human memory. |
| 2024-05-20 | arXiv | KG-RAG: Bridging the Gap Between Knowledge and Creativity | Yes | KG-RAG is a novel framework designed to enhance the knowledge capabilities of LLM agents (LMAs) by integrating structured Knowledge Graphs (KGs) with the functionalities of LLMs, thereby significantly reducing the reliance on the latent knowledge of LLMs. |
| 2024-05-13 | arXiv | Evaluation of Retrieval-Augmented Generation: A Survey | Yes | This paper examines and compares several quantifiable metrics of the Retrieval and Generation components, such as relevance, accuracy, and faithfulness, within current RAG benchmarks, encompassing the possible output and ground truth pairs. It also analyzes the various datasets and metrics, discusses the limitations of current benchmarks, and suggests potential directions to advance the field of RAG benchmarks. |
| 2024-05-10 | KDD 24 | A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models | No | This survey comprehensively reviews existing research on RA-LLMs from three primary technical perspectives: architectures, training strategies, and applications. As preliminary knowledge, the authors briefly introduce the foundations and recent advances of LLMs. Then, to illustrate the practical significance of RAG for LLMs, they systematically review mainstream relevant work by architecture, training strategy, and application area, detailing the challenges of each and the corresponding capabilities of RA-LLMs. Finally, they discuss current limitations and several promising directions for future research. |
| 2024-05-08 | EMNLP 2024 | DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature | Yes | DALK dynamically co-augments LLMs and KGs to answer questions from scientific literature, and demonstrates its ability on Alzheimer's Disease (AD), a specialized sub-field in biomedicine and a global health priority. |
| 2024-04-26 | SIGIR 2024 | Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering | No | This paper introduces a novel customer service question-answering method that combines RAG with a knowledge graph (KG). |
| 2024-04-24 | arXiv | From Local to Global: A Graph RAG Approach to Query-Focused Summarization | Yes | This paper proposes a two-stage approach that uses a large language model (LLM) to build a graph-based text index: first, deriving an entity knowledge graph from source documents, and second, pre-generating community summaries for closely related entities. When a question is posed, each community summary is used to generate a partial response, and the partial responses are then summarized into a final response to the user. |
| 2024-04-10 | arXiv | Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs | Yes | This paper constructs a Graph Reasoning Benchmark dataset called GRBench, containing 1,740 questions that can be answered with the knowledge from 10 domain graphs. The authors then propose a simple and effective framework called Graph Chain-of-thought (Graph-CoT) to augment LLMs with graphs by encouraging LLMs to reason on the graph iteratively. |
| 2024-03-09 | arXiv | KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques | Yes | KG-Rank leverages a medical knowledge graph (KG) along with ranking and re-ranking techniques to improve the factuality of long-form question answering (QA) in the medical domain. |
| 2024-02-19 | arXiv | Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge | No | This paper introduces a novel information-retrieval method that leverages a knowledge graph to downsample over-represented clusters of information and mitigate the information overload problem. Its retrieval performance is about twice that of embedding-similarity alternatives on both precision and recall. |
| 2024-02-12 | arXiv | G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering | Yes | G-Retriever is a method designed for textual graph understanding and question answering. It enables users to interact with a graph through a conversational interface, asking questions and receiving textual replies along with highlighted relevant parts of the graph. The approach integrates Graph Neural Networks (GNNs), Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG) to enhance understanding of graphs through soft prompting. |
| 2024-02-06 | ICML 24 | DFA-RAG: Conversational Semantic Router for Large Language Model with Definite Finite Automaton | No | DFA-RAG is a novel framework designed to enhance the capabilities of conversational agents using large language models (LLMs). |

2023

| Date | Venue | Title | Code | Description |
| --- | --- | --- | --- | --- |
| 2023-12-18 | arXiv | Retrieval-Augmented Generation for Large Language Models: A Survey | No | The survey outlines the evolution of RAG through three paradigms: Naive RAG, Advanced RAG, and Modular RAG. It examines the three foundational components of RAG frameworks: retrieval, generation, and augmentation techniques. The paper highlights the state-of-the-art technologies embedded in each of these components, offering a thorough understanding of the advancements in RAG systems. |
| 2023-12-11 | arXiv | KnowGPT: Knowledge Graph based Prompting for Large Language Models | No | This paper introduces a novel Knowledge Graph based PrompTing framework, namely KnowGPT, to enhance LLMs with domain knowledge. KnowGPT contains a knowledge extraction module to extract the most informative knowledge from KGs, and a context-aware prompt construction module to automatically convert extracted knowledge into effective prompts. |
| 2023-12-05 | arXiv | Large Language Models on Graphs: A Comprehensive Survey | No | This paper provides a systematic review of scenarios and techniques related to large language models on graphs. |
| 2023-10-17 | ICLR 24 | Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | Yes | The framework trains a single arbitrary LM that adaptively retrieves passages on demand, and generates and reflects on retrieved passages and its own generations using special tokens, called reflection tokens. Generating reflection tokens makes the LM controllable during the inference phase, enabling it to tailor its behavior to diverse task requirements. |

Benchmark

| Date | Venue | Title | Code | Description |
| --- | --- | --- | --- | --- |
| 2024-06-07 | NeurIPS 2024 | CRAG -- Comprehensive RAG Benchmark | Yes | CRAG is designed to encapsulate a diverse array of questions across five domains and eight question categories, reflecting varied entity popularity from popular to long-tail, and temporal dynamisms ranging from years to seconds. |
| 2024-04-10 | arXiv | Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs | Yes | This paper constructs a Graph Reasoning Benchmark dataset called GRBench, containing 1,740 questions that can be answered with the knowledge from 10 domain graphs. |
| 2024-03-03 | SIGIR 24 | CR-LT-KGQA: A Knowledge Graph Question Answering Dataset Requiring Commonsense Reasoning and Long-Tail Knowledge | Yes | In this work, the authors seek a novel KGQA dataset that supports commonsense reasoning and focuses on long-tail entities (e.g., non-mainstream and recent entities) where LLMs frequently hallucinate, which creates the need for novel methodologies that leverage the KG for factual and attributable commonsense inference. |
