Agentic Information Retrieval

Weinan Zhang, Junwei Liao, Ning Li, and Kounianhua Du

https://arxiv.org/html/2410.09713

Abstract

Agentic Information Retrieval (Agentic IR)

Background:

Since 1970s: domain-specific architectures for information retrieval
Improvements with modern IR systems and web search engines
Core paradigm unchanged: filtering predefined candidate items
Introduction of large language models (LLMs) in 2022 transforming information access
New technical paradigm for Agentic IR

Agentic IR Overview:

Expands scope of accessible tasks
Leverages techniques to redefine information retrieval

Applications:

Cutting-edge applications: TBA
Central information entry point in future digital ecosystems

Challenges: TBD (To Be Determined)

Discussion:

Agentic IR shaped by capabilities of LLM agents
Transforms how information is accessed
New paradigm for information retrieval.

1 The Trends of IR

Information Retrieval (IR)

Refers to tasks or techniques of finding information items matching user's needs from a large corpus
Wide range of applications: web search, recommendation systems, online services

Traditional IR Architecture:

Employs specialized architecture for retrieving, ranking, and selecting information items based on query
Web search engines use inverted index system to maintain posting list of documents for each term
Given a query, candidate documents are retrieved using the inverted index and ranked using a scoring function
Top-ranked documents presented on SERP

Personalized Recommender Systems:

Involve retrieval, pre-ranking (optional), ranking, and re-ranking stages to filter items and present top recommendations to user

Limitations of Traditional IR:

Predefined architecture with fixed information flow
Difficult to perform interactive or complex tasks
User unable to manipulate information items during the IR process

Agentic Information Retrieval (Agentic IR):

Novel paradigm for next-generation IR techniques
Differentiated aspects:
- Task scope: agent takes actions to reach user's desired information state
- Architecture: unified architecture employing AI agent across various scenarios
- Key methods: prompt engineering, retrieval-augmented generation, fine-tuning with supervised and reinforcement learning, multi-agent systems

Formal Presentation of Agentic IR:

Task formulation
Architecture form
Key methods

Applications of Agentic IR:

Life assistant
Business assistant
Coding assistant

Challenges in Agentic IR:

To be discussed in Section 4

Conclusion:

Introducing the concept of agentic IR as a next-generation IR architecture for more complex tasks.

2 Agentic IR

2.1 Task Formulation

Core Components

s* = target information state (desired result)
x(s*) = user's instruction text
π(at|x(st)) = agent's policy function
st = state at time t
at = action at time t

Process

User inputs target description
Agent takes actions via policy
Environment transitions: p(st+1|st,at)
Terminates at state sT
Success measured by r(s,sT)*

Objective

Maximize: maxπ 𝔼s[r(s*,sT)]*
Subject to state/action transitions from t=1 to T-1

2.2 Architecture

Agent Policy (π)

Conditional on user's language instruction: x(st)𝑥subscript𝑠𝑡subscriptℎ𝑡\pi(a_t|x(s_t))
Interacts with environment with single or multiple turns
Results in information state: x(st)𝑥subscript𝑠𝑡

Inner Architecture Modules

Memory: stored history and experiences
- Log history, experience
- Stored in disk
Thought: information in context window of LLM

External Tools

Function that cannot be replaced by neural net model
- Web search engine, relational DB, real-time weather app, calculator, etc.

Textual Description of Information State (x(st))

Depends on current state st, memory ht, and thought Tht, tool Tool: x(st) = g(st, ht, Mem, Tht, Tool)
- Mem, Tht, Tool update memory, manipulate thoughts, call tools, respectively

Composite Function (g)

Takes current state st and memory ht as raw input
Outputs intermediate representation of state st for further processing by LLM

Design Determinants

Specific design of g directly determines agent architecture along with used LLM.

Framework Instantiation

Architecture built in a unified way using DAAG over three functions.
- Previous study: Christianos et al. (2023)

2.3 Key Methods

Improving Agentic Information Retrieval (IR)

Key Methods:

Prompt engineering: setting input to enable task performance (Liu et al., 2023)
- Human-controllable way for hidden state
- Chain-of-thought prompting
Retrieval-augmented generation (RAG): using demonstrations to refine actions and information states (Zhou et al., 2024)
- Demonstrations on action level or thought level
Reflection: learning from failures to update thoughts for better interactions (Shinn et al., 2024)

Fine-tuning Methods:

Supervised fine-tuning (SFT): adapting LLMs to agentic IR tasks using successful historic trajectories as training data (Liu et al., 2023)
- Behavioral cloning imitation learning methods in RL
- Does not directly optimize objective
Preference learning: fine-tuning LLMs based on preference objective over a pair of outputs (Rafailov et al., 2024)
- Similar to pairwise learning to rank techniques in traditional IR
Reinforcement fine-tuning (RFT): optimizing objective with reward signal from environment or human feedbacks (Schulman et al., 2017; Silver et al., 2018)
- Requires larger computational resources for exploration and updates

Advanced Methods:

Complex reasoning: performing task planning and complex reasoning before taking actions (OpenAI, 2014)
- Strong reasoner for improving agent's performance
Reward modeling: crucial to enable RFT or search-based decoding techniques (Uesato et al., 2022; Luo et al., 2024)
- Outcome reward models and process reward models are essential modules for high-performance math agents
Multi-agent systems (MAS): containing multiple homogeneous or heterogeneous agents that manage to coordinate and achieve collective intelligence (Chen et al., 2023; Li et al., 2024a)

3 Application Scenarios and Case Studies

Brief Discussion of Three Types of Applications: Life Assistant, Business Assistant, and Coding Assistant

Life Assistant: Agent Information Retrieval (IR) functions as an autonomous assistant for users in this application.
Business Assistant: Similar to the life assistant, the IR acts as an autonomous assistant for users in a business setting.
Coding Assistant: The IR operates autonomously to assist users in coding tasks.
Traditional Information Retrieval: A non-autonomous tool that is used to call upon agent IR.

3.1 Life Assistant

Life Assistants: Evolution and Agentic Information Retrieval (IR)

Background:

Voice-activated tools transformed into sophisticated systems
Significant advancement in IR technologies

Agentic IR:

Empowers life assistants to gather, deliver info & proactively support tasks
Understands user needs, context, preferences
Acts as active, autonomous agents adapting to lifestyle

Applications:

Apple Intelligence: enhances user experience, seamlessly integrates with devices and services (Apple)
Google Assistant, Amazon Alexa, Oppo Breeno, Huawei Celia: operate across diverse platforms (Google Assistant, Amazon Alexa, Oppo Breeno,)

Benefits:

Convenient control over digital and physical environments
Proactive, contextual assistance

Scenario: Jane's Daily Life with Agentic IR

Anticipates needs & gathers info: traffic conditions, suggests earlier departure time
Modular design: memory (context), manipulate thought (processing preferences), tools (external sources)
Adaptation: refines understanding from explicit queries and passive contextual cues
Autonomous task execution: books a dinner reservation or sets reminders
Seamless integration across devices and services: unifies various applications, ensures alignment between physical environment and personal schedule.

Agentic IR Characteristics:

Proactive information gathering & state transition
Adaptive through contextual understanding & interactive refinement
Autonomous task execution & final information states
Seamless integration across devices and services.

3.2 Business Assistant

Business Assistant for Enterprise Users:

Designed to support business knowledge and insights from various documents and data sources
Uses agentic IR capabilities for intention recognition and response generation
Addresses wide range of business queries: financial analysis, marketing strategies, decision making
Four stages in the workflow: query understanding, document retrieval, information integration, response generation

Query Understanding:

Attempts to understand user's intention from a business-related query
Generates thoughts with CoT for complex queries and multi-step reasoning
Leverages historical dialogues as memory to better understand context and intent

Document Retrieval:

Retrieves relevant information from external and internal documents
Utilizes tools like OCR, SQL for diverse document formats
Semantic search capabilities ensure alignment with query intent

Information Integration:

Combines and condenses scattered information before responding to the query
Uses thoughts or tools to generate cohesive responses or complete tasks
RAG framework by default in systems like Amazon Q Business for response generation

Response Generation:

Generates a response in various formats: plain text, tables, visualized charts
Completes tasks and returns action states
Links answer back to source documents for transparency

Application of Business Assistant:

Continuous evolution with advancements in agentic IR and increasing market demand
Enhanced contextual understanding and multi-step reasoning for complex instructions
Retrieves information from ever-updating sources in continuous business scenarios
Security concerns include protection of internal enterprise data and ensuring safe responses.

3.3 Coding Assistant

Interactive Programming Assistance and Automatic Program Synthesis

Productivity and development efficiency improved by:
- Copilot: interactive environment for developers to gather information from open world and meet programming needs (GitHub)
- Agentic Information Retrieval: systems designed to autonomously retrieve and provide relevant information based on developer queries and contextual needs

Developer-Coding Assistant Interaction Process

Information Need Diagnosis:
- Developers' information need can be:
  - Conscious (explicitly input requirements)
  - Unconscious (automatically identified by the coding assistant)
- Agentic IR offers timely and tailored knowledge assistance due to:
  - Memory module that remembers previous interactions, preferences, queries, debugging histories, and coding projects
Knowledge Content Generation:
- After information need is identified, the coding assistant queries for corresponding knowledge content using an intelligent large language model (e.g., OpenAI CodeX)
- Integrated with various coding tools (debuggers, compilers, linters) to provide reliable and non-parametric knowledge
- Examples: generating codes, synthesizing code completions, test generation, compiler feedback
Information State Update:
- Developer perceives the generated knowledge content and refines their work, leading to an updated information state
- A new round of interaction is then activated, with developers gathering timely, tailored, and evolving information from the coding assistant
Qualified Code or Project Accomplishment:
- Developers reach a final information state (sT) where they have accomplished a qualified code or project

4 Challenges

Challenges in Agentic Interactive Reasoning (IR)

Data Acquisition:

Logged data from agent's interaction with environment
Determined by user instructions, agent policy, and environment dynamics
Exploration-exploitation tradeoff crucial for high-quality data collection
Direct labeling of correct trajectories expensive and challenging

Model Training:

Agent policy consists of DAG of functions: memory update, thought manipulation, tool use
Effectively updating parameters of these functions and composite policy function highly challenging
Recent attempts using RFT (Christianos et al., 2023) and action decomposition (Wen et al., 2024)

Inference Cost:

Large parameter size and autoregressive nature increase GPU requirements and processing time for LLMs
System optimization crucial for practical service deployment

Safety:

Agent directly interacts with real environment, changing it and user's information states
Important to guarantee safety across user journey
Alignment techniques (Ji et al., 2023) helpful but not guaranteed
Proposed "world model + verifier" framework (Dalrymple et al., 2024) can explore safety for agentic IR

Interacting with Users:

Product form of agentic IR still under-explored due to differences from traditional IR in aspects like inference latency, data manipulation, and information state representation.

5 Conclusions

Proposed new paradigm in IR: Agentic IR
Differentiates from traditional IR by interacting with environment to reach user's target information state
Agentic IR serves a wide task scope, uses unified agent architecture, and employs distinct key methods compared to traditional IR
Challenges exist but expected development and promotion in upcoming years

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Agentic-Information-Retrieval.md

Agentic-Information-Retrieval.md

Agentic Information Retrieval

Abstract

1 The Trends of IR

2 Agentic IR

2.1 Task Formulation

2.2 Architecture

2.3 Key Methods

3 Application Scenarios and Case Studies

3.1 Life Assistant

3.2 Business Assistant

3.3 Coding Assistant

4 Challenges

5 Conclusions

Files

Agentic-Information-Retrieval.md

Latest commit

History

Agentic-Information-Retrieval.md

File metadata and controls

Agentic Information Retrieval

Abstract

1 The Trends of IR

2 Agentic IR

2.1 Task Formulation

2.2 Architecture

2.3 Key Methods

3 Application Scenarios and Case Studies

3.1 Life Assistant

3.2 Business Assistant

3.3 Coding Assistant

4 Challenges

5 Conclusions