Weinan Zhang, Junwei Liao, Ning Li, and Kounianhua Du
https://arxiv.org/html/2410.09713
Agentic Information Retrieval (Agentic IR)
Background:
- Since the 1970s: domain-specific architectures for information retrieval
- Improvements with modern IR systems and web search engines
- Core paradigm unchanged: filtering predefined candidate items
- Since 2022: breakthroughs in large language models (LLMs) have begun transforming information access
- New technical paradigm for Agentic IR
Agentic IR Overview:
- Expands scope of accessible tasks
- Leverages a suite of new techniques to redefine information retrieval
Applications:
- Cutting-edge applications: life assistant, business assistant, coding assistant
- Central information entry point in future digital ecosystems
Challenges: data acquisition, model training, inference cost, safety, and interaction with users
Discussion:
- Agentic IR shaped by capabilities of LLM agents
- Transforms how information is accessed
- New paradigm for information retrieval.
Information Retrieval (IR)
- Refers to tasks or techniques of finding information items matching user's needs from a large corpus
- Wide range of applications: web search, recommendation systems, online services
Traditional IR Architecture:
- Employs specialized architecture for retrieving, ranking, and selecting information items based on query
- Web search engines use inverted index system to maintain posting list of documents for each term
- Given a query, candidate documents are retrieved using the inverted index and ranked using a scoring function
- Top-ranked documents are presented on the search engine results page (SERP)
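The retrieve-then-rank flow above can be sketched in a few lines. This is a minimal illustration, not a production engine: the tiny corpus, the whitespace tokenizer, and the term-count scoring function are all assumptions made for the example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids containing it (its posting list)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(query, docs, index, top_k=3):
    """Retrieve candidates through the index, then rank them with a toy score."""
    terms = query.lower().split()
    candidates = set()
    for term in terms:
        candidates |= index.get(term, set())

    def score(doc_id):
        # Toy scoring function: total occurrences of query terms in the document.
        words = docs[doc_id].lower().split()
        return sum(words.count(t) for t in terms)

    # Sort by descending score; break ties by document id for determinism.
    ranked = sorted(candidates, key=lambda d: (-score(d), d))
    return ranked[:top_k]

# Hypothetical corpus used only for illustration.
docs = {
    "d1": "agentic information retrieval with llm agents",
    "d2": "inverted index for web search",
    "d3": "web search engines rank documents for a query",
}
index = build_inverted_index(docs)
results = search("web search index", docs, index)
print(results)
```

Note that the candidate set is fixed by the index before ranking starts, which is the "filtering predefined candidate items" paradigm the notes describe.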
Personalized Recommender Systems:
- Involve retrieval, pre-ranking (optional), ranking, and re-ranking stages to filter items and present top recommendations to user
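The staged funnel can be sketched as follows. The item fields, the tag-overlap score, and the one-item-per-category re-ranking rule are hypothetical stand-ins for the learned models a real recommender would use; the optional pre-ranking stage is omitted.

```python
def recommend(items, user_tags, k_retrieve=4, k_final=2):
    # Retrieval: keep items that share at least one tag with the user.
    retrieved = [it for it in items if user_tags & it["tags"]][:k_retrieve]
    # Ranking: order by tag overlap (a stand-in for a learned ranking model).
    ranked = sorted(retrieved, key=lambda it: (-len(user_tags & it["tags"]), it["id"]))
    # Re-ranking: simple diversity rule, at most one item per category.
    seen_categories, final = set(), []
    for it in ranked:
        if it["category"] not in seen_categories:
            final.append(it["id"])
            seen_categories.add(it["category"])
    return final[:k_final]

# Hypothetical catalogue used only for illustration.
items = [
    {"id": 1, "tags": {"sci-fi", "space"}, "category": "movie"},
    {"id": 2, "tags": {"sci-fi"}, "category": "movie"},
    {"id": 3, "tags": {"space", "docs"}, "category": "book"},
    {"id": 4, "tags": {"cooking"}, "category": "book"},
]
top = recommend(items, {"sci-fi", "space"})
print(top)
```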
Limitations of Traditional IR:
- Predefined architecture with fixed information flow
- Difficult to perform interactive or complex tasks
- User unable to manipulate information items during the IR process
Agentic Information Retrieval (Agentic IR):
- Novel paradigm for next-generation IR techniques
- Differentiated aspects:
- Task scope: agent takes actions to reach user's desired information state
- Architecture: unified architecture employing AI agent across various scenarios
- Key methods: prompt engineering, retrieval-augmented generation, fine-tuning with supervised and reinforcement learning, multi-agent systems
Formal Presentation of Agentic IR:
- Task formulation
- Architecture form
- Key methods
Applications of Agentic IR:
- Life assistant
- Business assistant
- Coding assistant
Challenges in Agentic IR:
- To be discussed in Section 4
Conclusion:
- Introducing the concept of agentic IR as a next-generation IR architecture for more complex tasks.
Core Components
- s* = target information state (desired result)
- x(s*) = user's instruction text
- π(a_t | x(s_t)) = agent's policy function
- s_t = information state at time t
- a_t = action at time t
Process
- User inputs target description
- Agent takes actions via policy
- Environment transitions: s_{t+1} ~ p(· | s_t, a_t)
- Terminates at state s_T
- Success measured by r(s*, s_T)
Objective
- Maximize: max_π E_{s*}[ r(s*, s_T) ]
- Subject to: s_{t+1} ~ p(· | s_t, a_t) and a_t ~ π(· | x(s_t)), for t = 1, ..., T-1
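To make the objective concrete, the sketch below estimates the expected terminal reward by Monte Carlo rollouts in a toy setting. Everything here is an assumption for illustration: the information state is an integer, the policy is a hand-written rule standing in for an LLM agent, the transition is deterministic, and the reward is an exact-match indicator.

```python
import random

def describe(state, target):
    """x(s_t): textual description of the state, including the user's instruction."""
    return f"current={state}; user wants={target}"

def policy(description):
    """pi(a_t | x(s_t)): hand-written stand-in for an LLM agent."""
    fields = dict(part.split("=") for part in description.split("; "))
    current, wanted = int(fields["current"]), int(fields["user wants"])
    if current < wanted:
        return 1
    if current > wanted:
        return -1
    return 0

def transition(state, action):
    """p(s_{t+1} | s_t, a_t): deterministic here for simplicity."""
    return state + action

def reward(target, final_state):
    """r(s*, s_T): 1 if the final state matches the target, else 0."""
    return 1.0 if final_state == target else 0.0

def rollout(target, horizon=5, start=0):
    state = start
    for _ in range(horizon - 1):  # t = 1, ..., T-1
        action = policy(describe(state, target))
        state = transition(state, action)
    return reward(target, state)

# Monte Carlo estimate of E_{s*}[ r(s*, s_T) ] over sampled targets.
rng = random.Random(0)
targets = [rng.randint(-6, 6) for _ in range(200)]
estimate = sum(rollout(t) for t in targets) / len(targets)
print(round(estimate, 3))
```

With a horizon of 5 the agent has four actions, so targets more than four steps away are unreachable and score zero. This shows how the horizon T bounds what any policy can achieve.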
Agent Policy (π)
- Conditioned on the user's language instruction: π(a_t | x(s_t))
- Interacts with the environment over single or multiple turns
- Results in an information state with textual description x(s_t)
Inner Architecture Modules
- Memory: logged history and experience, stored on disk
- Thought: information in context window of LLM
External Tools
- Functions that cannot be replaced by a neural network model
- Web search engine, relational DB, real-time weather app, calculator, etc.
Textual Description of Information State (x(s_t))
- Depends on the current state s_t, the memory h_t, and the functions Mem, Tht, Tool: x(s_t) = g(s_t, h_t, Mem, Tht, Tool)
- Mem, Tht, and Tool update memory, manipulate thoughts, and call tools, respectively
Composite Function (g)
- Takes the current state s_t and memory h_t as raw input
- Outputs an intermediate representation of state s_t for further processing by the LLM
Design Determinants
- Specific design of g directly determines agent architecture along with used LLM.
Framework Instantiation
- Architecture built in a unified way as a directed acyclic graph (DAG) over the three functions Mem, Tht, and Tool.
- Previous study: Christianos et al. (2023)
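One way to picture the composite function g is as a fixed ordering of the three functions. The sketch below assumes one particular DAG (Mem, then Tool, then Tht) and toy implementations of each; the class name `AgentContext`, the word-count tool, and the output layout are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class AgentContext:
    memory: list = field(default_factory=list)  # h_t: logged history
    thought: str = ""                            # content held in the LLM context window

def mem(ctx, state):
    """Mem: update memory with the latest raw state."""
    ctx.memory.append(state)
    return ctx

def tool(state):
    """Tool: call an external function; a word counter stands in for a real tool."""
    return {"word_count": len(state.split())}

def tht(ctx, tool_output):
    """Tht: manipulate the thought using memory and the tool output."""
    ctx.thought = (f"seen {len(ctx.memory)} states; "
                   f"latest has {tool_output['word_count']} words")
    return ctx

def g(state, ctx):
    """x(s_t) = g(s_t, h_t, Mem, Tht, Tool), with the assumed DAG Mem -> Tool -> Tht."""
    ctx = mem(ctx, state)
    out = tool(state)
    ctx = tht(ctx, out)
    return f"STATE: {state}\nTHOUGHT: {ctx.thought}"

ctx = AgentContext()
x1 = g("flight search results page", ctx)
x2 = g("booking confirmation", ctx)
print(x2)
```

Changing the order or wiring of `mem`, `tool`, and `tht` inside `g` yields a different agent architecture, which is the sense in which the design of g determines the architecture.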
Improving Agentic Information Retrieval (IR)
Key Methods:
- Prompt engineering: setting input to enable task performance (Liu et al., 2023)
- A human-controllable way to set the hidden state
- Chain-of-thought prompting
- Retrieval-augmented generation (RAG): using demonstrations to refine actions and information states (Zhou et al., 2024)
- Demonstrations on action level or thought level
- Reflection: learning from failures to update thoughts for better interactions (Shinn et al., 2024)
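The three prompt-side methods above can be combined in a single prompt builder. This is a sketch under stated assumptions: demonstrations are retrieved by simple word overlap rather than a learned retriever, reflections are plain strings, and the prompt layout and the chain-of-thought trigger phrase are illustrative choices.

```python
def retrieve_demos(task, demo_bank, k=1):
    """Pick the k demonstrations sharing the most words with the task."""
    task_words = set(task.lower().split())
    scored = sorted(
        demo_bank,
        key=lambda d: -len(task_words & set(d["task"].lower().split())),
    )
    return scored[:k]

def build_prompt(task, demo_bank, reflections):
    parts = []
    # Retrieval-augmented generation: action-level demonstrations.
    for demo in retrieve_demos(task, demo_bank):
        parts.append(f"Example task: {demo['task']}\nExample actions: {demo['actions']}")
    # Reflection: lessons extracted from earlier failed attempts.
    for note in reflections:
        parts.append(f"Lesson from a failed attempt: {note}")
    # Chain-of-thought prompting: ask for step-by-step reasoning.
    parts.append(f"Task: {task}\nLet's think step by step.")
    return "\n\n".join(parts)

# Hypothetical demonstration bank and reflection note.
demo_bank = [
    {"task": "book a table for dinner", "actions": "search restaurants -> pick slot -> confirm"},
    {"task": "find a flight to Paris", "actions": "search flights -> compare prices -> book"},
]
prompt = build_prompt(
    "find a cheap flight to Rome",
    demo_bank,
    reflections=["check baggage fees before booking"],
)
print(prompt)
```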
Fine-tuning Methods:
- Supervised fine-tuning (SFT): adapting LLMs to agentic IR tasks using successful historic trajectories as training data (Liu et al., 2023)
- Corresponds to behavioral cloning, an imitation learning method in RL
- Does not directly optimize objective
- Preference learning: fine-tuning LLMs based on preference objective over a pair of outputs (Rafailov et al., 2024)
- Similar to pairwise learning to rank techniques in traditional IR
- Reinforcement fine-tuning (RFT): optimizing the objective with a reward signal from the environment or human feedback (Schulman et al., 2017; Silver et al., 2018)
- Requires larger computational resources for exploration and updates
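The preference objective over a pair of outputs can be written down directly. The sketch below assumes the direct preference optimization (DPO) form of the loss, which is the method of the cited Rafailov et al. work; the log-probability values are made-up numbers, and `beta=0.1` is an illustrative default.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Pairwise preference loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is the summed log-probability of a full output under the
    policy being trained (logp_*) or under the frozen reference model (ref_logp_*).
    """
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))

# Policy identical to the reference: no preference signal yet, loss = log 2.
tie = dpo_loss(-1.0, -1.0, -1.0, -1.0)
# Policy already favors the chosen output over the rejected one: lower loss.
better = dpo_loss(-0.5, -3.0, -1.0, -1.0)
print(round(tie, 4), round(better, 4))
```

The logistic form over a score difference is what makes this resemble pairwise learning to rank: only the relative ordering of the two outputs drives the gradient.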
Advanced Methods:
- Complex reasoning: performing task planning and complex reasoning before taking actions (OpenAI, 2024)
- Strong reasoner for improving agent's performance
- Reward modeling: crucial to enable RFT or search-based decoding techniques (Uesato et al., 2022; Luo et al., 2024)
- Outcome reward models and process reward models are essential modules for high-performance math agents
- Multi-agent systems (MAS): containing multiple homogeneous or heterogeneous agents that manage to coordinate and achieve collective intelligence (Chen et al., 2023; Li et al., 2024a)
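For the reward modeling item above, the sketch below contrasts an outcome reward with a process reward and uses both for best-of-N selection, one simple search-based decoding scheme. The rule-based checkers are stand-ins for learned reward models, and the step format "a+b=c" is an assumption made to keep the example verifiable.

```python
def step_is_valid(step):
    """Toy process check: a step 'a+b=c' is valid when the arithmetic holds."""
    left, right = step.split("=")
    a, b = left.split("+")
    return int(a) + int(b) == int(right)

def process_reward(steps):
    """Process reward: fraction of intermediate steps that are valid."""
    return sum(step_is_valid(s) for s in steps) / len(steps)

def outcome_reward(steps, expected_final):
    """Outcome reward: 1 if the final result matches, regardless of the path."""
    return 1.0 if int(steps[-1].split("=")[1]) == expected_final else 0.0

def best_of_n(candidates, expected_final):
    """Pick the candidate with the best outcome, using process reward as tie-breaker."""
    return max(
        candidates,
        key=lambda s: (outcome_reward(s, expected_final), process_reward(s)),
    )

# Three hypothetical reasoning traces for computing 2 + 3 + 4.
trace_a = ["2+3=5", "5+4=9"]   # correct steps, correct answer
trace_b = ["2+3=6", "6+3=9"]   # flawed first step, answer right by luck
trace_c = ["2+3=5", "5+4=10"]  # sound first step, wrong answer
best = best_of_n([trace_b, trace_c, trace_a], expected_final=9)
print(best)
```

Trace B shows why an outcome reward alone can be misleading: it reaches the right answer through an invalid step, and only the process reward separates it from trace A.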
Brief Discussion of Three Types of Applications: Life Assistant, Business Assistant, and Coding Assistant
- Life Assistant: agentic IR functions as an autonomous assistant for users in everyday life.
- Business Assistant: similar to the life assistant, agentic IR acts as an autonomous assistant for users in a business setting.
- Coding Assistant: agentic IR operates autonomously to assist users in coding tasks.
- Traditional IR: a non-autonomous tool that agentic IR can call.
Life Assistants: Evolution and Agentic Information Retrieval (IR)
Background:
- Voice-activated tools transformed into sophisticated systems
- Significant advancement in IR technologies
Agentic IR:
- Empowers life assistants to gather, deliver info & proactively support tasks
- Understands user needs, context, preferences
- Acts as active, autonomous agents adapting to lifestyle
Applications:
- Apple Intelligence: enhances user experience, seamlessly integrates with devices and services (Apple)
- Google Assistant, Amazon Alexa, Oppo Breeno, Huawei Celia: operate across diverse platforms (Google Assistant; Amazon Alexa; Oppo Breeno)
Benefits:
- Convenient control over digital and physical environments
- Proactive, contextual assistance
Scenario: Jane's Daily Life with Agentic IR
- Anticipates needs & gathers info: traffic conditions, suggests earlier departure time
- Modular design: memory (context), thought manipulation (processing preferences), tools (external sources)
- Adaptation: refines understanding from explicit queries and passive contextual cues
- Autonomous task execution: books a dinner reservation or sets reminders
- Seamless integration across devices and services: unifies various applications, ensures alignment between physical environment and personal schedule.
Agentic IR Characteristics:
- Proactive information gathering & state transition
- Adaptive through contextual understanding & interactive refinement
- Autonomous task execution & final information states
- Seamless integration across devices and services.
Business Assistant for Enterprise Users:
- Designed to support business knowledge and insights from various documents and data sources
- Uses agentic IR capabilities for intention recognition and response generation
- Addresses wide range of business queries: financial analysis, marketing strategies, decision making
- Four stages in the workflow: query understanding, document retrieval, information integration, response generation
Query Understanding:
- Attempts to understand user's intention from a business-related query
- Generates thoughts with CoT for complex queries and multi-step reasoning
- Leverages historical dialogues as memory to better understand context and intent
Document Retrieval:
- Retrieves relevant information from external and internal documents
- Utilizes tools like OCR, SQL for diverse document formats
- Semantic search capabilities ensure alignment with query intent
Information Integration:
- Combines and condenses scattered information before responding to the query
- Uses thoughts or tools to generate cohesive responses or complete tasks
- RAG framework by default in systems like Amazon Q Business for response generation
Response Generation:
- Generates a response in various formats: plain text, tables, visualized charts
- Completes tasks and returns action states
- Links answer back to source documents for transparency
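The four-stage workflow can be sketched as a chain of small functions. All names, the keyword-based intent rule, the word-overlap retrieval (standing in for semantic search), and the two-document corpus are hypothetical; a real system would back each stage with an LLM and proper tools.

```python
def understand_query(query, history):
    """Stage 1: resolve intent, using recent dialogue turns as memory."""
    intent = "financial_analysis" if "revenue" in query.lower() else "general"
    return {"intent": intent, "query": query, "context": history[-2:]}

def retrieve_documents(parsed, corpus):
    """Stage 2: word overlap as a stand-in for semantic search."""
    words = set(parsed["query"].lower().split())
    return [d for d in corpus if words & set(d["text"].lower().split())]

def integrate(docs):
    """Stage 3: condense scattered facts into one evidence list."""
    return [{"fact": d["text"], "source": d["id"]} for d in docs]

def generate_response(parsed, evidence):
    """Stage 4: answer in plain text and link each fact back to its source."""
    if not evidence:
        return "No relevant documents found."
    lines = [f"{e['fact']} [source: {e['source']}]" for e in evidence]
    return "\n".join(lines)

# Hypothetical enterprise corpus and dialogue history.
corpus = [
    {"id": "q3-report.pdf", "text": "Q3 revenue grew 12 percent"},
    {"id": "hr-policy.docx", "text": "vacation policy update"},
]
history = ["Show me last quarter's highlights"]
parsed = understand_query("What was revenue in Q3", history)
answer = generate_response(parsed, integrate(retrieve_documents(parsed, corpus)))
print(answer)
```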
Application of Business Assistant:
- Continuous evolution with advancements in agentic IR and increasing market demand
- Enhanced contextual understanding and multi-step reasoning for complex instructions
- Retrieves information from ever-updating sources in continuous business scenarios
- Security concerns include protection of internal enterprise data and ensuring safe responses.
Interactive Programming Assistance and Automatic Program Synthesis
- Productivity and development efficiency improved by:
- Copilot: interactive environment for developers to gather information from open world and meet programming needs (GitHub)
- Agentic Information Retrieval: systems designed to autonomously retrieve and provide relevant information based on developer queries and contextual needs
Developer-Coding Assistant Interaction Process
- Information Need Diagnosis:
- Developers' information need can be:
- Conscious (explicitly input requirements)
- Unconscious (automatically identified by the coding assistant)
- Agentic IR offers timely and tailored knowledge assistance due to:
- Memory module that remembers previous interactions, preferences, queries, debugging histories, and coding projects
- Knowledge Content Generation:
- After the information need is identified, the coding assistant queries for the corresponding knowledge content using a large language model (e.g., OpenAI Codex)
- Integrated with various coding tools (debuggers, compilers, linters) to provide reliable, non-parametric knowledge
- Examples: code generation, code completion, test generation, compiler feedback
- Information State Update:
- Developer perceives the generated knowledge content and refines their work, leading to an updated information state
- A new round of interaction is then activated, with developers gathering timely, tailored, and evolving information from the coding assistant
- Qualified Code or Project Accomplishment:
- Developers reach a final information state (sT) where they have accomplished a qualified code or project
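The interaction process above is a loop: generate a candidate, check it with a tool, record the feedback, and repeat until the code is acceptable. In this sketch the "model" is just a fixed list of draft functions and the tool is a tiny test runner; both are assumptions standing in for an LLM and real compilers, linters, or test suites.

```python
def run_tests(func):
    """Tool call: run unit tests and return failure messages as feedback."""
    failures = []
    for args, expected in [((2, 3), 5), ((-1, 1), 0)]:
        try:
            got = func(*args)
        except Exception as exc:
            failures.append(f"{args}: raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"{args}: expected {expected}, got {got}")
    return failures

def assistant_loop(candidates, max_rounds=3):
    """Iterate drafts until one passes; memory keeps the feedback from each round."""
    memory = []
    for round_no, candidate in enumerate(candidates[:max_rounds], start=1):
        failures = run_tests(candidate)
        memory.append((round_no, failures))  # information state update
        if not failures:
            return candidate, memory         # final information state s_T
    return None, memory

# Hypothetical drafts of an `add` function: the first is buggy, the second correct.
drafts = [lambda a, b: a - b, lambda a, b: a + b]
final, memory = assistant_loop(drafts)
print(len(memory), memory[0][1])
```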
Challenges in Agentic Information Retrieval (IR)
Data Acquisition:
- Logged data from agent's interaction with environment
- Determined by user instructions, agent policy, and environment dynamics
- Exploration-exploitation tradeoff crucial for high-quality data collection
- Direct labeling of correct trajectories expensive and challenging
Model Training:
- Agent policy consists of DAG of functions: memory update, thought manipulation, tool use
- Effectively updating parameters of these functions and composite policy function highly challenging
- Recent attempts using RFT (Christianos et al., 2023) and action decomposition (Wen et al., 2024)
Inference Cost:
- Large parameter size and autoregressive nature increase GPU requirements and processing time for LLMs
- System optimization crucial for practical service deployment
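A back-of-the-envelope calculation shows why both factors matter. The figures below (a 7-billion-parameter model, 2 bytes per parameter, 0.03 seconds per generated token, 5 agent steps of 200 tokens each) are hypothetical inputs chosen for illustration, not measurements, and the formulas ignore activations, the KV cache, and batching.

```python
def weight_memory_gb(n_params, bytes_per_param=2):
    """GPU memory needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

def agent_latency_s(n_steps, tokens_per_step, seconds_per_token):
    """Autoregressive decoding emits tokens one after another, so time adds up linearly."""
    return n_steps * tokens_per_step * seconds_per_token

# Hypothetical model and workload.
weights_gb = weight_memory_gb(7e9)                    # 16-bit weights
latency = agent_latency_s(n_steps=5, tokens_per_step=200, seconds_per_token=0.03)
print(weights_gb, latency)
```

Because an agent may take several steps per request, its latency is a multiple of a single LLM call, which is one reason system-level optimization matters for deployment.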
Safety:
- Agent directly interacts with real environment, changing it and user's information states
- Important to guarantee safety across user journey
- Alignment techniques (Ji et al., 2023) helpful but not guaranteed
- Proposed "world model + verifier" framework (Dalrymple et al., 2024) can explore safety for agentic IR
Interacting with Users:
- Product form of agentic IR still under-explored due to differences from traditional IR in aspects like inference latency, data manipulation, and information state representation.
Conclusion:
- Agentic IR is proposed as a new paradigm in IR
- Differs from traditional IR by interacting with the environment to reach the user's target information state
- Serves a wider task scope, uses a unified agent architecture, and employs distinct key methods compared to traditional IR
- Challenges remain, but agentic IR is expected to be developed and promoted in the coming years