Jeevana Priya Inala, Chenglong Wang, Steven Drucker, Gonzalo Ramos, Victor Dibia, Nathalie Riche, Dave Brown, Dan Marshall, Jianfeng Gao. https://arxiv.org/pdf/2409.18475
- Abstract
- 1. Introduction
- 2. Background
- 3. Case study: User experiences with AI-tools for visualization authoring
- 4. Opportunities for AI systems in the data analysis domain
- 5. Human-driven design considerations for AI-based data analysis systems
- 6. Challenges in developing AI powered data analysis systems
- 7. Conclusion
AI-Powered Data Analysis: A New Era of Opportunities
Exploring AI's Role in Data Analysis:
- AI tools can reshape data analysis by enhancing various stages of the workflow
- The emergence of large language and multimodal models offers new opportunities to translate high-level user intentions into executable code, charts, and insights
Human-Centered Design Principles:
- Facilitate intuitive interactions with AI systems
- Build user trust through clear explanations and expectations
- Streamline the analysis workflow across multiple apps
Research Challenges:
- Enhancing model capabilities to better understand complex data and user needs
- Evaluating and benchmarking AI systems to ensure accuracy and efficiency
- Understanding end-user needs and designing interfaces that cater to their diverse requirements
Data-Driven Decisions: Democratizing Data Analysis
Introduction:
- Importance of data analysis in various industries and daily life
- Current high cost of data analysis limits accessibility to the general population
- Potential benefits of democratizing data analysis
Challenges of Data Analysis:
- Complex process requiring multiple steps:
- Task formulation
- Data collection
- Exploratory analysis
- Creating visual representations
- Documentation as reports
- Analysts need various skills and expertise:
- Conceptual knowledge (data sensemaking, domain, statistics, visualization design)
- Tool expertise and programming skills
- Overhead switching between tools and managing branching analysis steps
- Existing systems trade-off between flexibility and ease of use
- Complexities in data analysis make it a unique challenge for AI models
- Iterative and multimodal nature of data analysis requires user involvement
- Importance of accuracy and reliability in AI systems for sensitive domains like healthcare or finance
- Need for interfaces that facilitate effective human-AI collaboration
- Research challenges ahead to implement AI-powered data analysis systems:
- Enhancing existing models' capabilities
- Addressing scarcity of training and evaluation data
- Ensuring system reliability and stability
- Conducting user research for alignment with users' cognitive abilities and practical needs.
Data Analysis Process
- Figure 1 and Table 2 illustrate steps involved in gaining insights from data
- Iterative process: multiple steps, back and forth
- Starts with task formulation: identifying problem/question (e.g., renewable energy trend analysis)
- Involves operationalizing the task into sub-tasks (global trends, country ranking, comparison of trends)
- Collecting relevant data: querying databases, instrumenting apps, web scraping, cleaning, integration from multiple sources
- Data exploration phase: gaining familiarity with dataset, identifying potential patterns or relationships
- Descriptive statistics (mean, median, count)
- Visualizing variable relationships through trends and correlations
- Formulating hypotheses to explore further
- Transforming data and generating visualization artifacts for hypothesis support
- Validating insights: ensuring reliability and accuracy of findings through statistical analysis, domain knowledge, or external sources
- Refining hypotheses based on validated insights
- Final results: creating reports, dashboards, or presentations to communicate findings to stakeholders
- Iteratively refining communication based on feedback received
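The exploration-to-insight loop above can be sketched in a few lines of pandas. This is a minimal illustration on a tiny hypothetical dataset (column names and values are invented for the example, not taken from the paper): compute descriptive statistics, derive the metric the hypothesis concerns, then reshape the data into a trend ready for visualization or validation.

```python
import pandas as pd

# Hypothetical energy dataset; columns and values are illustrative only.
df = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2020, 2021, 2020, 2021],
    "renewable_twh": [10.0, 12.0, 5.0, 9.0],
    "total_twh": [100.0, 110.0, 50.0, 60.0],
})

# Exploration: descriptive statistics (mean, median, count).
summary = df["renewable_twh"].agg(["mean", "median", "count"])

# Transformation: derive the metric the hypothesis is about.
df["renewable_pct"] = 100 * df["renewable_twh"] / df["total_twh"]

# Year-over-year trend per country, ready for charting or validation.
trend = df.pivot(index="year", columns="country", values="renewable_pct")
print(trend)
```

In a real session these steps would be revisited repeatedly as hypotheses are refined, which is exactly the iterative back-and-forth the outline describes.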
LLMs, LMMs, and AI Systems
- Advancements in language models (LLMs) and multimodal models (LMMs)
- Foundation models and tools: ChatGPT, GPT-4, Claude, Llama, Phi-3-Vision, GitHub Copilot; related work by Dibia and Narayan et al.
- Capable of generating code, understanding algorithms, and solving interview/competition problems
- Generating data transformations and visualization artifacts
- Instruction-tuned to understand and converse with people in a chat environment (Ouyang et al.)
- Agents: interfaces around foundation models, embedding application-specific knowledge through prompts or external tools
- Code interpreter: generates code, executes it in a sandbox environment, produces further text or code
- Multi-agent systems: series of agents collaborating and responding to user queries (Wu et al., Hong et al.)
- "AI systems": refers to systems powered by latest breakthroughs in LLMs, LMMs, and Generative AI.
Case Study: User Experiences with AI-Tools for Visualization Authoring
Data Visualizations:
- Prevalent in data analysis stages
- Used by data analysts to:
- Assess data quality issues
- Explore relations between data fields
- Understand data trends and statistical attributes
- Communicate insights from data to the audience
Challenges of Visualization Authoring:
- Analysts need to learn visualization tools
- Transform data into the right format for chart design
- AI-powered visualization authoring tools aim to reduce this "authoring barrier"
Comparison of User Experiences:
- Traditional programming tools
- Direct conversational interaction with LLMs via chat-based interface
- LLM-powered interactive data-analysis tools
User Experience Comparison Factors:
- Specifying the Initial Intent:
- Consuming AI System's Output:
- Editing:
- Iterating:
- Verifying AI System's Output:
- Workflow in Data Analysis Context
Task: Investigate trend of renewable electricity percentage over time for top 5 CO2 emission countries
Traditional Approach (Non-AI):
- Data Transformation:
- Create new columns: "percentage of renewable electricity"
- Perform operations like grouping, summing, filtering
- Complex tasks may require pivoting or unpivoting
- Chart Authoring:
- Determine chart type and encodings to represent data trends appropriately
- Create charts using libraries such as Seaborn, Matplotlib, PowerBI, Tableau
- Challenges:
- Steep learning curve for beginner data analysts and non-programming end-users.
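The traditional workflow above can be made concrete with a short pandas/matplotlib sketch. The dataset is invented for illustration (two countries stand in for the task's top five emitters), but the steps mirror the outline: group and sum to rank emitters, filter to the top of the ranking, then author a line chart of the renewable-electricity trend.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Illustrative data; the real analysis would load a CO2/energy dataset.
df = pd.DataFrame({
    "country": ["X", "X", "Y", "Y", "Z", "Z"],
    "year": [2020, 2021] * 3,
    "co2": [900, 950, 800, 820, 100, 90],
    "renewable_pct": [12.0, 14.0, 20.0, 22.0, 40.0, 45.0],
})

# Data transformation: rank countries by total CO2 and keep the top 2
# (top 5 in the original task).
top = df.groupby("country")["co2"].sum().nlargest(2).index
subset = df[df["country"].isin(top)]

# Chart authoring: one trend line per country.
fig, ax = plt.subplots()
for country, grp in subset.groupby("country"):
    ax.plot(grp["year"], grp["renewable_pct"], marker="o", label=country)
ax.set_xlabel("Year")
ax.set_ylabel("% renewable electricity")
ax.legend()
fig.savefig("trend.png")
```

Even this small example requires knowing groupby/filter idioms and matplotlib's API — the "authoring barrier" that the AI-powered tools below aim to lower.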
Conversational LLM Interface: ChatGPT with GPT-4o and Code Interpreter
Benefits of Conversational LLMs:
- Simplifies the process for users with varying experience levels
- Allows natural language commands to express intent
- Performs steps such as loading dataset, understanding subtasks, writing code, and rendering final chart
- Provides explanations and code snippets for user understanding
Limitations of Conversational LLMs:
- Requires multiple model invocations, leading to long wait times
- Increases likelihood of failure due to repeated invocations
- Difficult to recover from system getting stuck on a step
- Limited expressiveness in natural language intentions (e.g., color preferences)
- Inefficient workflow when using multiple apps with separate copilots
Conversational Framework:
- Allows users to edit and iterate through follow-up instructions
- Chat interface allows backtracking and editing from previous conversations
- Users can't track different iterations easily
- System may lose context upon backtracking, requiring recomputation
- Long wait times and unreliable user experiences due to non-deterministic underlying model
Comparative Analysis:
- ChatGPT vs Data Formulator: Figure 2 shows the user experience for providing inputs and output formats
- Conversational editing vs. dynamic intent-based UIs in DynaVis: Figure 3(a) compares conversational editing with data threads in Data Formulator
- Trust and verification of AI outputs: Figure 4 contrasts ChatGPT's explanation feature with Data Formulator's approach.
Data Formulator
- Multi-modal UI: allows users to specify intent using a combination of field encodings and natural language instructions
- LLM model: generates necessary data transformations based on user inputs, creates Vega-Lite specification for chart rendering
- Chart options: users can read/edit code, view transformed data table
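To make the Vega-Lite step concrete, here is a minimal example of the kind of specification such a tool might emit for a renewable-electricity trend. The field names and values are illustrative, not output from Data Formulator itself; the spec is built as a Python dict and serialized to the JSON a Vega-Lite renderer would consume.

```python
import json

# Minimal Vega-Lite spec of the kind an LLM-backed tool might generate
# (fields and data are invented for illustration).
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": [
        {"year": 2020, "renewable_pct": 12.0},
        {"year": 2021, "renewable_pct": 14.0},
    ]},
    "mark": "line",
    "encoding": {
        "x": {"field": "year", "type": "ordinal"},
        "y": {"field": "renewable_pct", "type": "quantitative"},
    },
}
print(json.dumps(spec, indent=2))
```

Because the spec is declarative data rather than imperative plotting code, users can read and edit it — or view the transformed table behind it — without re-running an analysis script.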
DynaVis
- Multi-modal UI: allows users to specify chart editing intents through natural language
- Dynamic widget generation: system generates widgets based on high-level intent, reduces latency in the AI process
- Intent exploration: users can experiment with various options and receive instant feedback
Data Formulator 2
- Iterative exploration: allows users to modify existing visualizations through multi-modal UI
- Data threads: help manage analysis session by organizing user interactions
- Candidate chart/transformation creation: enables users to inspect multiple candidates, disambiguate specifications
- Streamlined workflow: combines data transformations and visualization authoring into a single tool
UFO
- UI-focused agent: tailored for applications on Windows OS using GPT-Vision
- Seamless navigation and operation: employs models to observe and analyze GUI/control information
- Simplifies complex tasks: streamlines the data analysis workflow.
Comparative Analysis of AI Tools for Visualization Authoring
Observations:
- Design of tools impacts users' ability to create desired visualizations
- Critical for visualization authoring as well as other stages of data analysis
Design Considerations:
- Section 5 will elaborate on these design principles and explore strategies beyond those examined in the case study
- To alleviate user specification burdens, enhance trust and verification strategies, and facilitate workflows across multiple stages and tools
Additional AI Opportunities:
- Section 4 will investigate additional AI opportunities within the broader data analysis landscape, extending beyond visualization authoring.
Data Analysis Opportunities for AI Systems
Challenges of Data Analysis Workflows:
- Complex reasoning tasks
- Iterative processes
- Diverse skills required: coding, domain knowledge, scientific methodologies
Opportunities for AI Systems:
- Streamlining and enhancing workflows:
- Automating repetitive tasks
- Enhancing complex reasoning
- Identifying areas of impact:
- Dissecting the analysis process into sub-tasks
- Evaluating intermediate outputs
- Establishing user requirements:
- Reliability and effectiveness of AI systems
- Literature review:
- Preliminary overview for future research
- Limitations:
- Not exhaustive opportunities or related works
Goal:
- To explore how AI can help navigate data analysis complexities.
Closing the Skills Gap: Empowering Users in Data Analysis
Challenges:
- Diverse skill set required for effective data analysis
- Non-experts face barriers to engagement due to lack of skills
Opportunities with Low/No Code Experiences:
- Alleviating coding burden for non-programmers
- LLMs generate code snippets for various sub-stages
- Improving data analysis tools and platforms
- Natural language interfaces
- Offering domain knowledge support and statistical expertise
- Automating data cleaning, transformation, querying, and visualization
- Enhancing user experience with interactive decision support
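Automated cleaning and transformation can be illustrated with the kind of snippet an assistant might generate from a request like "fix the missing prices and standardize the country names". Column names, values, and the imputation choice (median fill) are all invented for the sketch.

```python
import pandas as pd

# Messy illustrative data: inconsistent country spellings, a missing price.
df = pd.DataFrame({
    "country": ["usa", "USA", "U.S.A.", "Germany"],
    "price": [10.0, None, 12.0, 8.0],
})

# Standardize country names: uppercase and strip punctuation.
df["country"] = df["country"].str.upper().str.replace(".", "", regex=False)

# Impute missing prices with the column median (one reasonable default).
df["price"] = df["price"].fillna(df["price"].median())
print(df)
```

Generating such snippets on demand is precisely how LLMs can relieve non-programmers of the coding burden while leaving the resulting code open for inspection.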
Examples of LLMs for Data Analysis:
- LIDA: Goal-based data visualization generation (open-source)
- ChatGPT: Code interpreter API for generating plots from natural language queries
- Chat2Vis, ChartGPT: Generating code snippets for data analysis tasks
- Data Formulator: Allows users to specify visualization intent through drag and drop or natural language
- Statistical Assistance
- Helping users select appropriate statistical tests based on tasks and data
- Avoiding common pitfalls in statistical analysis
- Domain Knowledge Support
- Automating domain-specific contextual understanding of the data and task
- Tool Copilots
- Lowering entry barrier for engaging in data analysis tasks with tools
- Intelligent tutors guiding users to familiarize themselves with complex software interfaces
- Extending AI assistance to a broader range of applications
- Design Considerations: Enabling OS-Agent communication between apps for a more streamlined experience.
Data Analysis Process Stages and AI Workflows
Potential AI workflows for different stages of data analysis:
1. Task Formulation
- Refine high-level goals into specific tasks
- LIDA's goal explorer module generates hypotheses (Dibia, 2023)
- InsightPilot issues concrete analysis actions (Ma et al., 2023)
- NL4DV Toolkit breaks down natural language queries into attributes, tasks, and visualizations (Narechania et al., 2020)
- Address cold-start problem with domain knowledge
- Draw inspiration from existing examples for new tasks
- Identifying similar datasets and questions (Bako et al., 2022)
- Retrieval Augmented Generation (RAG) approaches use external knowledge (Lewis et al., 2020; Izacard et al., 2022; Peng et al., 2023; Lu et al., 2024)
- Few-shot learning with similar examples (Poesia et al., 2022; Li et al., 2023b; Liu et al., 2021)
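The few-shot approach above amounts to assembling a prompt from similar worked examples before appending the user's query. The sketch below shows that assembly step only; the example pairs and prompt wording are invented placeholders, and the actual model call is omitted.

```python
# Hypothetical NL-to-pandas example pairs; a real system would retrieve
# pairs similar to the user's query (e.g., via RAG).
EXAMPLES = [
    ("average sales per region",
     'df.groupby("region")["sales"].mean()'),
    ("count rows with missing price",
     'df["price"].isna().sum()'),
]

def build_prompt(query: str) -> str:
    # Interleave request/code demonstrations, then leave the final
    # "Code:" slot for the model to complete.
    parts = ["Translate the request into pandas code.\n"]
    for nl, code in EXAMPLES:
        parts.append(f"Request: {nl}\nCode: {code}\n")
    parts.append(f"Request: {query}\nCode:")
    return "\n".join(parts)

prompt = build_prompt("total revenue per year")
print(prompt)
```

Retrieval-augmented variants differ mainly in how `EXAMPLES` is populated — from an external knowledge store ranked by similarity to the query rather than a fixed list.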
Reducing Biased Data Analysis with AI Systems
- Multiverse analysis: examining various analytical approaches to reduce risk of biased conclusions
- Boba (Liu et al., 2020): early pre-LLM system for simplifying multiverse analysis in data science
- Automated hypothesis exploration: beneficial but poses significant challenges due to potential bias and exponential hypothesis space
- Execution + Authoring: generating code for data transformations and configuring charts for visualization
- Validation, insight generation, and refinement: inspecting analysis artifacts to validate assumptions, extract insights, and iterate hypotheses
- Vision-language models (LMMs): analyzing images, charts, and figures to reason about visualizations and generate insights
- User verification strategies: users may need to resolve ambiguities or misunderstandings in AI-generated output
Data Discovery with AI Systems
- Assumption of accessible data: not always the case as new analysis may require additional data
- Dataset location: AI systems can help users locate necessary datasets for deeper, more accurate analysis
- LLMs for semantic search in data lakes: extending beyond traditional keyword/tag-based search methods
- Interacting with web APIs and search engines: dynamically extracting real-world data to enhance reasoning and analysis.
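Semantic search over a data catalog can be sketched as scoring dataset descriptions against a query. The toy below uses bag-of-words overlap purely as a stand-in for embedding similarity; catalog names and descriptions are invented, and a real LLM-based system would embed both sides with a model instead.

```python
# Hypothetical data-lake catalog: file name -> natural-language description.
CATALOG = {
    "energy.csv": "renewable electricity generation by country and year",
    "sales.csv": "quarterly product sales per region",
}

def score(query: str, desc: str) -> int:
    # Word-overlap proxy for semantic similarity (illustration only).
    return len(set(query.lower().split()) & set(desc.lower().split()))

def best_match(query: str) -> str:
    # Return the catalog entry whose description best matches the query.
    return max(CATALOG, key=lambda name: score(query, CATALOG[name]))

print(best_match("electricity trends per country"))
```

The point of the LLM-based versions is that matches survive vocabulary mismatch ("power generation" vs. "electricity"), which keyword and tag search cannot handle.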
LLM-Based Approaches for Data Analysis
Enhanced capabilities:
- Enhanced data extraction
- Knowledge synthesis
- Help with data cleaning and integrating multiple datasets
- Handling data formatting issues, resolving inconsistencies, merging datasets
Extracting structured data from non-tabular formats
- LLMs can analyze text data to extract sentiments and trends
- Recent works on converting unstructured data to structured data, particularly in medical domain
- Examples: HeyMarvin for qualitative data analysis
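A common pattern for unstructured-to-structured conversion is to prompt the model to emit JSON matching a target schema, then parse it. The sketch below stubs out the model call with a canned response; the schema, field names, and clinical example are invented for illustration.

```python
import json

def extract_fields(note: str) -> dict:
    # Stand-in for an LLM call that is prompted with a schema plus the raw
    # text and asked to return structured JSON (response is hard-coded here).
    llm_response = '{"patient_age": 54, "symptom": "chest pain"}'
    return json.loads(llm_response)

record = extract_fields("54-year-old patient presenting with chest pain ...")
print(record)
```

Parsing the response as strict JSON also gives a natural validation hook: malformed or schema-violating output can be detected and the model re-prompted.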
Assisting in report generation and creativity
- Facilitating the generation of dashboards and what-if analyses with minimal developer involvement
- Transforming static elements into interactive versions
- Tailoring content and format of reports to suit various audiences and devices
- Creating artistic and aesthetically pleasing charts and infographics using GenAI models like DALL-E and Stable Diffusion
Challenges for AI systems in creating interactive designs:
- Linking various forms of media (text, images, videos, animations)
- Employing diverse interaction techniques (details-on-demand, drag-and-drop functionality, scroll-based interactions, hover effects, responsive design, gamification)
Enhancing User-AI Interaction: Natural Intent Communication
Challenges of Relying on Natural Language Based Interfaces:
- Users face challenges expressing complex intents due to limited expressiveness
- Lengthy or difficult natural language communication required for some tasks (e.g., altering legend placement)
Potential Ways AI Systems Can Enable Natural Modes of User Communication:
- Multi-modal inputs:
- Other modalities like mouse-based operations, pen and touch, GUI based interactions, audio, and gestures provide more powerful and easier ways to communicate intent
- Examples: Data Formulator, DirectGPT, InternGPT, ONYX
- Direct manipulation:
- Users can directly manipulate visualizations or UI elements to convey their intent (sketching, inking)
- Input-output examples and programming by example for data transformations
- Audio input:
- Another chat-like modality, made more powerful when combined with demonstrations or direct user manipulations
- Multimodal interactions
- Involve both the user and AI system taking the initiative as appropriate
- Personalized and dynamic UIs:
- Generative capabilities of language models enable customized interfaces on-the-fly for various tasks (e.g., DynaVis, Stylette)
Facilitating Trust and Verification: Enhancing Model Output Reliability for Users
GenAI Models:
- Can produce unintended outputs due to model hallucinations or ambiguous user input
- Require users to verify the accuracy of their outputs
- The verification process should not impose a significant cognitive load on users
Co-Audit Tools:
- Aim to assist users in checking AI-generated content
- Provide code explanations: natural language, or step-by-step natural language utterances
- Generate multiple possible charts/data for the same user input to disambiguate
- Inspect and verify calculated columns without viewing underlying code (e.g., ColDeco)
- Show detailed visual transformation process with interactive widgets to support error adjustments
Multi-Modal and Interactive Outputs:
- Format of output significantly influences user's understanding and verification ability
- A multimodal document interlaced with text and images is more readable than lengthy code or text
- Interactive charts and what-if analyses allow users to quickly experiment and verify outcomes
Debugging Support for Users:
- AI systems consisting of multiple models, agents, and components can be challenging for end users to debug
- Providing real-time visibility into the actions of these agents, along with support for checkpointing and interruptibility, would allow users to provide feedback and resume
- Tools like AutoGen Studio orchestrate multiple agents, providing a no-code interface for authoring, debugging, and deploying multi-agent workflows
Streamlining Data Analysis Tools: Unified Analysis Experience
Current Data Analysis Workflow:
- Multiple tools like Excel, PowerBI, Jupyter Notebooks, SQL servers are used for different aspects of data analysis
- Overhead in switching contexts, transferring data, and maintaining intent across platforms
Unifying Capabilities:
- Unifying multiple capabilities into a single tool can streamline the analysis process
- Enables unified AI system to seamlessly debug and reason across steps of the process
- Examples: Data Formulator, which combines data transformation with visualization authoring; LIDA, which integrates data summarization, goal exploration, chart authoring, and infographics generation
Multi-agent Systems:
- Allow creating AI systems composed of multiple agents, each specialized in a particular capability
- Enables the system to plan and collaborate on complex tasks
- Promising in producing complex AI tools in various domains like scientific reasoning and software engineering
Blended Apps:
- Modern applications are starting to create all-in-one workbenches
- Allows for automatic transitions between different AI/non-AI based tools
- Enhances user experience by integrating the context of existing non-AI systems
OS Agents:
- Automatically control and interact with multiple applications on the desktop
- Perform tasks across multiple applications based on natural language commands from the user
- Promising for enabling a unified analysis experience, especially for applications that cannot be controlled via APIs or co-pilots.
Challenges in Developing AI-Powered Data Analysis Systems
- Design Considerations: Enhancing human-AI experiences in data analysis tools (Figure 6)
- Research Challenges: Developing effective AI systems incorporating these principles
- Challenges:
- Multimodality, Planning, and Personalization: New models needed to support these capabilities
- User Preferences: Understanding users' preferences in data analysis
- Data Infrastructure: Creating infrastructure for AI-driven suggestions
- Model Reliability: Enhancing the reliability of existing AI models
- System Benchmarking and Evaluation Metrics: Establishing robust metrics to measure system performance
- Contextualization: Problems faced in data analysis domain (not just LLM-based interactive systems)
Improving Reliability of AI-Based Data Analysis Systems
Potential Issues:
- Hallucinations (Tonmoy et al., 2024)
- Sensitivity to prompts (Sclar et al., 2023)
- Failure to follow instructions (Liu et al., 2024b)
- Lack of acknowledgment of uncertainty (Key et al., 2022)
- Biases (Gallegos et al., 2023)
- Susceptibility to table representation in prompts (Singha et al., 2023)
Improving Reliability:
- Ensuring output satisfies specification:
- Using correct API
- Syntax error free
- Grounding techniques:
- RAG (Retrieval-Augmented Generation) and GPT-4 Assistants API
- Use of tools like calculator and custom functions to reduce errors
- Verifying code accuracy:
- Input-output examples provided by user or automatically generated but verified by the user
- Self-repair and self-rank mechanisms for models to evaluate their own outputs
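Verification against input-output examples reduces, in the simplest case, to running the candidate on each example pair. The sketch below is a minimal illustration of that check — the candidate, the task ("convert km to miles"), and the example pairs are all invented; a failed check would trigger self-repair or a request back to the user.

```python
def verify(candidate_fn, examples):
    # Accept the candidate only if it reproduces every user-provided
    # input-output pair.
    return all(candidate_fn(x) == y for x, y in examples)

# Suppose the model produced this candidate for "convert km to miles":
generated = lambda km: round(km * 0.621371, 2)

# Examples supplied (or confirmed) by the user.
examples = [(0, 0.0), (10, 6.21)]
print(verify(generated, examples))
```

Real systems would additionally sandbox the candidate's execution and surface the failing example to the user when verification does not pass.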
Handling Failure Cases:
- Provide fallback options or have model intelligently request additional information from user
- Mitigate disruptions caused by errors:
- Incorporating static UIs or dynamically composing from static UI elements
- Error identification and correction mechanisms
Ensuring Stability and Integrity of Analysis:
- Identify hallucinations and ambiguous user intents:
- Sample multiple outputs and examine agreement to detect inconsistencies
- Predictability, Computability, and Stability (PCS) framework:
- Evaluate trustworthiness of results by ensuring consistency when slight perturbations are applied
- Probe model to consider alternative analyses and reason over them to determine most meaningful analysis given user intents, task, and domain
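The sample-and-agree idea can be sketched as a majority vote over repeated model outputs for the same query, with the agreement ratio serving as a rough confidence signal. The sample values below are invented; a real system would obtain them from multiple (temperature-varied) model calls.

```python
from collections import Counter

def majority_with_agreement(samples):
    # Keep the most common answer; low agreement flags possible
    # hallucination or an ambiguous user intent.
    answer, count = Counter(samples).most_common(1)[0]
    return answer, count / len(samples)

# Hypothetical answers from five samples of the same analysis query.
samples = ["42", "42", "41", "42", "42"]
answer, agreement = majority_with_agreement(samples)
print(answer, agreement)
```

A PCS-style check extends the same logic from resampling the model to perturbing the data or the analysis choices and again measuring whether conclusions stay stable.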
Data Analysis Benchmarking and Evaluation Metrics
Desired Evaluation Benchmarks:
- Comprehensive benchmark suite for data analysis models and systems
- Covers a broad range of data sources and tasks within the domain
- Includes low-level and high-level tasks spanning multiple steps
Existing Benchmarks:
- Focus on specific aspects of the data analysis pipeline (e.g., visualization authoring, data transformations)
- Often use different evaluation metrics and procedures
- Limited ability to test all aspects of AI systems in the domain
- Lack benchmarks for open-ended exploration and iterative reasoning
Need for New Benchmarks:
- Reveal limitations of current models in areas like multi-modal reasoning, planning, exploration
- Drive AI researchers to push boundaries in new directions
- Compile diverse data tables, sources, questions/goals, and tasks into a centralized location
- Use taxonomy of tasks categorized by complexity for humans and models
Evaluation Metric Challenges:
- Difficulty measuring correctness of generated outputs for complex artifacts like charts or UIs
- Supporting multi-modal forms of communication in evaluation
- Assessing partial correctness and orthogonal dimensions of performance
Behavioral Evaluation:
- Importance of evaluating user trust, not just output correctness
- Considering worst-case metrics and adversarial testing
Further Advancements in Models/Agents for Multi-Modal AI Systems in Data Analysis Domain
Need for Advancements:
- To realize multi-modal, iterative, and trustworthy AI systems for data analysis domain
Inference Cost vs. Efficiency:
- GPT-4 offers powerful capabilities but comes with high inference costs
- Smaller models like Llama and Phi-3 may not consistently produce reliable outputs
- Balancing efficiency and accuracy needed through advancements in smaller language models
Training Data:
- Many latest models lack training on data analysis-specific tasks or languages (e.g., R, VBA scripts)
- Insufficient training data for UI interactions (e.g., Excel button clicks) crucial for generating personalized UIs
- Difficulty finding data that maps high-level user intents to low-level data analysis tasks
Finetuning and Multi-Agent Systems:
- Fine-tuning AI models on smaller datasets can help overcome lack of training data problem
- Understanding table/data semantics, generating domain-specific code/actions
- Finetuning multi-agent systems over entire data analysis workflow
- Scarcity of ground truth data for fine-tuning and automated ways to gather it using RLAIF (using AI feedback)
Personalization and Continual Learning:
- Enhancing user experience through learning preferences, memory storage, and adapting to users over time
- Leveraging data from other users and offering tailored recommendations
- Predicting next steps for users similar to current recommendation systems
- Self-evolving AI systems that can learn and create new APIs optimized for common queries
Multi-Modal Reasoning:
- Data analysis requires reasoning across multiple modalities: code, text, speech, gestures, images, tables
- Improvements in multi-modal models like GPT-4, Llava, Phi-3-Vision but challenges remain, such as understanding complex charts or interpreting user gestures (e.g., drawing an arrow between columns)
- Combining reasoning across multiple modalities to improve performance
Planning and Exploration:
- Data analysis not a linear process; requires planning, reasoning, and exploration
- LLM-based planning has shown better results when collaborating with humans rather than functioning autonomously
- Need for further advancements in both LLM capabilities and human-AI collaboration for effective planning in data analysis tasks.
Understanding User Preferences and Abilities for AI Systems
Importance:
- Tailor experiences to meet user preferences and abilities
- Improve interaction and usability of interactive AI systems
Formative Studies:
- Inform design process: Gu et al. (2024b), McNutt et al. (2023)
- Assess system performance: Wang et al. (2023a), Vaithilingam et al. (2024)
User Interaction Preferences:
- GUI + NL approach more effective than chat-based AI assistants: Data Formulator study
- Participants preferred AI-generated widgets over direct actions: DynaVis tool study
- Frequent UI changes increase cognitive load: Vaithilingam et al. (2024)
User Preferences on AI Suggestions and Outputs:
- Varying levels of statistical expertise influence reactions to AI suggestions
- Some find them helpful, others overly basic or requiring significant effort
- Users need both procedure-oriented artifacts and data artifacts for understanding analysis steps and confirming details: Gu et al. (2024b)
- Users want to understand and control context given to LLM model: McNutt et al. (2023)
- Prefer linter-like assistants that highlight inappropriate usage
Future Research Directions:
- Deduce contextual preferences based on user history, past actions, and evolving tasks
- Personalize UI elements for individual users without overwhelming them with frequent changes
- Understand user intervention interfaces in dynamic multi-application environments
- User's ability to understand/manage AI-context across multiple apps
- Trust and verification with output artifacts that might span multiple apps.
Challenges for AI Systems' Data Infrastructure:
- Ensuring availability of high-quality data tables: for various domains to provide proactive analysis suggestions
- Search engine infrastructure: necessary for crawling, indexing, and ranking data tables on the internet
- Domain experts' involvement: creating APIs, real-time updates using online data, managing enterprise and proprietary data tables effectively
- Evaluating and ranking data tables: through crowd-sourcing or other mechanisms to enhance reliability and relevance of available resources
- Data privacy, security: significant research area; anonymizing and aggregating data is essential but beyond the scope of this work.
Generative AI Tools for Data Analysis: Potential and Challenges
Introduction:
- Explores generative AI's potential in unlocking insights from data
- Outlines tasks that AI systems can assist with, some with related works and inspiration for others
Tasks that AI Systems Can Assist With:
- Data cleaning and preprocessing
- Feature engineering and selection
- Model training and selection
- Interpretation of results
Design Considerations:
- Optimizing user interactions
- Enhancing workflows
- Building trust and reliability
Design Enhancements for AI Systems:
- Improving robustness in data analysis scenarios
- Developing benchmarks and evaluation metrics
Model Advancements Needed:
- Continual learning
- Multi-modal reasoning
- Planning
Benchmarking Model Innovations:
- Using multi-step, multi-modal interactive data analysis scenarios
User Studies:
- Discerning preferences of data analysis users
- Optimizing AI-driven tools for usability
Conclusion:
- Bridging the gap between complex data and actionable insights