Merged
7 changes: 7 additions & 0 deletions .env.example
@@ -87,3 +87,10 @@ BIGQUERY_DATASET=your-dataset
BIGQUERY_LOCATION=US
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
BIGQUERY_CREDENTIALS_JSON=

# =============================================================================
# Visualization (Optional)
# Requires Azure Container Apps Dynamic Sessions for secure code execution.
# See docs/CONFIGURATION.md for setup instructions.
# =============================================================================
AZURE_SESSIONS_POOL_ENDPOINT=https://eastus.dynamicsessions.io/subscriptions/xxx/resourceGroups/xxx/sessionPools/xxx
3 changes: 3 additions & 0 deletions README.md
@@ -28,6 +28,7 @@ Built on top of LangChain's [`SQLDatabase`](https://docs.langchain.com/oss/pytho
- **Intent Detection**: Automatically routes queries to the correct data agent based on question context
- **Multi-Turn Conversations**: Follow-up questions with context awareness (e.g., "What's the average?" after a query)
- **SQL Validation**: Safe query execution with sqlglot-based validation across all dialects
- **Data Visualization**: Generate charts and graphs from query results using natural language (e.g., "show me a bar chart")
- **Configurable Agents**: YAML-based configuration for adding new data sources
- **A2A Protocol**: Agent-to-Agent interoperability for integration with other A2A-compliant systems

@@ -49,6 +50,7 @@ Generates, validates, and executes SQL queries with retry logic.

- [Database Setup](docs/DATABASE_SETUP.md)
- [Configuration](docs/CONFIGURATION.md)
- [Data Visualization](docs/VISUALIZATION.md)
- [A2A Protocol](docs/A2A.md)

## Quick Start
@@ -163,6 +165,7 @@ data-agent chat -c adventure_works
1. What are the total deposits by customer segment?
2. Show me all high-severity fraud alerts from the past week
3. Who are the top 5 customers by transaction volume?
4. Show me a bar chart of transactions by type

```bash
data-agent query "What are the total deposits by customer segment?" -c amex
21 changes: 21 additions & 0 deletions docs/CONFIGURATION.md
@@ -35,6 +35,8 @@ data_agents:
blocked_functions:
- pg_sleep
- pg_read_file
code_interpreter:
enabled: true
system_prompt: |
You are an SQL assistant...
{schema_context}
@@ -54,6 +56,25 @@ data_agents:
answer: "There are 1,234 users."
```

## Code Interpreter (Data Visualization)

Enable the code interpreter to generate charts and visualizations from query results. When enabled, the SQL-generation LLM flags visualization intent (e.g., "show me a chart", "visualize", "plot"), and a second LLM call generates matplotlib code to render the chart.

```yaml
code_interpreter:
  enabled: true
  azure_sessions_endpoint: ${AZURE_SESSIONS_POOL_ENDPOINT}
```

| Setting | Description | Default |
|---------|-------------|---------|
| `enabled` | Enable/disable visualization generation | `false` |
| `azure_sessions_endpoint` | Azure Container Apps session pool management endpoint URL | - |
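The defaults in the table can be read as the following resolution logic. This is a minimal sketch, not the project's actual loader: it assumes the YAML has already been parsed into a dict (for example with `yaml.safe_load`), and `resolve_code_interpreter` is a hypothetical name. It relies on `os.path.expandvars`, which substitutes `${AZURE_SESSIONS_POOL_ENDPOINT}` from the environment and leaves unset variables untouched.

```python
import os
from typing import Any


def resolve_code_interpreter(agent_cfg: dict[str, Any]) -> dict[str, Any]:
    """Return the effective code_interpreter settings for one data agent.

    Hypothetical helper: assumes the agent's YAML was already loaded into a
    dict and applies the defaults from the settings table.
    """
    section = agent_cfg.get("code_interpreter") or {}
    enabled = bool(section.get("enabled", False))  # default: false

    # Expand ${VAR} references such as ${AZURE_SESSIONS_POOL_ENDPOINT}.
    raw = section.get("azure_sessions_endpoint") or ""
    endpoint = os.path.expandvars(raw).strip()

    if enabled:
        # expandvars leaves unknown variables as-is, so a leftover "${"
        # means the environment variable was never set.
        if not endpoint or "${" in endpoint:
            raise ValueError(
                "code_interpreter.enabled is true but azure_sessions_endpoint "
                "is empty or references an unset environment variable"
            )
    return {"enabled": enabled, "azure_sessions_endpoint": endpoint or None}
```

Failing at load time when `enabled: true` has no usable endpoint surfaces a missing `.env` entry immediately, rather than on the first chart request.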

**Note:** Visualization requires Azure Container Apps Dynamic Sessions for secure, isolated code execution.

See [VISUALIZATION.md](VISUALIZATION.md) for complete setup instructions, architecture details, and troubleshooting.

## SQL Validation

Each data agent can configure SQL validation settings to control query safety:
226 changes: 226 additions & 0 deletions docs/VISUALIZATION.md
@@ -0,0 +1,226 @@
# Data Visualization

This guide covers the code interpreter feature for generating charts and visualizations from query results.

## Overview

When enabled, the data agent can detect visualization intent in user queries (e.g., "show me a chart", "plot the data") and generate matplotlib code to create charts. The code runs in a secure, isolated environment using Azure Container Apps Dynamic Sessions.

**Key features:**
- Automatic detection of visualization requests
- LLM-generated matplotlib code
- Secure sandboxed execution with Hyper-V isolation
- Native image capture: `plt.show()` output is returned as a base64 PNG, so no chart files are written
- Support for bar charts, line charts, pie charts, scatter plots, and more

## Requirements

Visualization requires Azure Container Apps Dynamic Sessions. This provides:

| Feature | Benefit |
|---------|---------|
| **Hyper-V isolation** | Each execution runs in a dedicated VM |
| **Pre-installed packages** | NumPy, Pandas, Matplotlib ready to use |
| **Native image capture** | `plt.show()` output captured automatically |
| **Automatic cleanup** | Sessions terminate after idle timeout |
| **No host access** | Code cannot access host filesystem or network |

## Azure Setup

### 1. Create a Container Apps Environment

If you don't already have one:

```bash
az containerapp env create \
  --name aca-env \
  --resource-group rg-data-agent \
  --location eastus
```

### 2. Create the Session Pool

```bash
az containerapp sessionpool create \
  --name session-pool-viz \
  --resource-group rg-data-agent \
  --container-type PythonLTS \
  --max-sessions 100 \
  --cooldown-period 300 \
  --location eastus
```

**Parameters:**
- `--container-type PythonLTS`: Python runtime with common data science packages
- `--max-sessions`: Maximum concurrent sessions
- `--cooldown-period`: Seconds of inactivity before an idle session is terminated

### 3. Get the Pool Management Endpoint

```bash
az containerapp sessionpool show \
  --name session-pool-viz \
  --resource-group rg-data-agent \
  --query "properties.poolManagementEndpoint" -o tsv
```

This returns a URL like:
```
https://eastus.dynamicsessions.io/subscriptions/<sub>/resourceGroups/<rg>/sessionPools/<pool>
```
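To catch a malformed or placeholder endpoint before the agent starts, the URL can be split into its parts. A minimal sketch, assuming the path always has the `/subscriptions/<sub>/resourceGroups/<rg>/sessionPools/<pool>` shape shown above and that the first label of the host is the region (as in the `eastus` example); `parse_pool_endpoint` is a hypothetical helper, not part of the Azure SDK.

```python
import re
from urllib.parse import urlparse

# Path shape from the example above (case-insensitive, since Azure
# resource IDs are): /subscriptions/<sub>/resourceGroups/<rg>/sessionPools/<pool>
_POOL_PATH = re.compile(
    r"^/subscriptions/(?P<subscription>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/sessionPools/(?P<pool>[^/]+)/?$",
    re.IGNORECASE,
)


def parse_pool_endpoint(url: str) -> dict[str, str]:
    """Split a pool management endpoint into region, subscription, group, pool."""
    parsed = urlparse(url.strip())
    if parsed.scheme != "https" or not parsed.netloc.endswith(".dynamicsessions.io"):
        raise ValueError(f"not a Dynamic Sessions endpoint: {url!r}")
    match = _POOL_PATH.match(parsed.path)
    if match is None:
        raise ValueError(f"unexpected endpoint path: {parsed.path!r}")
    # Assumption: the host looks like "<region>.dynamicsessions.io".
    return {"region": parsed.netloc.split(".")[0], **match.groupdict()}
```

For `https://eastus.dynamicsessions.io/subscriptions/1234/resourceGroups/rg-data-agent/sessionPools/session-pool-viz` this returns the region `eastus`, subscription `1234`, resource group `rg-data-agent`, and pool `session-pool-viz`. Note that it only checks the shape; placeholder values such as `xxx` still parse.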

### 4. Assign the Executor Role

Grant your identity permission to execute code in the session pool:

```bash
# Get your user ID
USER_ID=$(az ad signed-in-user show --query id -o tsv)

# Get the session pool resource ID
POOL_ID=$(az containerapp sessionpool show \
  --name session-pool-viz \
  --resource-group rg-data-agent \
  --query id -o tsv)

# Assign the role
az role assignment create \
  --role "Azure ContainerApps Session Executor" \
  --assignee "$USER_ID" \
  --scope "$POOL_ID"
```

**Note:** For service principals or managed identities, replace `$USER_ID` with the appropriate object ID.

### 5. Install the SDK

```bash
pip install langchain-azure-dynamic-sessions
```

Or add to your `pyproject.toml`:
```toml
dependencies = [
    "langchain-azure-dynamic-sessions>=0.2.0",
]
```

## Configuration

### Environment Variable

Set the pool endpoint:

```bash
export AZURE_SESSIONS_POOL_ENDPOINT="https://eastus.dynamicsessions.io/subscriptions/.../sessionPools/..."
```

Or in `.env`:
```bash
AZURE_SESSIONS_POOL_ENDPOINT=https://eastus.dynamicsessions.io/subscriptions/.../sessionPools/...
```

### YAML Configuration

Enable visualization in your agent config:

```yaml
data_agents:
  - name: "sales_agent"
    # ... other config ...
    code_interpreter:
      enabled: true
      azure_sessions_endpoint: ${AZURE_SESSIONS_POOL_ENDPOINT}
```

| Setting | Description | Default |
|---------|-------------|---------|
| `enabled` | Enable/disable visualization | `false` |
| `azure_sessions_endpoint` | Session pool management endpoint URL | - |

### System Prompt

To enable visualization detection, add a `visualization_requested` field to the JSON response format in your system prompt:

```yaml
system_prompt: |
  You are a SQL expert for the sales database.

  {schema_context}

  ## Response Format

  Provide your response as JSON with these fields:
  - "thinking": Step-by-step reasoning about the query
  - "sql_query": The generated SQL query
  - "explanation": Brief explanation of what the query does
  - "visualization_requested": Set to true if the user asks for a chart, graph, plot, or visualization
```
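On the agent side, the response format above might be consumed like this. A minimal sketch: `SqlResponse` and `parse_sql_response` are hypothetical names, and treating a missing or non-boolean `visualization_requested` as false is a design choice of this sketch, so agents whose prompt omits the field never produce charts.

```python
import json
from dataclasses import dataclass


@dataclass
class SqlResponse:
    thinking: str
    sql_query: str
    explanation: str
    visualization_requested: bool


def parse_sql_response(raw: str) -> SqlResponse:
    """Parse the JSON fields requested in the system prompt (hypothetical helper)."""
    data = json.loads(raw)
    return SqlResponse(
        thinking=str(data.get("thinking", "")),
        # Required: without a query there is nothing to execute.
        sql_query=str(data["sql_query"]),
        explanation=str(data.get("explanation", "")),
        # Only a JSON boolean true enables charting; "yes", 1, or a missing
        # field all count as "no chart requested".
        visualization_requested=data.get("visualization_requested") is True,
    )
```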

## How It Works

```
┌────────────────────────────────────────────────────────────┐
│ User Query                                                 │
│ "Show me a bar chart of sales by region"                   │
└─────────────────────────────┬──────────────────────────────┘
                              ▼
┌────────────────────────────────────────────────────────────┐
│ SQL Generation LLM                                         │
│ Generates SQL + sets visualization_requested: true         │
└─────────────────────────────┬──────────────────────────────┘
                              ▼
┌────────────────────────────────────────────────────────────┐
│ Database Query                                             │
│ Execute SQL, return result rows                            │
└─────────────────────────────┬──────────────────────────────┘
                              ▼
┌────────────────────────────────────────────────────────────┐
│ Visualization LLM                                          │
│ Generates matplotlib code based on data + user question    │
└─────────────────────────────┬──────────────────────────────┘
                              ▼
┌────────────────────────────────────────────────────────────┐
│ Azure Container Apps Dynamic Sessions                      │
│ • Code executed in Hyper-V isolated container              │
│ • plt.show() output captured automatically                 │
│ • Image returned as base64 PNG                             │
└─────────────────────────────┬──────────────────────────────┘
                              ▼
┌────────────────────────────────────────────────────────────┐
│ Response                                                   │
│ Text explanation + embedded chart image                    │
└────────────────────────────────────────────────────────────┘
```
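The "Image returned as base64 PNG" step can be handled with the standard library alone. A minimal sketch, assuming the sandbox result has the shape `{"result": {"type": "image", "format": "png", "base64_data": "..."}, "stdout": ..., "stderr": ...}`; verify that shape against the responses your session pool actually returns. `extract_png` is a hypothetical helper.

```python
import base64
from typing import Any, Optional

# Every PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def extract_png(properties: dict[str, Any]) -> Optional[bytes]:
    """Return PNG bytes from a sandbox execution result, or None if no image."""
    result = properties.get("result")
    if not isinstance(result, dict) or result.get("type") != "image":
        # The code printed text or returned a plain value, not a chart.
        return None
    data = base64.b64decode(result.get("base64_data", ""), validate=True)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("image payload is not a PNG")
    return data
```

Checking the PNG signature after decoding guards against embedding a truncated or non-image payload in the response.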

### Execution Flow

1. **Intent Detection**: The SQL LLM sets `visualization_requested: true` when it detects chart/graph/plot intent
2. **SQL Execution**: Query runs against the database, returning structured data
3. **Code Generation**: A second LLM call generates matplotlib code tailored to the data and question
4. **Sandboxed Execution**: Code runs in Azure Sessions with automatic image capture
5. **Response Assembly**: Text response and chart image are combined for display
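The five steps above can be sketched as a single function with the LLM, database, and sandbox passed in as plain callables. This is an illustration under stated assumptions, not the agent's implementation: `answer_with_chart` and all four callables are hypothetical stand-ins. The result keys mirror the `visualization_image`, `visualization_code`, and `visualization_error` fields the agent returns, but here the image is held as raw PNG bytes, which may differ from the real agent's representation.

```python
from typing import Any, Callable, Optional

Rows = list[dict[str, Any]]


def answer_with_chart(
    question: str,
    generate_sql: Callable[[str], dict[str, Any]],
    run_query: Callable[[str], Rows],
    generate_viz_code: Callable[[str, Rows], str],
    execute_in_sandbox: Callable[[str], Optional[bytes]],
) -> dict[str, Any]:
    """Hypothetical orchestration of the execution flow; every callable is a stand-in."""
    # 1. Intent detection: the SQL LLM's JSON carries visualization_requested.
    sql_response = generate_sql(question)
    # 2. SQL execution.
    rows = run_query(sql_response["sql_query"])
    state: dict[str, Any] = {
        "rows": rows,
        "visualization_image": None,
        "visualization_code": None,
        "visualization_error": None,
    }
    # Skip charting when it was not requested or there is no data to plot.
    if sql_response.get("visualization_requested") is not True or not rows:
        return state
    try:
        # 3. A second LLM call writes matplotlib code for these rows.
        code = generate_viz_code(question, rows)
        state["visualization_code"] = code
        # 4. Sandboxed execution returns PNG bytes (or None if no image).
        state["visualization_image"] = execute_in_sandbox(code)
    except Exception as exc:
        # A failed chart is reported, but the text answer still goes out.
        state["visualization_error"] = str(exc)
    # 5. The caller combines the text answer with the optional image.
    return state
```

Catching errors around steps 3 and 4 keeps visualization optional: a sandbox timeout or bad generated code degrades to a text-only answer with `visualization_error` set.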

## Example Queries

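These prompts typically trigger visualization. The visualization LLM chooses the chart type, so the types below are typical rather than guaranteed, and prompts without explicit chart wording (such as the last row) depend on the LLM inferring visualization intent:

| Query | Typical Chart Type |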
|-------|------------|
| "Show me a bar chart of sales by region" | Bar chart |
| "Visualize the top 10 customers by revenue" | Horizontal bar |
| "Plot monthly revenue trends for 2024" | Line chart |
| "Create a pie chart of transaction types" | Pie chart |
| "Graph the distribution of order values" | Histogram |
| "Compare Q1 vs Q2 performance" | Grouped bar |

## Further Reading

- [Azure Container Apps Dynamic Sessions](https://learn.microsoft.com/azure/container-apps/sessions)
- [Session Pool Management](https://learn.microsoft.com/azure/container-apps/sessions-code-interpreter)
- [LangChain Azure Dynamic Sessions](https://python.langchain.com/docs/integrations/tools/azure_dynamic_sessions)
Binary file modified docs/data_agent_graph.png
16 changes: 2 additions & 14 deletions pyproject.toml
@@ -22,15 +22,11 @@ dependencies = [
"pyyaml>=6.0",
"python-dotenv>=1.2.1",
"databricks-sql-connector>=4.2.2",
"asyncpg>=0.29.0",
"psycopg[binary]>=3.1.0",
"nest-asyncio>=1.6.0",
"azure-cosmos>=4.7.0",
"aiohttp>=3.9.0",
"structlog>=24.0.0",
"typing-extensions>=4.12",
"azure-identity>=1.25.1",
"azure-keyvault-secrets>=4.10.0",
"langgraph-cli[inmem]>=0.4.10",
"langgraph-api>=0.5.42",
"pyodbc>=5.3.0",
@@ -41,14 +37,15 @@ dependencies = [
"rich>=14.0.0",
"chainlit>=2.0.0",
"pandas>=2.0.0",
"tabulate>=0.9.0",
"a2a-sdk[http-server]>=0.3.22",
"httpx>=0.27.0",
"sqlalchemy>=2.0.45",
"sqlalchemy-bigquery>=1.16.0",
"databricks-sqlalchemy>=2.0.8",
"google-cloud-bigquery-storage>=2.36.0",
"psycopg2>=2.9.11",
"langchain-azure-dynamic-sessions>=0.2.0",
"matplotlib>=3.10.8",
]

[project.scripts]
@@ -72,15 +69,6 @@ dev = [
"isort>=7.0.0",
]

# Azure AI Foundry hosting dependencies
foundry = [
"azure-ai-agentserver-langgraph>=0.1.0",
"azure-ai-projects>=1.0.0",
"azure-identity>=1.25.1",
"opentelemetry-api>=1.20.0",
"opentelemetry-sdk>=1.20.0",
]

[tool.setuptools.packages.find]
where = ["src"]
include = ["data_agent*"]
3 changes: 3 additions & 0 deletions src/data_agent/agent.py
@@ -561,6 +561,9 @@ async def run(
datasource_name=result.get("datasource_name", ""),
rewritten_question=result.get("rewritten_question", ""),
messages=result.get("messages", []),
visualization_image=result.get("visualization_image"),
visualization_code=result.get("visualization_code"),
visualization_error=result.get("visualization_error"),
)

def get_agent_names(self) -> list[str]: