Merged
19 changes: 8 additions & 11 deletions README.md
@@ -1,12 +1,12 @@

<div align="center">
<div align="left">

```diff
+ ╔╦╗╔═╗╔╦╗╔═╗ ╔═╗╔═╗╔═╗╔╗╔╔╦╗
+ ║║╠═╣ ║ ╠═╣ ╠═╣║ ╦║╣ ║║║ ║
+ ═╩╝╩ ╩ ╩ ╩ ╩ ╩ ╩╚═╝╚═╝╝╚╝ ╩
+
+ Natural Language → SQL Query Agent

[ Natural Language → SQL Query Agent ]
```

</div>
@@ -64,11 +64,11 @@ Generates, validates, and executes SQL queries with retry logic.
### Installation

```bash
git clone <repository-url>
git clone https://github.com/eosho/langchain_data_agent
cd langchain_data_agent
uv sync
uv sync --all-extras
cp .env.example .env
# Edit .env with your Azure OpenAI credentials
# Edit .env with your values
```

### CLI Usage
@@ -238,7 +238,7 @@ The platform includes built-in configuration for these databases:
| PostgreSQL | `postgres` | postgres |
| Azure SQL | `azure_sql` | tsql |
| Azure Synapse | `synapse` | tsql |
| Azure Cosmos DB | `cosmos` | tsql |
| Azure Cosmos DB | `cosmos` | cosmosdb |
| Databricks SQL | `databricks` | databricks |
| Google BigQuery | `bigquery` | bigquery |
| MySQL | `mysql` | mysql |
@@ -250,13 +250,10 @@ The platform includes built-in configuration for these databases:

```bash
# Format and lint
uv run poe format
uv run pre-commit run --all-files

# Run tests
uv run pytest

# Type check
uv run basedpyright
```

## License
25 changes: 11 additions & 14 deletions docs/CONFIGURATION.md
@@ -35,8 +35,6 @@ data_agents:
blocked_functions:
- pg_sleep
- pg_read_file
code_interpreter:
enabled: true
system_prompt: |
You are an SQL assistant...
{schema_context}
@@ -58,22 +56,21 @@ data_agents:

## Code Interpreter (Data Visualization)

Enable the code interpreter to generate charts and visualizations from query results. When enabled, the LLM can detect visualization intent (e.g., "show me a chart", "visualize", "plot") and generate matplotlib code to create charts.
The data agent can generate charts and visualizations from query results. When the LLM detects visualization intent (e.g., "show me a chart", "visualize", "plot"), it generates matplotlib code to create charts.

```yaml
code_interpreter:
enabled: true
azure_sessions_endpoint: ${AZURE_SESSIONS_POOL_ENDPOINT}
```
Visualization is **automatically enabled** - no YAML configuration needed. The executor is selected based on environment:

| Setting | Description | Default |
|---------|-------------|---------|
| `enabled` | Enable/disable visualization generation | `false` |
| `azure_sessions_endpoint` | Azure Container Apps session pool management endpoint URL | - |
| Environment | Executor | Use Case |
|-------------|----------|----------|
| `AZURE_SESSIONS_POOL_ENDPOINT` set | Azure Sessions | Production (secure, Hyper-V isolation) |
| Not set | Local executor | Development (no sandboxing) |

**Note:** Visualization requires Azure Container Apps Dynamic Sessions for secure, isolated code execution.
```bash
# Production: Set the Azure Sessions endpoint
export AZURE_SESSIONS_POOL_ENDPOINT="https://eastus.dynamicsessions.io/subscriptions/.../sessionPools/..."
```

See [VISUALIZATION.md](VISUALIZATION.md) for complete setup instructions, architecture details, and troubleshooting.
See [VISUALIZATION.md](VISUALIZATION.md) for Azure setup instructions and troubleshooting.

## SQL Validation

158 changes: 29 additions & 129 deletions docs/VISUALIZATION.md
@@ -27,84 +27,17 @@ Visualization requires Azure Container Apps Dynamic Sessions. This provides:

## Azure Setup

### 1. Create a Container Apps Environment
Follow the [Azure Container Apps Dynamic Sessions with LangChain tutorial](https://learn.microsoft.com/en-us/azure/container-apps/sessions-tutorial-langchain) to:

If you don't already have one:
1. Create a Container Apps session pool
2. Get the pool management endpoint
3. Assign the `Azure ContainerApps Session Executor` role to your identity

```bash
az containerapp env create \
--name aca-env \
--resource-group rg-data-agent \
--location eastus
```

### 2. Create the Session Pool

```bash
az containerapp sessionpool create \
--name session-pool-viz \
--resource-group rg-data-agent \
--container-type PythonLTS \
--max-sessions 100 \
--cooldown-period 300 \
--location eastus
```

**Parameters:**
- `--container-type PythonLTS`: Python runtime with common data science packages
- `--max-sessions`: Maximum concurrent sessions
- `--cooldown-period`: Seconds before idle session is terminated

### 3. Get the Pool Management Endpoint

```bash
az containerapp sessionpool show \
--name session-pool-viz \
--resource-group rg-data-agent \
--query "properties.poolManagementEndpoint" -o tsv
```

This returns a URL like:
Once complete, you'll have an endpoint URL like:
```
https://eastus.dynamicsessions.io/subscriptions/<sub>/resourceGroups/<rg>/sessionPools/<pool>
```
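
As a quick sanity check on a copied endpoint, the URL shape shown above can be split into its parts. This is only a sketch: `parse_pool_endpoint` is a hypothetical helper, not part of this repository, and reading the region from the first host label is an assumption inferred from the example URL.

```python
from typing import Dict, Optional
from urllib.parse import urlparse


def parse_pool_endpoint(url: str) -> Dict[str, Optional[str]]:
    """Split a pool management endpoint into its components (hypothetical helper)."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    # Path segments alternate key/value:
    # subscriptions/<sub>/resourceGroups/<rg>/sessionPools/<pool>
    pairs = dict(zip(segments[0::2], segments[1::2]))
    host = parsed.hostname or ""
    return {
        # Assumption: the first host label is the region, as in the example URL.
        "region": host.split(".")[0] or None,
        "subscription": pairs.get("subscriptions"),
        "resource_group": pairs.get("resourceGroups"),
        "pool": pairs.get("sessionPools"),
    }
```

A missing `pool` or `subscription` value in the result usually means the URL was truncated when it was copied.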

### 4. Assign the Executor Role

Grant your identity permission to execute code in the session pool:

```bash
# Get your user ID
USER_ID=$(az ad signed-in-user show --query id -o tsv)

# Get the session pool resource ID
POOL_ID=$(az containerapp sessionpool show \
--name session-pool-viz \
--resource-group rg-data-agent \
--query id -o tsv)

# Assign the role
az role assignment create \
--role "Azure ContainerApps Session Executor" \
--assignee $USER_ID \
--scope $POOL_ID
```

**Note:** For service principals or managed identities, replace `$USER_ID` with the appropriate object ID.

### 5. Install the SDK

```bash
pip install langchain-azure-dynamic-sessions
```

Or add to your `pyproject.toml`:
```toml
dependencies = [
"langchain-azure-dynamic-sessions>=0.1.0",
]
```

## Configuration

### Environment Variable
@@ -120,23 +53,16 @@ Or in `.env`:
AZURE_SESSIONS_POOL_ENDPOINT=https://eastus.dynamicsessions.io/subscriptions/.../sessionPools/...
```

### YAML Configuration
### Executor Selection

Enable visualization in your agent config:
The system automatically selects the executor based on environment:

```yaml
data_agents:
- name: "sales_agent"
# ... other config ...
code_interpreter:
enabled: true
azure_sessions_endpoint: ${AZURE_SESSIONS_POOL_ENDPOINT}
```
| `AZURE_SESSIONS_POOL_ENDPOINT` | Executor | Use Case |
|-------------------------------|----------|----------|
| Set | Azure Sessions | Production (secure, Hyper-V isolation) |
| Not set | Local Python REPL | Development (fast, no sandboxing) |

| Setting | Description | Default |
|---------|-------------|---------|
| `enabled` | Enable/disable visualization | `false` |
| `azure_sessions_endpoint` | Session pool management endpoint URL | - |
**No YAML configuration needed** - visualization is always enabled, with the executor determined by environment.
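
The selection rule in the table can be sketched in a few lines. This is an illustration only: `select_executor` and the two returned labels are hypothetical names, not the project's actual code, and treating an empty or whitespace-only value as "not set" is an assumption. Only the `AZURE_SESSIONS_POOL_ENDPOINT` variable name comes from the docs.

```python
import os
from typing import Mapping, Optional


def select_executor(env: Optional[Mapping[str, str]] = None) -> str:
    """Return a label for the executor implied by the environment.

    Hypothetical helper; the real project may structure this differently.
    """
    env = os.environ if env is None else env
    # Assumption: an empty or whitespace-only value counts as "not set".
    endpoint = env.get("AZURE_SESSIONS_POOL_ENDPOINT", "").strip()
    # A configured endpoint selects the sandboxed Azure Sessions executor;
    # otherwise fall back to the unsandboxed local executor.
    return "azure_sessions" if endpoint else "local"
```

Passing the environment in as a mapping keeps the rule easy to test without mutating `os.environ`.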

### System Prompt

@@ -159,43 +85,23 @@ system_prompt: |

## How It Works

```
┌─────────────────────────────────────────────────────────────────┐
│ User Query │
│ "Show me a bar chart of sales by region" │
└─────────────────────────────────────────┬───────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ SQL Generation LLM │
│ Generates SQL + sets visualization_requested: true │
└─────────────────────────────────────────┬───────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Database Query │
│ Execute SQL, return result rows │
└─────────────────────────────────────────┬───────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Visualization LLM │
│ Generates matplotlib code based on data + user question │
└─────────────────────────────────────────┬───────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Azure Container Apps Dynamic Sessions │
│ • Code executed in Hyper-V isolated container │
│ • plt.show() output captured automatically │
│ • Image returned as base64 PNG │
└─────────────────────────────────────────┬───────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Response │
│ Text explanation + embedded chart image │
└─────────────────────────────────────────────────────────────────┘
```mermaid
sequenceDiagram
participant User
participant SQL LLM
participant Database
participant Viz LLM
participant Executor

User->>SQL LLM: "Show me a bar chart of sales by region"
SQL LLM->>SQL LLM: Generate SQL + set visualization_requested: true
SQL LLM->>Database: Execute SQL query
Database-->>SQL LLM: Result rows
SQL LLM->>Viz LLM: Data + user question
Viz LLM->>Viz LLM: Generate matplotlib code
Viz LLM->>Executor: Execute code
Executor-->>Viz LLM: PNG image (base64)
Viz LLM-->>User: Text response + chart image
```
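
The same sequence can be read as plain control flow. The sketch below is not the project's LangGraph implementation: `SqlPlan`, `AgentResponse`, `run_pipeline`, and the four callables are hypothetical stand-ins for the SQL LLM, the database, the visualization LLM, and the executor, and the summary text is made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Rows = Sequence[Tuple]


@dataclass
class SqlPlan:
    """Output of the SQL-generation step (hypothetical shape)."""
    sql: str
    visualization_requested: bool


@dataclass
class AgentResponse:
    """Final answer: text plus an optional base64-encoded PNG."""
    text: str
    image_png_base64: Optional[str] = None


def run_pipeline(
    question: str,
    generate_sql: Callable[[str], SqlPlan],
    run_query: Callable[[str], Rows],
    generate_plot_code: Callable[[str, Rows], str],
    execute_code: Callable[[str], str],
) -> AgentResponse:
    # Steps 1-2: the SQL LLM produces a query and flags visualization intent.
    plan = generate_sql(question)
    # Steps 3-4: the query runs against the database and rows come back.
    rows = run_query(plan.sql)
    text = f"{len(rows)} row(s) returned."
    if not plan.visualization_requested:
        return AgentResponse(text=text)
    # Steps 5-6: the visualization LLM writes plotting code from data + question.
    code = generate_plot_code(question, rows)
    # Steps 7-8: the executor runs the code and returns a base64 PNG.
    image = execute_code(code)
    return AgentResponse(text=text, image_png_base64=image)
```

Injecting each stage as a callable means the executor can be swapped (Azure Sessions or local) without touching the flow itself.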

### Execution Flow
@@ -218,9 +124,3 @@ These prompts trigger visualization:
| "Create a pie chart of transaction types" | Pie chart |
| "Graph the distribution of order values" | Histogram |
| "Compare Q1 vs Q2 performance" | Grouped bar |

## Further Reading

- [Azure Container Apps Dynamic Sessions](https://learn.microsoft.com/azure/container-apps/sessions)
- [Session Pool Management](https://learn.microsoft.com/azure/container-apps/sessions-code-interpreter)
- [LangChain Azure Dynamic Sessions](https://python.langchain.com/docs/integrations/tools/azure_dynamic_sessions)
1 change: 1 addition & 0 deletions pyproject.toml
@@ -46,6 +46,7 @@ dependencies = [
"psycopg2>=2.9.11",
"langchain-azure-dynamic-sessions>=0.2.0",
"matplotlib>=3.10.8",
"tabulate>=0.9.0",
]

[project.scripts]
63 changes: 63 additions & 0 deletions scripts/generate_diagrams.py
@@ -0,0 +1,63 @@
"""Generate flow diagrams for documentation using LangGraph visualization.

This script generates PNG images for the data agent and intent detection flows
using LangGraph's built-in visualization.

Usage:
    uv run python scripts/generate_diagrams.py
"""

import os
from pathlib import Path

from dotenv import load_dotenv

load_dotenv()


def main():
    """Generate diagrams from LangGraph and save to docs folder."""
    from unittest.mock import MagicMock

    from langchain_openai import AzureChatOpenAI

    from data_agent.config import CONFIG_DIR
    from data_agent.config_loader import ConfigLoader
    from data_agent.graph import DataAgentGraph

    docs_dir = Path(__file__).parent.parent / "docs"
    docs_dir.mkdir(exist_ok=True)

    # Load a config to get a data agent graph
    config = ConfigLoader.load(CONFIG_DIR / "amex.yaml")

    if config.data_agents:
        agent_config = config.data_agents[0]

        # Create LLM
        llm = AzureChatOpenAI(
            azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
            api_key=os.getenv("AZURE_OPENAI_API_KEY"),
            azure_deployment=os.getenv("AZURE_OPENAI_DEPLOYMENT", "gpt-4o"),
            api_version="2024-08-01-preview",
            temperature=0,
        )

        # Create a mock datasource for diagram generation (we won't execute queries)
        mock_datasource = MagicMock()

        # Build the graph and compile to get visualization
        graph_builder = DataAgentGraph(llm, mock_datasource, agent_config)
        compiled_graph = graph_builder.compile()

        # Generate PNG using LangGraph's visualization
        png_data = compiled_graph.get_graph().draw_mermaid_png()
        output_path = docs_dir / "data_agent_graph.png"
        output_path.write_bytes(png_data)
        print(f"Generated: {output_path}")

    print("Done!")


if __name__ == "__main__":
    main()