3 changes: 3 additions & 0 deletions README.md
@@ -31,6 +31,7 @@ Built on top of LangChain's [`SQLDatabase`](https://docs.langchain.com/oss/pytho
- **Data Visualization**: Generate charts and graphs from query results using natural language (e.g., "show me a bar chart")
- **Configurable Agents**: YAML-based configuration for adding new data sources
- **A2A Protocol**: Agent-to-Agent interoperability for integration with other A2A-compliant systems
- **MCP Protocol**: Model Context Protocol support for Claude Desktop, VS Code, and other MCP clients

## Architecture

@@ -51,7 +52,9 @@ Generates, validates, and executes SQL queries with retry logic.
- [Database Setup](docs/DATABASE_SETUP.md)
- [Configuration](docs/CONFIGURATION.md)
- [Data Visualization](docs/VISUALIZATION.md)
- [Prompts & Dialects](docs/PROMPTS.md)
- [A2A Protocol](docs/A2A.md)
- [MCP Protocol](docs/MCP.md)

## Quick Start

15 changes: 1 addition & 14 deletions docs/CONFIGURATION.md
@@ -1,19 +1,6 @@
# Configuration

Data agents are configured via YAML files. See `src/data_agent/config/contoso.yaml` for a complete example.

## Intent Detection

```yaml
intent_detection_agent:
  llm:
    model: gpt-4o
    provider: azure_openai
    temperature: 0.0
  system_prompt: |
    You are an intent detection assistant...
    {agent_descriptions}
```
Data agents are configured via YAML files. See `src/data_agent/agents/contoso.yaml` for a complete example.

## Data Agent Definition

147 changes: 147 additions & 0 deletions docs/MCP.md
@@ -0,0 +1,147 @@
# MCP Protocol Support

The Data Agent supports the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/), enabling integration with Claude Desktop, VS Code, Cursor, and other MCP-compatible clients.

## Quick Start

```bash
# Start MCP server with SSE transport (default)
uv run data-agent mcp

# Start with a specific config
uv run data-agent mcp --config contoso

# Start with stdio transport (for Claude Desktop)
uv run data-agent mcp --transport stdio

# Start on a custom port
uv run data-agent mcp --port 9000
```

## Server Options

| Option | Default | Description |
|--------|---------|-------------|
| `--config, -c` | all | Configuration name (e.g., `contoso`). Loads all configs if not specified. |
| `--transport, -t` | sse | Transport: `sse` for HTTP clients (VS Code, Cursor), `stdio` for Claude Desktop |
| `--port, -p` | 8002 | Port for SSE transport |
| `--log-level` | warning | Logging level |
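
For illustration, the options in the table above can be modeled with `argparse`. This is a minimal sketch, not the project's actual CLI code: the `build_parser` name and the choice of `argparse` are assumptions, and only the flag names and defaults are taken from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented MCP server options."""
    parser = argparse.ArgumentParser(prog="data-agent mcp")
    # Omitting --config means "load all configs"; modeled here as None.
    parser.add_argument("--config", "-c", default=None)
    parser.add_argument("--transport", "-t", choices=["sse", "stdio"], default="sse")
    parser.add_argument("--port", "-p", type=int, default=8002)
    parser.add_argument("--log-level", default="warning")
    return parser


# Equivalent to: data-agent mcp --transport stdio -p 9000
args = build_parser().parse_args(["--transport", "stdio", "-p", "9000"])
print(args.transport, args.port, args.log_level)
```

Note that `--port` only matters for the `sse` transport; with `stdio` the client talks to the server over the process's standard streams.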

## Available Tools

The MCP server exposes the following tools:

| Tool | Description |
|------|-------------|
| `query` | Execute natural language queries against datasources |
| `list_datasources` | List all configured datasources with descriptions |
| `list_tables` | List tables for a specific datasource |
| `get_schema` | Get database schema for a specific datasource |
| `validate_sql` | Validate SQL syntax without executing |

## Available Resources

| Resource URI | Description |
|--------------|-------------|
| `datasources://list` | List of available datasources |
| `schema://{datasource}` | Database schema for a datasource |
| `tables://{datasource}` | List of tables for a datasource |
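
To make the URI templates concrete, here is a small sketch of how a URI could be matched against them to recover the `{datasource}` parameter. The `match_resource` helper and the regex approach are illustrative assumptions; in practice the MCP SDK handles resource routing on the server side.

```python
import re
from typing import Optional

# Templates copied from the table above.
RESOURCE_TEMPLATES = [
    "datasources://list",
    "schema://{datasource}",
    "tables://{datasource}",
]


def match_resource(uri: str) -> Optional[tuple[str, dict[str, str]]]:
    """Return (template, params) for the first matching template, else None."""
    for template in RESOURCE_TEMPLATES:
        # Escape the literal parts, then turn each {name} into a named group.
        pattern = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>[^/]+)", re.escape(template))
        found = re.fullmatch(pattern, uri)
        if found:
            return template, found.groupdict()
    return None


print(match_resource("schema://contoso"))
```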

## Client Configuration

### Claude Desktop

Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):

```json
{
  "mcpServers": {
    "data-agent": {
      "command": "uv",
      "args": ["run", "data-agent-mcp", "--transport", "stdio"],
      "cwd": "/path/to/langchain_data_agent"
    }
  }
}
```
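
If the config file already lists other MCP servers, the new entry should be merged in rather than overwriting the file. The following is a rough sketch of such a merge using only the standard library; `add_mcp_server` is a hypothetical helper, not part of this project, and the demonstration writes to a scratch file instead of the real Claude config.

```python
import json
import tempfile
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, entry: dict) -> dict:
    """Merge one server entry into the config file, preserving other keys."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})[name] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2))
    return config


# Demonstrate against a scratch file that already contains another server.
with tempfile.TemporaryDirectory() as tmp:
    scratch = Path(tmp) / "claude_desktop_config.json"
    scratch.write_text(json.dumps({"mcpServers": {"existing": {"command": "node"}}}))
    merged = add_mcp_server(
        scratch,
        "data-agent",
        {
            "command": "uv",
            # stdio requested explicitly, as in the VS Code stdio example
            "args": ["run", "data-agent-mcp", "--transport", "stdio"],
        },
    )
    print(sorted(merged["mcpServers"]))
```

Restart Claude Desktop after editing the file so that it picks up the new server entry.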

### VS Code

Add to `.vscode/mcp.json` in your workspace:

```json
{
  "servers": {
    "data-agent": {
      "type": "sse",
      "url": "http://127.0.0.1:8002/sse"
    }
  }
}
```

> **Note:** Start the MCP server first with `uv run data-agent mcp` before connecting.

Alternatively, use the stdio transport, in which case VS Code starts the server itself:

```json
{
  "servers": {
    "data-agent": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "data-agent-mcp", "--transport", "stdio"]
    }
  }
}
```

### Cursor / Windsurf

Configuration is similar to VS Code. Check your IDE's MCP documentation for the exact file location and schema.

## Example Usage

Once configured, you can interact with the Data Agent directly from your AI client:

```
User: What datasources are available?
AI: [calls list_datasources] → Shows contoso, adventure_works, amex

User: What's the schema for the contoso database?
AI: [calls get_schema("contoso")] → Shows tables, columns, types

User: Show me the top 5 products by sales in Q4 2024
AI: [calls query("top 5 products by sales Q4 2024")] → Returns results
```

## Programmatic Client Example

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main():
    # Launch the server as a subprocess and talk to it over stdio.
    server_params = StdioServerParameters(
        command="uv",
        args=["run", "data-agent-mcp", "--transport", "stdio"],
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # List available tools
            tools = await session.list_tools()
            print("Available tools:", [t.name for t in tools.tools])

            # Execute a query
            result = await session.call_tool(
                "query",
                arguments={"question": "What are the top selling products?"},
            )
            print(result.content)


if __name__ == "__main__":
    asyncio.run(main())
```
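
In the MCP Python SDK, `result.content` is a list of content blocks rather than a plain string, and text blocks carry `type == "text"` and a `text` attribute. A small helper along these lines can pull out just the text. `text_from_result` is a hypothetical name, and the stand-in objects below exist only so the sketch runs without a server.

```python
from types import SimpleNamespace


def text_from_result(result) -> str:
    """Join the text of all text-type content blocks in a tool result."""
    parts = [
        block.text
        for block in result.content
        if getattr(block, "type", None) == "text"
    ]
    return "\n".join(parts)


# Stand-in object shaped like a tool result, used only for demonstration.
fake_result = SimpleNamespace(
    content=[
        SimpleNamespace(type="text", text="first text block"),
        SimpleNamespace(type="image", data="...", mimeType="image/png"),
        SimpleNamespace(type="text", text="second text block"),
    ]
)
print(text_from_result(fake_result))
```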
173 changes: 173 additions & 0 deletions docs/PROMPTS.md
@@ -0,0 +1,173 @@
# Prompts Module

This module manages system prompts for the Data Agent's LLM interactions. It provides a modular, extensible prompt architecture that supports multiple database dialects and customization through configuration.

## Architecture

```
src/data_agent/prompts/
├── __init__.py # Public exports
├── builder.py # Prompt assembly logic
├── defaults.py # Default prompt templates
└── dialects.py # Database-specific SQL guidelines
```

## Components

### defaults.py - Core Prompt Templates

Contains the default system prompts used across the agent:

| Prompt | Purpose |
|--------|---------|
| `DEFAULT_INTENT_DETECTION_PROMPT` | Routes user questions to the appropriate data agent |
| `DEFAULT_GENERAL_CHAT_PROMPT` | Handles greetings and capability questions |
| `DEFAULT_SQL_PROMPT` | Guides SQL generation with schema context |
| `DEFAULT_RESPONSE_PROMPT` | Formats query results into natural language |
| `VISUALIZATION_SYSTEM_PROMPT` | Generates matplotlib visualization code |
| `COSMOS_PROMPT_ADDENDUM` | Cosmos DB-specific constraints and best practices |

### dialects.py - Database-Specific Guidelines

Provides SQL dialect guidelines that are automatically appended based on datasource type:

| Dialect | Datasource Types |
|---------|------------------|
| BigQuery | `bigquery` |
| PostgreSQL | `postgres`, `postgresql` |
| Azure SQL / SQL Server | `azure_sql`, `mssql`, `sqlserver` |
| Azure Synapse | `synapse` |
| Databricks | `databricks` |
| Cosmos DB | `cosmos`, `cosmosdb` |

Each dialect includes:
- Syntax conventions (date functions, data types, quoting)
- Aggregation function usage
- String manipulation functions
- Performance best practices
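
As a rough illustration of how several datasource type names can resolve to one set of guidelines, consider the sketch below. The names `GUIDELINES_BY_TYPE` and `lookup_guidelines`, the placeholder texts, the case-insensitive matching, and the empty-string fallback are all assumptions made for this example; the real mapping lives in `dialects.py`.

```python
# Placeholder guideline texts; the real constants live in dialects.py.
POSTGRES_GUIDELINES = "## PostgreSQL SQL Guidelines ..."
MSSQL_GUIDELINES = "## SQL Server Guidelines ..."

# Several datasource type names can share one guidelines constant.
GUIDELINES_BY_TYPE: dict[str, str] = {
    "postgres": POSTGRES_GUIDELINES,
    "postgresql": POSTGRES_GUIDELINES,
    "azure_sql": MSSQL_GUIDELINES,
    "mssql": MSSQL_GUIDELINES,
    "sqlserver": MSSQL_GUIDELINES,
}


def lookup_guidelines(datasource_type: str) -> str:
    """Return guidelines for a datasource type, or "" if none are registered."""
    return GUIDELINES_BY_TYPE.get(datasource_type.strip().lower(), "")


print(lookup_guidelines("postgresql") == lookup_guidelines("postgres"))
```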

### builder.py - Prompt Assembly

The `build_prompt()` function assembles the final system prompt:

```
┌─────────────────────────────────────┐
│ Date Context (current date) │
├─────────────────────────────────────┤
│ Base Prompt (custom or default) │
│ - Schema context │
│ - Few-shot examples │
├─────────────────────────────────────┤
│ Dialect Guidelines │
│ (based on datasource type) │
├─────────────────────────────────────┤
│ Cosmos Addendum (if applicable) │
│ - Partition key constraints │
└─────────────────────────────────────┘
```
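
The layering in the diagram can be sketched as a simple ordered join. This is not the actual `build_prompt()` implementation: `assemble_prompt`, its parameters, and the "Current date:" wording are assumptions chosen to show the ordering only.

```python
from datetime import date


def assemble_prompt(base_prompt, dialect_guidelines="", cosmos_addendum="", today=None):
    """Join the prompt sections in the order shown in the diagram."""
    today = today or date.today()
    sections = [
        f"Current date: {today.isoformat()}",  # date context
        base_prompt,                            # custom or default base prompt
        dialect_guidelines,                     # chosen by datasource type
        cosmos_addendum,                        # only for Cosmos DB datasources
    ]
    # Skip empty sections so optional parts leave no blank gaps.
    return "\n\n".join(section for section in sections if section)


print(assemble_prompt("You are a SQL expert.", "## PostgreSQL SQL Guidelines ..."))
```

Putting the date first gives the model a reference point for relative expressions such as "last quarter" before it reads the schema and examples.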

## Usage

### Basic Prompt Building

```python
from data_agent.prompts import build_prompt

# Build a prompt for PostgreSQL
prompt = build_prompt(
    datasource_type="postgres",
    schema_context="Tables: customers (id, name, email), orders (id, customer_id, total)",
    few_shot_examples="Q: How many customers?\nA: SELECT COUNT(*) FROM customers",
)
```

### Custom Prompts via Configuration

Teams can override default prompts in their agent YAML configuration using `system_prompt` and `response_prompt`:

```yaml
data_agents:
  - name: my_agent
    description: E-commerce sales database
    datasource:
      type: postgres
      # ...
    system_prompt: |
      You are a SQL expert for our e-commerce database.
      Focus on sales metrics and customer behavior.

      {schema_context}

      {few_shot_examples}
    response_prompt: |
      Provide insights focused on business impact.
      Always mention revenue implications.
    table_schemas:
      # ...
```

### Getting Dialect Guidelines

```python
from data_agent.prompts import get_dialect_guidelines

# Get BigQuery-specific SQL guidelines
guidelines = get_dialect_guidelines("bigquery")
```

## Prompt Template Variables

The following variables are automatically substituted:

| Variable | Description | Used In |
|----------|-------------|---------|
| `{schema_context}` | Database schema information | SQL prompt |
| `{few_shot_examples}` | Example Q&A pairs | SQL prompt |
| `{agent_descriptions}` | Available data agents | Intent detection, general chat |
| `{partition_key}` | Cosmos DB partition key | Cosmos addendum |
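
One way to picture the substitution is with `str.format_map`, leaving any placeholder without a supplied value untouched. This is a hedged sketch of the general technique, not the module's actual mechanism; `fill_template` and `_KeepMissing` are hypothetical names.

```python
class _KeepMissing(dict):
    """Leave unknown {placeholders} untouched instead of raising KeyError."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


def fill_template(template: str, **values: str) -> str:
    """Substitute the supplied variables into a prompt template."""
    return template.format_map(_KeepMissing(values))


template = "Schema:\n{schema_context}\n\nExamples:\n{few_shot_examples}"
print(fill_template(template, schema_context="customers (id, name, email)"))
```

With this approach, literal braces in a custom prompt (for example in a JSON snippet) would need to be doubled as `{{` and `}}`.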

## Extending

### Adding a New Dialect

1. Add guidelines constant to `dialects.py`:

```python
MY_DATABASE_GUIDELINES = """## My Database SQL Guidelines

1. **Syntax conventions:**
- Use MY_DATE_FUNC() for date operations
- ...
"""
```

2. Register in `DIALECT_GUIDELINES_MAP`:

```python
DIALECT_GUIDELINES_MAP: dict[str, str] = {
    # ... existing entries
    "mydatabase": MY_DATABASE_GUIDELINES,
}
```

### Adding a New Prompt Type

1. Add the template to `defaults.py`:

```python
MY_NEW_PROMPT = """You are a specialized assistant for...

{custom_variable}
"""
```

2. Export in `__init__.py`:

```python
from data_agent.prompts.defaults import MY_NEW_PROMPT

__all__ = [
    # ... existing exports
    "MY_NEW_PROMPT",
]
```
Binary file added docs/langchain.png
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -47,12 +47,14 @@ dependencies = [
    "langchain-azure-dynamic-sessions>=0.2.0",
    "matplotlib>=3.10.8",
    "tabulate>=0.9.0",
    "mcp>=1.25.0",
]

[project.scripts]
data-agent = "data_agent.cli:main"
data-agent-ui = "data_agent.ui:main"
data-agent-a2a = "data_agent.a2a.server:main"
data-agent-mcp = "data_agent.mcp.server:main"

[project.optional-dependencies]
dev = [