Features • Getting Started • Quickstart • Guidance
This template provides a complete foundation for building intelligent AI agents using the Microsoft Semantic Kernel SDK for Python with Azure AI Foundry integration. The solution features a two-tier architecture with a FastAPI-based backend agent and a web-based frontend, both deployed on Azure App Service.
The solution also implements the AI Gateway patterns and capabilities of Azure API Management, including secure, load-balanced access to backend AI models and policies for request rate limiting, token quotas, and usage limits.
The deployment follows a multi-resource group design with infrastructure-as-code using Azure Bicep modules, providing enterprise-grade security, monitoring, and scalability.
- Features - Detailed overview of AI agent backend, frontend, infrastructure services and security features
- Getting Started - Setup options including GitHub Codespaces, VS Code Dev Containers, and local environment
- Quickstart - Provisioning, local development, extending the solution, and cleanup
- Guidance - Region availability, quotas, dependencies, configuration, monitoring, security and performance
semantic-kernel-ai-agent/
├── src/
│ ├── agent_backend/ # AI Agent Backend Service
│ │ ├── app.py # FastAPI application entry point
│ │ ├── routes/
│ │ │ └── chat.py # Chat endpoint handler
│ │ ├── services/
│ │ │ ├── agent.py # Semantic Kernel agent initialization
│ │ │ ├── kernel.py # Kernel configuration
│ │ │ ├── conversation_store.py # Cosmos DB memory
│ │ │ └── tool_tracker.py # Plugin invocation tracking
│ │ ├── mcp_plugins/
│ │ │ ├── mcp_microsoft_learn.py # Microsoft Learn MCP plugin
│ │ │ └── mcp_weather.py # Weather MCP plugin
│ │ ├── schemas/
│ │ │ └── chat.py # Pydantic request/response models
│ │ ├── Dockerfile # Container definition
│ │ └── requirements.txt # Python dependencies
│ ├── agent_frontend/ # Web Frontend Service
│ │ ├── app.py # FastAPI web application
│ │ ├── templates/
│ │ │ └── index.html # Chat UI template
│ │ ├── static/
│ │ │ └── styles.css # UI styling
│ │ ├── Dockerfile # Container definition
│ │ └── requirements.txt # Python dependencies
│ └── docker-compose.yml # Local development composition
├── infra/ # Infrastructure as Code
│ ├── main.bicep # Main orchestration file
│ ├── main.parameters.json # Environment parameters
│ └── modules/
│ ├── ai/ # AI Foundry & model deployments
│ ├── apim/ # API Management configuration
│ │ ├── api/ # API definitions
│ │ └── policies/ # APIM policies
│ ├── app/ # App Service resources
│ ├── cosmosdb/ # Cosmos DB configuration
│ ├── keyvault/ # Key Vault configuration
│ ├── monitor/ # Monitoring & logging
│ ├── security/ # RBAC configurations
│ └── storage/ # Storage account
├── doc/ # Documentation
│ ├── mcp-servers.md # MCP server integration guide
│ ├── features.md # Detailed feature documentation
│ ├── quickstart.md # Getting started guide
│ └── ... # Additional documentation
├── azure.yaml # Azure Developer CLI configuration
└── README.md
Request Flow:
- User enters a question in the frontend chat interface
- Frontend sends a POST request to the backend `/chat` endpoint with the session ID and user input
- Backend agent retrieves conversation history from Cosmos DB
- Semantic Kernel processes the input using configured plugins
- Agent may invoke MCP plugins (Microsoft Learn docs, weather data) as needed
- AI model response is generated via APIM-proxied Azure AI Foundry endpoints
- Response is persisted to Cosmos DB with tool usage tracking
- Backend returns response with answer, used tools, and token metrics
- Frontend displays the response in the chat interface
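The request and response payloads exchanged in this flow can be sketched roughly as follows. The field names below are illustrative only; the actual Pydantic models live in `src/agent_backend/schemas/chat.py`.

```python
from dataclasses import dataclass, field

# Hypothetical shapes for the /chat request and response; see
# src/agent_backend/schemas/chat.py for the real Pydantic models.
@dataclass
class ChatRequest:
    session_id: str   # identifies the conversation stored in Cosmos DB
    message: str      # the user's question

@dataclass
class ChatResponse:
    answer: str                                      # model-generated reply
    tools_used: list = field(default_factory=list)   # MCP tools invoked, with parameters
    prompt_tokens: int = 0                           # token metrics returned to the frontend
    completion_tokens: int = 0

req = ChatRequest(session_id="abc-123", message="What's the weather in Oslo?")
resp = ChatResponse(
    answer="It's 4°C and raining in Oslo.",
    tools_used=[{"name": "get_weather", "args": {"city": "Oslo"}}],
    prompt_tokens=42,
    completion_tokens=17,
)
```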
APIM acts as an intelligent load balancer for Azure OpenAI model deployments:
- Round-Robin Distribution: Requests are distributed across multiple model deployment instances
- Retry Logic: Automatic retry on transient failures (429, 503 errors)
- Backend Selection: Policy-based routing to available model endpoints
- Monitoring: Full telemetry through Application Insights
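The retry behavior APIM applies at the gateway can also be mirrored client-side. The sketch below is a generic exponential-backoff helper, not the actual APIM policy (which is defined in XML under `infra/modules/apim/policies/`); the function name and parameters are assumptions for illustration.

```python
import random
import time

TRANSIENT = {429, 503}  # status codes treated as transient, per the retry logic above

def call_with_retry(send, max_attempts=3, base_delay=0.5):
    """Call send() (returns (status, body)), retrying transient failures
    with exponential backoff plus jitter, similar in spirit to the APIM
    retry policy described above."""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in TRANSIENT:
            return status, body
        if attempt < max_attempts - 1:
            # back off: base_delay, 2x, 4x... plus a little jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    return status, body

# Simulated backend that is throttled once (429), then succeeds.
responses = iter([(429, "throttled"), (200, "ok")])
status, body = call_with_retry(lambda: next(responses), base_delay=0.0)
```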
| Component | Description |
|---|---|
| APIM Load Balancing | Load balancing types, configuration options, and traffic distribution strategies for Azure OpenAI model deployments |
| APIM Load Balancing Examples | Bicep configuration examples for round-robin, weighted, and priority-based load balancing scenarios |
| APIM Policies | APIM policy definitions for managed identity authentication, rate limiting, token quotas, and security controls |
| APIM Application Insights | Application Insights integration setup for API-level logging, sampling, and monitoring configuration |
| APIM Azure Monitor | Azure Monitor integration setup for API-level logging, sampling, and monitoring configuration including LLM messages |
The agent extends its capabilities through Model Context Protocol (MCP) servers, which are exposed to the AI agent as tools. When a tool is used, its name and parameters are returned and displayed in the user interface after the model response.
For a complete guide to MCP server integration, available servers, and adding new plugins, see the MCP Servers Guide.
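The tool-usage reporting described above can be sketched as a small tracker that records each plugin invocation's name and parameters for the UI. This is an illustrative stand-in for `services/tool_tracker.py`, not its actual API.

```python
# Minimal sketch of plugin-invocation tracking (cf. services/tool_tracker.py):
# record each tool call's name and parameters so the backend can return them
# alongside the model response. Class and method names here are assumptions.
class ToolTracker:
    def __init__(self):
        self.calls = []

    def record(self, name, **params):
        """Record one tool invocation with its parameters."""
        self.calls.append({"name": name, "parameters": params})

    def summary(self):
        """Return all recorded invocations, in call order."""
        return list(self.calls)

tracker = ToolTracker()
tracker.record("get_weather", city="Seattle", unit="celsius")
```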
Example of a model response including the Weather MCP tool result:
You can estimate the cost of this project's architecture with the Azure Pricing Calculator.
Key cost components:
- Azure OpenAI/AI Foundry: Pay-per-token pricing based on model and usage
- API Management: Developer SKU for development/testing
- App Service: Linux-based plan (shared across frontend and backend)
- Cosmos DB: Request Units (RU/s) based on conversation activity
- Application Insights: Data ingestion and retention
- Key Vault: Transaction-based pricing
- Storage Account: Minimal cost for blob/table/queue storage
Cost optimization tips:
- Use GPT-4.1-mini for lower token costs when appropriate
- Limit conversation history (max_items) to reduce Cosmos DB RU consumption
- Use APIM Standard SKU for production (caching, higher limits)
- Configure Application Insights sampling in high-traffic scenarios
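The `max_items` tip above amounts to capping how many conversation turns are kept per session, which bounds both Cosmos DB RU consumption and prompt size. A minimal sketch (the function name is hypothetical, not the project's actual helper):

```python
# Illustrative sketch of capping stored conversation history: keep only the
# most recent turns before writing to Cosmos DB or building the prompt.
def trim_history(messages, max_items=20):
    """Return the last max_items messages, preserving chronological order."""
    return messages[-max_items:] if max_items > 0 else []

history = [{"role": "user", "content": f"turn {i}"} for i in range(50)]
trimmed = trim_history(history, max_items=20)
```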
Microsoft Semantic Kernel:
- Semantic Kernel Documentation
- Semantic Kernel Python SDK
- Model Context Protocol (MCP)
- MCP Server Integration Guide
Azure Services:
- Azure Developer CLI (azd)
- Azure AI Foundry
- Azure OpenAI Service
- Azure API Management
- Azure App Service (Python)
- Azure Cosmos DB
- Azure Application Insights
Development Tools:


