Sample AI Agent built on Semantic Kernel SDK for Python, using APIM as AI gateway with load balanced AI Foundry backend instances.

Semantic Kernel AI Agent with AI Gateway


Overview

This template provides a complete foundation for building intelligent AI agents using Microsoft Semantic Kernel SDK for Python with Azure AI Foundry integration. The solution features a two-tier architecture with a FastAPI-based backend agent and a web-based frontend, all deployed on Azure App Service.

The solution also implements the AI Gateway patterns and capabilities of the Azure API Management service, including secure, load-balanced access to backend AI models and policies for request rate limiting and token quotas.

The deployment follows a multi-resource group design with infrastructure-as-code using Azure Bicep modules, providing enterprise-grade security, monitoring, and scalability.

Project Documentation

  • Features - Detailed overview of AI agent backend, frontend, infrastructure services and security features
  • Getting Started - Setup options including GitHub Codespaces, VS Code Dev Containers, and local environment
  • Quickstart - Provisioning, local development, extending the solution, and cleanup
  • Guidance - Region availability, quotas, dependencies, configuration, monitoring, security and performance

Project Structure

semantic-kernel-ai-agent/
├── src/
│   ├── agent_backend/                  # AI Agent Backend Service
│   │   ├── app.py                      # FastAPI application entry point
│   │   ├── routes/
│   │   │   └── chat.py                 # Chat endpoint handler
│   │   ├── services/
│   │   │   ├── agent.py                # Semantic Kernel agent initialization
│   │   │   ├── kernel.py               # Kernel configuration
│   │   │   ├── conversation_store.py   # Cosmos DB memory
│   │   │   └── tool_tracker.py         # Plugin invocation tracking
│   │   ├── mcp_plugins/
│   │   │   ├── mcp_microsoft_learn.py  # Microsoft Learn MCP plugin
│   │   │   └── mcp_weather.py          # Weather MCP plugin
│   │   ├── schemas/
│   │   │   └── chat.py                 # Pydantic request/response models
│   │   ├── Dockerfile                  # Container definition
│   │   └── requirements.txt            # Python dependencies
│   ├── agent_frontend/                 # Web Frontend Service
│   │   ├── app.py                      # FastAPI web application
│   │   ├── templates/
│   │   │   └── index.html              # Chat UI template
│   │   ├── static/
│   │   │   └── styles.css              # UI styling
│   │   ├── Dockerfile                  # Container definition
│   │   └── requirements.txt            # Python dependencies
│   └── docker-compose.yml              # Local development composition
├── infra/                              # Infrastructure as Code
│   ├── main.bicep                      # Main orchestration file
│   ├── main.parameters.json            # Environment parameters
│   └── modules/
│       ├── ai/                         # AI Foundry & model deployments
│       ├── apim/                       # API Management configuration
│       │   ├── api/                    # API definitions
│       │   └── policies/               # APIM policies
│       ├── app/                        # App Service resources
│       ├── cosmosdb/                   # Cosmos DB configuration
│       ├── keyvault/                   # Key Vault configuration
│       ├── monitor/                    # Monitoring & logging
│       ├── security/                   # RBAC configurations
│       └── storage/                    # Storage account
├── doc/                                # Documentation
│   ├── mcp-servers.md                  # MCP server integration guide
│   ├── features.md                     # Detailed feature documentation
│   ├── quickstart.md                   # Getting started guide
│   └── ...                             # Additional documentation
├── azure.yaml                          # Azure Developer CLI configuration
└── README.md
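The request/response contract lives in `schemas/chat.py` as Pydantic models. As a rough illustration of the shape described in the request flow below — field names here are assumptions for illustration, not the repository's actual schema — the contract looks roughly like this (sketched with stdlib dataclasses to stay dependency-free):

```python
from dataclasses import dataclass, field

# Hypothetical request/response shapes; the actual Pydantic models in
# schemas/chat.py may use different field names and types.
@dataclass
class ChatRequest:
    session_id: str   # identifies the conversation stored in Cosmos DB
    user_input: str   # the user's question

@dataclass
class ChatResponse:
    answer: str                                       # model-generated reply
    tools_used: list = field(default_factory=list)    # MCP tools invoked this turn
    prompt_tokens: int = 0                            # token metrics shown in the UI
    completion_tokens: int = 0

req = ChatRequest(session_id="abc123", user_input="What is Semantic Kernel?")
resp = ChatResponse(answer="Semantic Kernel is an SDK...",
                    tools_used=["mcp_microsoft_learn"])
```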

Architecture

Overview

Azure Resources

Request Flow:

  1. User enters a question in the frontend chat interface
  2. Frontend sends POST request to backend /chat endpoint with session ID and user input
  3. Backend agent retrieves conversation history from Cosmos DB
  4. Semantic Kernel processes the input using configured plugins
  5. Agent may invoke MCP plugins (Microsoft Learn docs, weather data) as needed
  6. AI model response is generated via APIM-proxied Azure AI Foundry endpoints
  7. Response is persisted to Cosmos DB with tool usage tracking
  8. Backend returns response with answer, used tools, and token metrics
  9. Frontend displays the response in the chat interface
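The steps above can be sketched end-to-end in a few lines. This is a minimal stand-in — an in-memory dict replaces Cosmos DB and `generate_answer()` stubs the APIM-proxied model call; the function names are invented for illustration, not the actual backend code:

```python
# Minimal sketch of the /chat request flow. The in-memory dict stands in
# for Cosmos DB, and generate_answer() for the Semantic Kernel call that
# goes through APIM to an AI Foundry model deployment.
conversation_store = {}  # session_id -> list of {"role", "content"} turns

def generate_answer(history):
    """Stub for steps 4-6: kernel processing and the model call via APIM."""
    return {"answer": f"Echo: {history[-1]['content']}", "tools_used": [], "tokens": 42}

def chat(session_id: str, user_input: str) -> dict:
    history = conversation_store.setdefault(session_id, [])     # step 3: load history
    history.append({"role": "user", "content": user_input})
    result = generate_answer(history)                           # steps 4-6
    history.append({"role": "assistant", "content": result["answer"]})  # step 7: persist
    return {                                                    # step 8: answer + metrics
        "answer": result["answer"],
        "tools_used": result["tools_used"],
        "total_tokens": result["tokens"],
    }

out = chat("s1", "hello")
```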

API Management

APIM acts as an intelligent load balancer for Azure OpenAI model deployments:

  • Round-Robin Distribution: Requests are distributed across multiple model deployment instances
  • Retry Logic: Automatic retry on transient failures (429, 503 errors)
  • Backend Selection: Policy-based routing to available model endpoints
  • Monitoring: Full telemetry through Application Insights
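APIM implements this behaviour server-side in policy XML, but the combination of round-robin distribution and retry on transient failures can be approximated in a few lines of Python. The backend names and the `send` stub below are illustrative, not the template's actual configuration:

```python
import itertools

# Client-side analogue of APIM's round-robin + retry-on-transient-failure
# behaviour; in the real solution this happens in APIM policy, server-side.
BACKENDS = ["foundry-eastus", "foundry-westus"]   # illustrative backend names
TRANSIENT = {429, 503}                            # statuses that trigger a retry

def send_with_retry(send, backends=BACKENDS, max_attempts=4):
    """Try backends in round-robin order, moving on after a transient failure."""
    ring = itertools.cycle(backends)
    last_status = None
    for _ in range(max_attempts):
        backend = next(ring)
        status, body = send(backend)              # stubbed HTTP call
        if status not in TRANSIENT:
            return backend, status, body
        last_status = status                      # 429/503: try the next backend
    raise RuntimeError(f"all attempts failed (last status {last_status})")

# Example: the first backend is throttled (429), the second succeeds.
responses = {"foundry-eastus": (429, None), "foundry-westus": (200, "ok")}
backend, status, body = send_with_retry(lambda b: responses[b])
```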

API Management Configuration

| Component | Description |
|-----------|-------------|
| APIM Load Balancing | Load balancing types, configuration options, and traffic distribution strategies for Azure OpenAI model deployments |
| APIM Load Balancing Examples | Bicep configuration examples for round-robin, weighted, and priority-based load balancing scenarios |
| APIM Policies | APIM policy definitions for managed identity authentication, rate limiting, token quotas, and security controls |
| APIM Application Insights | Application Insights integration setup for API-level logging, sampling, and monitoring configuration |
| APIM Azure Monitor | Azure Monitor integration setup for API-level logging, sampling, and monitoring configuration, including LLM messages |

MCP Integration

The agent extends its capabilities through Model Context Protocol (MCP) servers, exposed to the AI Agent as tools. When a tool is used, its name and parameters are returned and displayed in the user interface after the model response.

For a complete guide to MCP server integration, available servers, and adding new plugins, check the MCP Servers Guide.
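The tool usage shown in the UI is collected by `tool_tracker.py`. A simplified version of the idea — the class name and fields here are assumptions, not the repository's implementation — just records each plugin invocation's name and parameters during a turn:

```python
# Simplified tool-invocation tracker: records which MCP tools ran during a
# turn so their names and parameters can be shown after the model response.
class ToolTracker:
    def __init__(self):
        self.invocations = []

    def record(self, tool_name: str, parameters: dict):
        self.invocations.append({"tool": tool_name, "parameters": parameters})

    def summary(self):
        """What the backend would return alongside the answer."""
        return list(self.invocations)

tracker = ToolTracker()
tracker.record("mcp_weather", {"city": "Seattle"})
summary = tracker.summary()
```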

Example of a model response including the Weather MCP tool result:

[Screenshot: chat response showing the Weather MCP tool name and parameters]

Costs

You can estimate the cost of this project's architecture with Azure's pricing calculator.

Key cost components:

  • Azure OpenAI/AI Foundry: Pay-per-token pricing based on model and usage
  • API Management: Developer SKU for development/testing
  • App Service: Linux-based plan (shared across frontend and backend)
  • Cosmos DB: Request Units (RU/s) based on conversation activity
  • Application Insights: Data ingestion and retention
  • Key Vault: Transaction-based pricing
  • Storage Account: Minimal cost for blob/table/queue storage
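As a rough illustration of the pay-per-token maths — the prices below are made-up placeholders, not real Azure OpenAI rates; use the pricing calculator for current figures:

```python
# Illustrative token-cost estimate. PRICE values are placeholders only,
# not actual Azure OpenAI pricing -- check the Azure pricing calculator.
PRICE_PER_1K_INPUT = 0.40    # $ per 1K prompt tokens (placeholder)
PRICE_PER_1K_OUTPUT = 1.60   # $ per 1K completion tokens (placeholder)

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    return (prompt_tokens / 1000 * PRICE_PER_1K_INPUT
            + completion_tokens / 1000 * PRICE_PER_1K_OUTPUT)

# e.g. 1M prompt tokens + 250K completion tokens in a month:
monthly = estimate_cost(1_000_000, 250_000)
```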

Cost optimization tips:

  • Use GPT-4.1-mini for lower token costs when appropriate
  • Limit conversation history (max_items) to reduce Cosmos DB RU consumption
  • Use APIM Standard SKU for production (caching, higher limits)
  • Configure Application Insights sampling in high-traffic scenarios
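The `max_items` idea above can be sketched as a simple history trim before writing back to Cosmos DB — the function name is assumed for illustration:

```python
# Keep only the newest max_items turns; shorter documents mean fewer RU/s
# consumed on Cosmos DB reads and writes.
def trim_history(history: list, max_items: int = 20) -> list:
    return history[-max_items:] if max_items > 0 else []

turns = [{"role": "user", "content": f"msg {i}"} for i in range(50)]
trimmed = trim_history(turns, max_items=20)
```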

Resources

Microsoft Semantic Kernel:

Azure Services:

Development Tools:
