Sample AI Agent built on Semantic Kernel SDK for Python, using APIM as AI gateway with load balanced AI Foundry backend instances.

Semantic Kernel AI Agent with AI Gateway


Overview

This template provides a complete foundation for building intelligent AI agents using Microsoft Semantic Kernel SDK for Python with Azure AI Foundry integration. The solution features a two-tier architecture with a FastAPI-based backend agent and a web-based frontend, all deployed on Azure App Service.

The solution also implements the AI Gateway patterns and capabilities of the Azure API Management service, including secure, load-balanced access to backend AI models and policies for request rate limiting and token quotas.

The deployment follows a multi-resource group design with infrastructure-as-code using Azure Bicep modules, providing enterprise-grade security, monitoring, and scalability.

Project Documentation

  • Features - Detailed overview of AI agent backend, frontend, infrastructure services and security features
  • Getting Started - Setup options including GitHub Codespaces, VS Code Dev Containers, and local environment
  • Quickstart - Provisioning, local development, extending the solution, and cleanup
  • Guidance - Region availability, quotas, dependencies, configuration, monitoring, security and performance

Project Structure

semantic-kernel-ai-agent/
├── src/
│   ├── agent_backend/                  # AI Agent Backend Service
│   │   ├── app.py                      # FastAPI application entry point
│   │   ├── routes/
│   │   │   └── chat.py                 # Chat endpoint handler
│   │   ├── services/
│   │   │   ├── agent.py                # Semantic Kernel agent initialization
│   │   │   ├── kernel.py               # Kernel configuration
│   │   │   ├── conversation_store.py   # Cosmos DB memory
│   │   │   └── tool_tracker.py         # Plugin invocation tracking
│   │   ├── mcp_plugins/
│   │   │   ├── mcp_microsoft_learn.py  # Microsoft Learn MCP plugin
│   │   │   └── mcp_weather.py          # Weather MCP plugin
│   │   ├── schemas/
│   │   │   └── chat.py                 # Pydantic request/response models
│   │   ├── Dockerfile                  # Container definition
│   │   └── requirements.txt            # Python dependencies
│   ├── agent_frontend/                 # Web Frontend Service
│   │   ├── app.py                      # FastAPI web application
│   │   ├── templates/
│   │   │   └── index.html              # Chat UI template
│   │   ├── static/
│   │   │   └── styles.css              # UI styling
│   │   ├── Dockerfile                  # Container definition
│   │   └── requirements.txt            # Python dependencies
│   └── docker-compose.yml              # Local development composition
├── infra/                              # Infrastructure as Code
│   ├── main.bicep                      # Main orchestration file
│   ├── main.parameters.json            # Environment parameters
│   └── modules/
│       ├── ai/                         # AI Foundry & model deployments
│       ├── apim/                       # API Management configuration
│       │   ├── api/                    # API definitions
│       │   └── policies/               # APIM policies
│       ├── app/                        # App Service resources
│       ├── cosmosdb/                   # Cosmos DB configuration
│       ├── keyvault/                   # Key Vault configuration
│       ├── monitor/                    # Monitoring & logging
│       ├── security/                   # RBAC configurations
│       └── storage/                    # Storage account
├── doc/                                # Documentation
│   ├── mcp-servers.md                  # MCP server integration guide
│   ├── features.md                     # Detailed feature documentation
│   ├── quickstart.md                   # Getting started guide
│   └── ...                             # Additional documentation
├── azure.yaml                          # Azure Developer CLI configuration
└── README.md
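The request/response contract lives in `schemas/chat.py` as Pydantic models. As a rough illustration of the shape described in the request flow below — field names here are assumptions for illustration, not the repository's actual schema — the contract looks roughly like this (sketched with stdlib dataclasses to stay dependency-free):

```python
from dataclasses import dataclass, field

# Hypothetical request/response shapes; the actual Pydantic models in
# schemas/chat.py may use different field names and types.
@dataclass
class ChatRequest:
    session_id: str   # identifies the conversation stored in Cosmos DB
    user_input: str   # the user's question

@dataclass
class ChatResponse:
    answer: str                                       # model-generated reply
    tools_used: list = field(default_factory=list)    # MCP tools invoked this turn
    prompt_tokens: int = 0                            # token metrics shown in the UI
    completion_tokens: int = 0

req = ChatRequest(session_id="abc123", user_input="What is Semantic Kernel?")
resp = ChatResponse(answer="Semantic Kernel is an SDK...",
                    tools_used=["mcp_microsoft_learn"])
```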

Architecture

Overview

Azure Resources

Request Flow:

  1. User enters a question in the frontend chat interface
  2. Frontend sends POST request to backend /chat endpoint with session ID and user input
  3. Backend agent retrieves conversation history from Cosmos DB
  4. Semantic Kernel processes the input using configured plugins
  5. Agent may invoke MCP plugins (Microsoft Learn docs, weather data) as needed
  6. AI model response is generated via APIM-proxied Azure AI Foundry endpoints
  7. Response is persisted to Cosmos DB with tool usage tracking
  8. Backend returns response with answer, used tools, and token metrics
  9. Frontend displays the response in the chat interface
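The steps above can be sketched end-to-end in a few lines. This is a minimal stand-in — an in-memory dict replaces Cosmos DB and `generate_answer()` stubs the APIM-proxied model call; the function names are invented for illustration, not the actual backend code:

```python
# Minimal sketch of the /chat request flow. The in-memory dict stands in
# for Cosmos DB, and generate_answer() for the Semantic Kernel call that
# goes through APIM to an AI Foundry model deployment.
conversation_store = {}  # session_id -> list of {"role", "content"} turns

def generate_answer(history):
    """Stub for steps 4-6: kernel processing and the model call via APIM."""
    return {"answer": f"Echo: {history[-1]['content']}", "tools_used": [], "tokens": 42}

def chat(session_id: str, user_input: str) -> dict:
    history = conversation_store.setdefault(session_id, [])     # step 3: load history
    history.append({"role": "user", "content": user_input})
    result = generate_answer(history)                           # steps 4-6
    history.append({"role": "assistant", "content": result["answer"]})  # step 7: persist
    return {                                                    # step 8: answer + metrics
        "answer": result["answer"],
        "tools_used": result["tools_used"],
        "total_tokens": result["tokens"],
    }

out = chat("s1", "hello")
```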

API Management

APIM acts as an intelligent load balancer for Azure OpenAI model deployments:

  • Round-Robin Distribution: Requests are distributed across multiple model deployment instances
  • Retry Logic: Automatic retry on transient failures (429, 503 errors)
  • Backend Selection: Policy-based routing to available model endpoints
  • Monitoring: Full telemetry through Application Insights
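APIM implements this behaviour server-side in policy XML, but the combination of round-robin distribution and retry on transient failures can be approximated in a few lines of Python. The backend names and the `send` stub below are illustrative, not the template's actual configuration:

```python
import itertools

# Client-side analogue of APIM's round-robin + retry-on-transient-failure
# behaviour; in the real solution this happens in APIM policy, server-side.
BACKENDS = ["foundry-eastus", "foundry-westus"]   # illustrative backend names
TRANSIENT = {429, 503}                            # statuses that trigger a retry

def send_with_retry(send, backends=BACKENDS, max_attempts=4):
    """Try backends in round-robin order, moving on after a transient failure."""
    ring = itertools.cycle(backends)
    last_status = None
    for _ in range(max_attempts):
        backend = next(ring)
        status, body = send(backend)              # stubbed HTTP call
        if status not in TRANSIENT:
            return backend, status, body
        last_status = status                      # 429/503: try the next backend
    raise RuntimeError(f"all attempts failed (last status {last_status})")

# Example: the first backend is throttled (429), the second succeeds.
responses = {"foundry-eastus": (429, None), "foundry-westus": (200, "ok")}
backend, status, body = send_with_retry(lambda b: responses[b])
```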

API Management Configuration

| Component | Description |
|-----------|-------------|
| APIM Load Balancing | Load balancing types, configuration options, and traffic distribution strategies for Azure OpenAI model deployments |
| APIM Load Balancing Examples | Bicep configuration examples for round-robin, weighted, and priority-based load balancing scenarios |
| APIM Policies | APIM policy definitions for managed identity authentication, rate limiting, token quotas, and security controls |
| APIM Application Insights | Application Insights integration setup for API-level logging, sampling, and monitoring configuration |
| APIM Azure Monitor | Azure Monitor integration setup for API-level logging, sampling, and monitoring configuration, including LLM messages |

MCP Integration

The agent extends its capabilities through Model Context Protocol (MCP) servers, exposed to the AI Agent as tools. When a tool is used, its name and parameters are returned and displayed in the user interface after the model response.

For a complete guide to MCP server integration, available servers, and adding new plugins, check the MCP Servers Guide.
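The tool usage shown in the UI is collected by `tool_tracker.py`. A simplified version of the idea — the class name and fields here are assumptions, not the repository's implementation — just records each plugin invocation's name and parameters during a turn:

```python
# Simplified tool-invocation tracker: records which MCP tools ran during a
# turn so their names and parameters can be shown after the model response.
class ToolTracker:
    def __init__(self):
        self.invocations = []

    def record(self, tool_name: str, parameters: dict):
        self.invocations.append({"tool": tool_name, "parameters": parameters})

    def summary(self):
        """What the backend would return alongside the answer."""
        return list(self.invocations)

tracker = ToolTracker()
tracker.record("mcp_weather", {"city": "Seattle"})
summary = tracker.summary()
```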

Example of a model response including the Weather MCP tool result:

[Screenshot: chat response showing the Weather MCP tool name and parameters]

Costs

You can estimate the cost of this project's architecture with Azure's pricing calculator.

Key cost components:

  • Azure OpenAI/AI Foundry: Pay-per-token pricing based on model and usage
  • API Management: Developer SKU for development/testing
  • App Service: Linux-based plan (shared across frontend and backend)
  • Cosmos DB: Request Units (RU/s) based on conversation activity
  • Application Insights: Data ingestion and retention
  • Key Vault: Transaction-based pricing
  • Storage Account: Minimal cost for blob/table/queue storage
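As a rough illustration of the pay-per-token maths — the prices below are made-up placeholders, not real Azure OpenAI rates; use the pricing calculator for current figures:

```python
# Illustrative token-cost estimate. PRICE values are placeholders only,
# not actual Azure OpenAI pricing -- check the Azure pricing calculator.
PRICE_PER_1K_INPUT = 0.40    # $ per 1K prompt tokens (placeholder)
PRICE_PER_1K_OUTPUT = 1.60   # $ per 1K completion tokens (placeholder)

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    return (prompt_tokens / 1000 * PRICE_PER_1K_INPUT
            + completion_tokens / 1000 * PRICE_PER_1K_OUTPUT)

# e.g. 1M prompt tokens + 250K completion tokens in a month:
monthly = estimate_cost(1_000_000, 250_000)
```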

Cost optimization tips:

  • Use GPT-4.1-mini for lower token costs when appropriate
  • Limit conversation history (max_items) to reduce Cosmos DB RU consumption
  • Use APIM Standard SKU for production (caching, higher limits)
  • Configure Application Insights sampling in high-traffic scenarios
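The `max_items` idea above can be sketched as a simple history trim before writing back to Cosmos DB — the function name is assumed for illustration:

```python
# Keep only the newest max_items turns; shorter documents mean fewer RU/s
# consumed on Cosmos DB reads and writes.
def trim_history(history: list, max_items: int = 20) -> list:
    return history[-max_items:] if max_items > 0 else []

turns = [{"role": "user", "content": f"msg {i}"} for i in range(50)]
trimmed = trim_history(turns, max_items=20)
```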

Resources

Microsoft Semantic Kernel:

Azure Services:

Development Tools:
