Author: DeWitt Gibson (@dewitt4)
Repository: https://github.com/Finoptimize/agentaflow-sro-community
Deploy and manage AI infrastructure more efficiently with tools for GPU orchestration, model serving optimization, and comprehensive observability.
Click the image above to watch a complete demo of AgentaFlow's GPU monitoring, web dashboard, and Prometheus integration
Tools that optimize GPU utilization across workloads, reducing waste:
- Smart Scheduling: Multiple strategies (least-utilized, best-fit, priority, round-robin); see the sketch after this list
- Kubernetes Integration: Native Kubernetes GPU scheduling with Custom Resource Definitions
- Resource Optimization: Reduce GPU idle time by up to 40%
- Workload Management: Efficient queuing and distribution across GPU clusters
- Real-time Monitoring: Track utilization, memory, temperature, and power
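
Each strategy plugs into the same scheduler constructor. A minimal sketch, assuming the strategy constants follow the naming of `StrategyLeastUtilized` (the only one shown in this README; verify the others against `pkg/gpu`):

```go
import "github.com/Finoptimize/agentaflow-sro-community/pkg/gpu"

// The placement strategy is fixed when the scheduler is created.
// StrategyLeastUtilized appears in the examples below; the other
// constant names here are assumptions based on the strategy list.
scheduler := gpu.NewScheduler(gpu.StrategyLeastUtilized) // spread load evenly
// scheduler := gpu.NewScheduler(gpu.StrategyBestFit)    // tightest memory fit (assumed name)
// scheduler := gpu.NewScheduler(gpu.StrategyPriority)   // high-priority jobs first (assumed name)
// scheduler := gpu.NewScheduler(gpu.StrategyRoundRobin) // rotate across GPUs (assumed name)
```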
Software that reduces inference costs through better batching, caching, and routing:
- Request Batching: Improve throughput by 3-5x with intelligent batching (tuning sketch after this list)
- Smart Caching: Reduce latency by up to 50% with TTL-based caching
- Load Balancing: Multiple routing strategies for optimal distribution
- Cost Reduction: Minimize inference costs through efficient resource use
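
Batching and caching behavior is driven by the `BatchConfig` and cache TTL passed to the serving manager, the same API used in the serving example below. A rough tuning sketch, assuming only the fields shown in this README:

```go
import (
    "time"

    "github.com/Finoptimize/agentaflow-sro-community/pkg/serving"
)

// Throughput-oriented: larger batches amortize per-call overhead, and a
// longer MaxWaitTime lets batches fill up, at the cost of tail latency.
throughputTuned := serving.NewServingManager(&serving.BatchConfig{
    MaxBatchSize: 64,
    MaxWaitTime:  250 * time.Millisecond,
}, 10*time.Minute) // longer cache TTL for repeated prompts

// Latency-oriented: small batches flush quickly and cap queueing delay.
latencyTuned := serving.NewServingManager(&serving.BatchConfig{
    MaxBatchSize: 8,
    MaxWaitTime:  20 * time.Millisecond,
}, time.Minute)
```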
Enterprise-grade monitoring, debugging, and cost tracking for LLM applications and training runs:
- Prometheus Integration: Production-ready metrics export with 20+ GPU and cost metrics
- Grafana Dashboards: Pre-built visual analytics for GPU clusters and cost optimization
- Real-time Alerting: Automatic threshold monitoring and notification system
- Cost Tracking: Detailed tracking of GPU hours, tokens, and operational costs with live dashboards
- Comprehensive Metrics: Counters, gauges, and histograms for all operations
- Distributed Tracing: Full request tracing across distributed systems (OpenTelemetry setup sketch after this list)
- Debug Utilities: Multi-level logging with performance analysis
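
The tracing support builds on OpenTelemetry with Jaeger/OTLP export (see the roadmap below). AgentaFlow's own tracing API isn't shown in this README, so the sketch below uses only the standard OpenTelemetry Go packages it integrates with:

```go
import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// initTracing wires a tracer provider that ships spans over OTLP/gRPC
// (localhost:4317 by default, where a Jaeger/OTLP collector listens).
func initTracing(ctx context.Context) (*sdktrace.TracerProvider, error) {
    exporter, err := otlptracegrpc.New(ctx)
    if err != nil {
        return nil, err
    }
    tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
    otel.SetTracerProvider(tp)
    return tp, nil
}

// Usage: wrap an operation in a span.
//   tracer := otel.Tracer("agentaflow")
//   ctx, span := tracer.Start(ctx, "inference")
//   defer span.End()
```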
Our production-ready web dashboard provides real-time GPU monitoring with a modern, professional interface:
Real-time GPU monitoring dashboard with live metrics, charts, and system overview
Interactive Chart.js visualizations show GPU performance trends and cost analytics:
GPU utilization and temperature tracking with live cost breakdown analytics
Comprehensive GPU monitoring with individual card status and real-time alerts:
Individual GPU monitoring cards showing utilization, temperature, memory usage, and health status
Real-time alert system with WebSocket notifications and threshold monitoring:
Live alert feed with temperature warnings, utilization alerts, and memory notifications
Advanced analytics showing efficiency scores, cost tracking, and performance insights:
System-wide metrics including efficiency scoring, cost per hour, and resource optimization
Demo Ready: All screenshots show the dashboard running on a local laptop without requiring NVIDIA hardware - perfect for demonstrations and development!
# Run web dashboard with Docker
docker run -p 9000:9000 -p 9001:9001 ghcr.io/finoptimize/agentaflow-sro-community:web-dashboard
# Or use Docker Compose for complete stack (Dashboard + Prometheus + Grafana)
curl -O https://raw.githubusercontent.com/Finoptimize/agentaflow-sro-community/main/docker-compose.yml
docker-compose up -d
# Access at:
# - Dashboard: http://localhost:9000
# - Grafana: http://localhost:3000 (admin/agentaflow123)
# - Prometheus: http://localhost:9090

go get github.com/Finoptimize/agentaflow-sro-community

# Single command - web dashboard with real-time GPU monitoring
docker run -p 9000:9000 -p 9001:9001 agentaflow-sro:web-dashboard
# Or complete monitoring stack
docker-compose up -d

Run the comprehensive demo:
cd cmd/agentaflow
go run main.go

This demonstrates all three core components working together.
# Build web dashboard
docker build -f docker/Dockerfile.web-dashboard -t agentaflow-sro:web-dashboard .
# Build all images
./docker/build.ps1 # Windows
./docker/build.sh # Linux/Mac
# See docker/README.md for complete documentation

import "github.com/Finoptimize/agentaflow-sro-community/pkg/gpu"
scheduler := gpu.NewScheduler(gpu.StrategyLeastUtilized)
// Register GPU
gpu1 := &gpu.GPU{
    ID:          "gpu-0",
    Name:        "NVIDIA A100",
    MemoryTotal: 40960, // MB (40GB)
    Available:   true,
}
scheduler.RegisterGPU(gpu1)
// Submit and schedule workload
workload := &gpu.Workload{
    ID:             "training-job-1",
    MemoryRequired: 32768, // MB (32GB)
    Priority:       1,
}
scheduler.SubmitWorkload(workload)
scheduler.Schedule()

import "github.com/Finoptimize/agentaflow-sro-community/pkg/serving"
servingMgr := serving.NewServingManager(&serving.BatchConfig{
    MaxBatchSize: 32,
    MaxWaitTime:  100 * time.Millisecond,
}, 5*time.Minute) // response cache TTL
// Process inference with automatic caching
response, _ := servingMgr.SubmitInferenceRequest(&serving.InferenceRequest{
    ModelID: "gpt-model",
    Input:   []byte("Your prompt"),
})

import "github.com/Finoptimize/agentaflow-sro-community/pkg/observability"
monitor := observability.NewMonitoringService(10000)
// Track costs
monitor.RecordCost(observability.CostEntry{
    Operation:  "inference",
    GPUHours:   0.5,
    TokensUsed: 1000,
    Cost:       2.50, // e.g. 0.5 GPU-hours at an effective $5/GPU-hour
})

// Get cost summary
summary := monitor.GetCostSummary(startTime, endTime)

import "github.com/Finoptimize/agentaflow-sro-community/pkg/gpu"
import "github.com/Finoptimize/agentaflow-sro-community/pkg/observability"
// Create GPU metrics collector (collects every 5 seconds)
metricsCollector := gpu.NewMetricsCollector(5 * time.Second)
// Create monitoring service integration
monitoringService := observability.NewMonitoringService(10000)
integration := observability.NewGPUMetricsIntegration(monitoringService, metricsCollector)
// Start real-time collection
metricsCollector.Start()
// Register callback for real-time monitoring
metricsCollector.RegisterCallback(func(metrics gpu.GPUMetrics) {
    fmt.Printf("GPU %s: %.1f%% util, %.1f°C, %dMB used\n",
        metrics.GPUID, metrics.UtilizationGPU, metrics.Temperature, metrics.MemoryUsed)
})

// Get system overview
overview := metricsCollector.GetSystemOverview()
fmt.Printf("Total GPUs: %v, Active: %v, Avg Util: %.1f%%\n",
    overview["total_gpus"], overview["active_gpus"], overview["avg_utilization"])

// Get efficiency metrics
efficiency := metricsCollector.GetGPUEfficiencyMetrics("gpu-0", time.Hour)
fmt.Printf("GPU efficiency: %.1f%% idle time, %.3f power efficiency\n",
    efficiency["idle_time_percent"], efficiency["avg_power_efficiency"])

import "github.com/Finoptimize/agentaflow-sro-community/pkg/observability"
// Create Prometheus exporter
prometheusConfig := observability.PrometheusConfig{
    MetricsPrefix: "agentaflow",
    EnabledMetrics: map[string]bool{
        "gpu_metrics":        true,
        "scheduling_metrics": true,
        "serving_metrics":    true,
        "cost_metrics":       true,
        "system_metrics":     true,
    },
}
exporter := observability.NewPrometheusExporter(monitoringService, prometheusConfig)
// Register GPU metrics for export
exporter.RegisterGPUMetrics()
exporter.RegisterCostMetrics()
exporter.RegisterSchedulingMetrics()
// Start metrics server for Prometheus scraping
go exporter.StartMetricsServer(8080)
// Enable GPU integration with Prometheus export
integration.SetPrometheusExporter(exporter)
integration.EnablePrometheusExport(true)
// Metrics available at http://localhost:8080/metrics
// - agentaflow_gpu_utilization_percent
// - agentaflow_gpu_temperature_celsius
// - agentaflow_gpu_memory_used_bytes
// - agentaflow_cost_total_dollars
// - agentaflow_workloads_pending

// Create metrics aggregation service
aggregationService := gpu.NewMetricsAggregationService(
    metricsCollector,
    1*time.Minute, // Aggregation interval
    24*time.Hour,  // Retention period
)
aggregationService.Start()
// Get comprehensive GPU statistics
stats, _ := aggregationService.GetGPUStats("gpu-0")
fmt.Printf("Average utilization: %.1f%%, Peak: %.1f%%\n",
stats.AverageUtilization, stats.PeakUtilization)
// Get efficiency report
report := aggregationService.GetEfficiencyReport()
clusterEff := report["cluster_efficiency"].(map[string]interface{})
fmt.Printf("Cluster idle time: %.1f%%, Efficiency potential: %.1f%%\n",
clusterEff["average_idle_time_percent"], clusterEff["utilization_potential"])
// Analyze performance trends
trends := aggregationService.GetPerformanceTrends("gpu-0", 4*time.Hour)
utilTrend := trends["utilization_trend"].(map[string]float64)
fmt.Printf("Utilization trend: slope=%.3f (r²=%.3f)\n",
utilTrend["slope"], utilTrend["r_squared"])
// Get cost analysis
costAnalysis := aggregationService.GetCostAnalysis()
fmt.Printf("Estimated cost: $%.2f, Potential savings: $%.2f (%.1f%%)\n",
costAnalysis["total_estimated_cost"], costAnalysis["total_potential_savings"],
costAnalysis["savings_percentage"])# Deploy the Kubernetes GPU scheduler
kubectl apply -f examples/k8s/scheduler-deployment.yaml
# Submit a GPU workload
./k8s-gpu-scheduler --mode=cli submit examples/k8s/pytorch-training-workload.yaml
# Monitor GPU resources across the cluster
./k8s-gpu-scheduler --mode=cli status
# Watch real-time status updates
./k8s-gpu-scheduler --mode=cli watch
# Check GPU health across all nodes
./k8s-gpu-scheduler --mode=cli health

import "github.com/Finoptimize/agentaflow-sro-community/pkg/k8s"
// Create Kubernetes GPU scheduler
scheduler, _ := k8s.NewKubernetesGPUScheduler("agentaflow", gpu.StrategyLeastUtilized)
// Start the scheduler
ctx := context.Background()
scheduler.Start(ctx)
// Submit GPU workload
workload := &k8s.GPUWorkload{
    ObjectMeta: metav1.ObjectMeta{Name: "training-job"},
    Spec: k8s.GPUWorkloadSpec{
        Priority:          5,
        GPUMemoryRequired: 8192, // 8GB
        GPURequirements: k8s.GPURequirements{
            GPUCount:        1,
            ExclusiveAccess: true,
        },
    },
}
scheduler.SubmitGPUWorkload(workload)

| Component | Benefit | Impact |
|---|---|---|
| GPU Scheduling | Optimized utilization | Up to 40% reduction in GPU idle time |
| Real-time Metrics | Live GPU monitoring | Real-time utilization, temperature, power tracking |
| Prometheus Integration | Enterprise monitoring | Production-ready metrics export and alerting |
| Grafana Dashboards | Visual analytics | Pre-built dashboards for GPU clusters and cost tracking |
| GPU Analytics | Performance insights | Efficiency scoring, trend analysis, cost optimization |
| Kubernetes Integration | Native K8s scheduling | Seamless integration with existing clusters |
| Request Batching | Improved throughput | 3-5x increase in requests/second |
| Response Caching | Reduced latency | Up to 50% faster responses |
| Cost Tracking | Better budgeting | Full visibility into AI infrastructure costs |
agentaflow-sro-community/
├── pkg/
│ ├── gpu/ # GPU orchestration and scheduling
│ ├── k8s/ # Kubernetes GPU scheduling integration
│ ├── serving/ # Model serving optimization
│ └── observability/ # Monitoring and debugging
├── cmd/
│ ├── agentaflow/ # Main CLI application
│ └── k8s-gpu-scheduler/ # Kubernetes GPU scheduler
├── docker/
│ ├── Dockerfile.web-dashboard # Web dashboard container (15-20MB)
│ ├── Dockerfile.k8s-scheduler # Kubernetes scheduler container
│ ├── Dockerfile.prometheus-demo # Prometheus demo container
│ └── README.md # Docker documentation
├── docker-compose.yml # Complete stack orchestration
├── monitoring/
│ ├── prometheus.yml # Prometheus configuration
│ └── prometheus/rules/ # Alert rules
└── examples/
├── k8s/ # Kubernetes deployment examples
├── monitoring/ # Grafana dashboards and configs
└── demo/ # Demo applications
AgentaFlow is fully containerized for instant deployment:
All images are security-hardened with:
- Distroless base (no shell, minimal attack surface)
- 15-20MB size (98% smaller than typical Go images)
- Non-root execution (UID 65532)
- Built-in health checks
- Multi-architecture support (AMD64 + ARM64)
| Image | Size | Purpose | Ports |
|---|---|---|---|
| `web-dashboard` | ~20MB | Real-time GPU monitoring UI | 9000, 9001 |
| `k8s-scheduler` | ~20MB | Kubernetes GPU scheduler | 8080 |
| `prometheus-demo` | ~20MB | Metrics integration demo | 8080 |
The complete monitoring stack includes:
- AgentaFlow Web Dashboard
- Prometheus (metrics collection)
- Grafana (visualization)
- Pre-configured dashboards and alerts
For detailed Docker documentation, see docker/README.md and CONTAINER.md
AgentaFlow provides enterprise-grade monitoring through comprehensive Prometheus/Grafana integration with production-ready dashboards and alerting.
Run the complete Prometheus/Grafana integration demo:
cd examples/demo/prometheus-grafana
go run main.go

This starts:
- Prometheus metrics server on http://localhost:8080/metrics
- Real-time GPU monitoring with automatic export
- Cost tracking with live calculations
- Performance analytics and efficiency scoring
Deploy the full monitoring stack to Kubernetes:
# Deploy Prometheus and Grafana
kubectl apply -f examples/k8s/monitoring/prometheus.yaml
kubectl apply -f examples/k8s/monitoring/grafana.yaml
# Access Grafana dashboard (admin/agentaflow123)
kubectl port-forward svc/grafana-service 3000:3000 -n agentaflow-monitoring
# View Prometheus metrics
kubectl port-forward svc/prometheus-service 9090:9090 -n agentaflow-monitoring

GPU Performance Metrics:
- `agentaflow_gpu_utilization_percent` - Real-time GPU utilization
- `agentaflow_gpu_memory_used_bytes` - Memory consumption tracking
- `agentaflow_gpu_temperature_celsius` - Thermal monitoring
- `agentaflow_gpu_power_draw_watts` - Power consumption tracking
- `agentaflow_gpu_fan_speed_percent` - Cooling system status
Cost & Efficiency Analytics:
- `agentaflow_cost_total_dollars` - Real-time cost tracking
- `agentaflow_gpu_efficiency_score` - Efficiency scoring (0-100)
- `agentaflow_gpu_idle_time_percent` - Resource waste tracking
- `agentaflow_cost_per_hour` - Live hourly cost calculation
Workload & Scheduling Metrics:
- `agentaflow_workloads_pending` - Job queue depth
- `agentaflow_workloads_completed_total` - Completion tracking
- `agentaflow_scheduler_decisions_total` - Scheduling decisions
- `agentaflow_gpu_assignments_total` - Resource assignments
System Health & Alerts:
- Component status monitoring
- Automatic threshold alerts
- Performance trend analysis
- Resource utilization forecasting
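
All of these series can be read back through Prometheus's standard HTTP query API once the stack is up. A small sketch using only the Go standard library, against the Prometheus port from the docker-compose setup above:

```go
import (
    "fmt"
    "io"
    "net/http"
    "net/url"
)

// queryGPUUtilization fetches the current agentaflow_gpu_utilization_percent
// series via Prometheus's /api/v1/query endpoint.
func queryGPUUtilization() error {
    q := url.QueryEscape("agentaflow_gpu_utilization_percent")
    resp, err := http.Get("http://localhost:9090/api/v1/query?query=" + q)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        return err
    }
    fmt.Println(string(body)) // JSON instant vector, one sample per GPU
    return nil
}
```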
The integration includes production-ready dashboards:
- GPU Cluster Overview - Multi-node GPU monitoring
- Cost Analysis Dashboard - Real-time cost tracking and forecasting
- Performance Analytics - Efficiency scoring and optimization insights
- Alert Management - Threshold monitoring and notifications
For complete setup guide and advanced configuration, see examples/demo/PROMETHEUS_GRAFANA_DEMO.md
AgentaFlow now includes a production-ready web dashboard for real-time GPU monitoring and system analytics.
cd examples/demo/web-dashboard
go run main.go

Access the dashboard: http://localhost:8090
- 📊 Real-time Monitoring: Live GPU metrics with WebSocket updates
- 📈 Interactive Charts: GPU utilization, temperature, and cost analytics
- 🎯 System Overview: Total GPUs, efficiency scoring, and cost tracking
- 🚨 Alert Management: Real-time notifications and one-click resolution
- 📱 Responsive Design: Optimized for desktop, tablet, and mobile
- 🔌 API Integration: REST endpoints for custom integrations (illustrative polling sketch below)
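
As an illustration of what such an integration can look like, the sketch below polls a hypothetical `/api/overview` endpoint. The path and response shape are placeholders, not the dashboard's documented API; see examples/demo/web-dashboard/README.md for the real endpoints.

```go
import (
    "encoding/json"
    "fmt"
    "net/http"
)

// pollOverview polls the dashboard's REST API for system metrics.
// NOTE: "/api/overview" is a hypothetical placeholder path; check the
// web-dashboard README for the actual endpoint names.
func pollOverview() error {
    resp, err := http.Get("http://localhost:8090/api/overview")
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    var overview map[string]interface{}
    if err := json.NewDecoder(resp.Body).Decode(&overview); err != nil {
        return err
    }
    fmt.Println(overview) // e.g. GPU count, efficiency score, cost per hour
    return nil
}
```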
- Data Center Operations - Real-time cluster monitoring
- Cost Management - Live cost tracking and optimization
- Performance Analysis - Identify bottlenecks and inefficiencies
- Alert Management - Proactive issue detection and resolution
For detailed dashboard documentation, see examples/demo/web-dashboard/README.md
For detailed documentation, see DOCUMENTATION.md
Topics covered:
- Detailed API reference
- Scheduling strategies
- Performance optimization
- Configuration options
- Use cases and examples
- ML Training Clusters - Optimize GPU allocation across multiple training jobs
- Kubernetes GPU Workloads - Native Kubernetes scheduling for AI/ML workloads
- LLM Inference Services - Reduce costs with intelligent batching and caching
- Multi-Model Deployments - Load balance requests across model instances
- Cost Optimization - Track and minimize AI infrastructure spending
- Performance Debugging - Identify and resolve bottlenecks
- Docker Desktop or Docker Engine 20.10+
- Docker Compose 2.0+ (for full stack)
- 2GB RAM minimum, 4GB recommended
- No Go installation required!
- Go 1.21 or higher
- Kubernetes 1.20+ (for Kubernetes GPU scheduling features)
- NVIDIA GPU drivers and nvidia-docker (optional, for real GPU monitoring)
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Contributions are welcome! This is a community edition focused on providing accessible AI infrastructure optimization tools.
- ✅ Kubernetes integration for GPU scheduling
- ✅ Real-time GPU metrics collection
- ✅ Prometheus/Grafana integration - Complete monitoring stack with dashboards
- ✅ Production-ready observability - Enterprise-grade metrics export and visualization
- ✅ Web dashboard for monitoring - Interactive real-time web interface with charts and alerts
- ✅ OpenTelemetry distributed tracing - Complete tracing integration with Jaeger/OTLP support
- ✅ Docker containerization - Production-ready containers with Docker Compose orchestration
- ✅ CI/CD with GitHub Actions - Automated builds and publishing to GitHub Packages (In Progress)
- 📋 Multi-architecture builds - AMD64 + ARM64 container support (Planned)
- 📋 Helm charts - Kubernetes deployment templates (Planned)
Looking for advanced features for production environments? Our Enterprise Edition will include:
- Multi-cluster Orchestration: Manage GPU resources across multiple Kubernetes clusters
- Multi-cloud GPU resource support: Support for running in Azure, Google Cloud, Vercel, DigitalOcean, or other clouds
- Hosted MCP Server: Integrate AgentaFlow SRO directly into your AI models
- Self-optimizing AI Agents: AI Agents learn your workflows for personalized optimization
- Advanced Scheduling Algorithms: Cost optimization algorithms and priority queues for enterprise workloads
- RBAC and Audit Logs: Role-based access control and comprehensive audit logging
- Enterprise Integrations: Slack alerts, DataDog monitoring, and other enterprise tools
- SLA Support: Guaranteed service levels with dedicated support
- Usage-based Billing Features: Advanced cost tracking and billing automation
- Advanced Dashboard: Enhanced observability for LLM models
- Data Center Edition: Run on bare metal in the data center
Contact us for early access and enterprise pricing.
For questions, issues, or contributions, please open an issue on GitHub.
Built with ❤️ by FinOptimize for AgentaFlow
