Detect wasteful or under-provisioned Kubernetes autoscaling in on-prem environments
scale-sniff analyzes CPU utilization metrics against pod counts over time to identify inefficient scaling patterns in on-prem Kubernetes clusters. Pinpoint over-provisioned deployments, under-utilized resources, and misconfigured HPA policies to optimize cost and performance.
- Reduce costs by identifying over-provisioned resources
- Improve performance by detecting under-provisioned workloads
- Optimize HPA configurations with data-driven insights
- On-prem focused – designed for environments without cloud auto-scaling benefits
Contributions welcome! Please open a GitHub issue for:
- 🐛 Bug reports
- 💡 Feature requests
- ❓ Questions & discussions
- 📝 General feedback
- Discovers Services → finds the corresponding Deployments via pod label selectors
- Maps each app to its Pods:
  - Service → selects Pods via spec.selector
  - Pods ← owned by Deployment ← scaled by HPA
- Fetches metrics from Prometheus:
  - CPU: from cAdvisor (built into the kubelet)
  - Pod count: from kube-state-metrics
- Analyzes efficiency (sketched below):
  - Compares actual CPU usage to the HPA target with the google/gemma-2-2b-it model
  - Flags over-provisioning (high pods, low CPU) or under-provisioning (high CPU, low pods)
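The pipeline above can be pictured with a minimal Go sketch: it runs two Prometheus queries built from the metrics named in the component table below (container_cpu_usage_seconds_total from cAdvisor, kube_pod_status_ready from kube-state-metrics) and applies a toy over/under-provisioning rule. The query shapes, the CPU target, and the thresholds are illustrative assumptions, not scale-sniff's actual code; in the real tool the comparison is delegated to the google/gemma-2-2b-it model rather than hard-coded thresholds.

```go
// Illustrative only: a minimal sketch of the queries and heuristic described above,
// not scale-sniff's actual implementation.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strconv"
)

// promScalar runs an instant query against Prometheus and returns the first sample value.
func promScalar(baseURL, query string) (float64, error) {
	resp, err := http.Get(baseURL + "/api/v1/query?query=" + url.QueryEscape(query))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	var body struct {
		Data struct {
			Result []struct {
				Value [2]interface{} `json:"value"` // [unix timestamp, "value"]
			} `json:"result"`
		} `json:"data"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return 0, err
	}
	if len(body.Data.Result) == 0 {
		return 0, fmt.Errorf("no samples for %q", query)
	}
	return strconv.ParseFloat(body.Data.Result[0].Value[1].(string), 64)
}

func main() {
	base := "http://localhost:9090" // e.g. the port-forwarded Prometheus from the demo
	ns, app := "default", "nginx-demo"

	// CPU usage (in cores) summed over the app's pods, from cAdvisor.
	cpu, err := promScalar(base, fmt.Sprintf(
		`sum(rate(container_cpu_usage_seconds_total{namespace=%q,pod=~"%s.*"}[5m]))`, ns, app))
	if err != nil {
		panic(err)
	}

	// Ready pod count, from kube-state-metrics.
	pods, err := promScalar(base, fmt.Sprintf(
		`sum(kube_pod_status_ready{namespace=%q,condition="true",pod=~"%s.*"})`, ns, app))
	if err != nil {
		panic(err)
	}
	if pods == 0 {
		panic("no ready pods found")
	}

	// Toy decision rule against an assumed HPA CPU target (here 0.10 cores per pod).
	target := 0.10
	perPod := cpu / pods
	switch {
	case pods > 1 && perPod < target/2:
		fmt.Println("over-provisioned: many pods, low CPU per pod")
	case perPod > target*2:
		fmt.Println("under-provisioned: high CPU per pod")
	default:
		fmt.Println("efficient")
	}
}
```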
| Component | Purpose |
|---|---|
| kube-state-metrics | Exposes kube_pod_status_ready for pod count |
| cAdvisor (default in kubelet) | Provides container_cpu_usage_seconds_total for CPU usage |
| Prometheus | Scrapes and stores metrics |
| HPA | Enables autoscaling analysis |
| HuggingFace token | The analyze step requires a HuggingFace API token; see the Hugging Face token documentation for details. |
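For the last row, here is a hedged sketch of what the LLM call in the analyze step could look like, using the default model and endpoint from the CLI options further down; the payload shape and prompt are assumptions, not taken from scale-sniff's source.

```go
// Illustrative only: one plausible shape for the analyze step's LLM call. The URL and
// model are the CLI defaults shown below; the prompt and payload are assumptions.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	token := os.Getenv("HUGGINGFACE_TOKEN")

	// OpenAI-compatible chat-completions payload.
	payload, _ := json.Marshal(map[string]interface{}{
		"model": "google/gemma-2-2b-it",
		"messages": []map[string]string{{
			"role":    "user",
			"content": "6 ready pods average 0.03 CPU cores each against a 10% HPA target. Over- or under-provisioned?",
		}},
	})

	req, _ := http.NewRequest(http.MethodPost,
		"https://router.huggingface.co/v1/chat/completions", bytes.NewReader(payload))
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // chat-completion JSON with the model's verdict
}
```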
go build -o scale-sniff ./cmd/cli
# help
Usage: ./scale-sniff [options]
Options:
-config-path string
Configuration file path (config.yaml should be under that!) (default "./internal/config")
-duration string
Analysis window (default "24h")
-h Show help
-help
Show help
-hf-model string
HuggingFace model override (default "google/gemma-2-2b-it")
-hf-token string
HuggingFace API token
-hf-url string
HuggingFace API URL (default "https://router.huggingface.co/v1/chat/completions")
-namespace string
Kubernetes namespace (default "default")
-prometheus-base-url string
Prometheus base URL
-range-vector string
Rate range vector (default "5m")
-step string
Sample resolution (default "5m")
# Analyze a namespace
./scale-sniff analyze --namespace prod
# Custom time window
./scale-sniff analyze --namespace dev --duration 15m --step 30s
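As a rough mental model (an assumption about the internals, not the actual implementation), the time-related flags plausibly map onto a Prometheus /api/v1/query_range request like this:

```go
// Illustrative only: how -duration, -step and -range-vector might feed a Prometheus
// range query for the "dev" example above. This mapping is an assumption, not the CLI's code.
package main

import (
	"fmt"
	"net/url"
	"time"
)

func main() {
	duration, _ := time.ParseDuration("15m") // -duration: total analysis window
	step := "30s"                            // -step: sample resolution of the range query
	rangeVector := "5m"                      // -range-vector: rate() window inside PromQL

	end := time.Now()
	start := end.Add(-duration)

	query := fmt.Sprintf(
		`sum(rate(container_cpu_usage_seconds_total{namespace="dev"}[%s]))`, rangeVector)

	params := url.Values{}
	params.Set("query", query)
	params.Set("start", fmt.Sprint(start.Unix()))
	params.Set("end", fmt.Sprint(end.Unix()))
	params.Set("step", step)

	fmt.Println("/api/v1/query_range?" + params.Encode())
}
```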
- k3d (or any local Kubernetes cluster)
- kubectl
- go (for building the CLI)
Create a local cluster (optional if using an existing cluster)
k3d cluster create demo --agents 1
kubectl apply -f nginx-deployment.yaml
kubectl apply -f prometheus-rbac.yaml
kubectl apply -f prometheus-config.yaml
kubectl apply -f prometheus-deployment.yaml
kubectl apply -f ksm-rbac.yaml
kubectl apply -f kube-state-metrics.yaml
kubectl get pods
kubectl get services
# Or use validation script
chmod +x validate.sh
./validate.sh
kubectl autoscale deployment nginx-demo --cpu-percent=10 --min=1 --max=10
kubectl port-forward service/prometheus-service 9090:9090
export HUGGINGFACE_TOKEN=<your-token>
Run scale-sniff from the project directory
./scale-sniff
⏳ nginx-service: fetching
⏳ prometheus-service: fetching
⏳ nginx-service: analyzing
📊 Final Report:
❓ prometheus-service: Inconclusive — No HPA configured — autoscaling not enabled
✅ nginx-service: Efficient
Clean up
k3d cluster delete demo