Commit 8cd060f

Merge pull request #785 from MicrosoftDocs/main
01/30/2025 PM Publishing
2 parents 3b5584c + a3c2e28 commit 8cd060f

6 files changed

+167
-303
lines changed

articles/aks/TOC.yml

Lines changed: 8 additions & 0 deletions
@@ -496,6 +496,12 @@
496496
href: /azure/kubernetes-fleet/resource-propagation?toc=/azure/aks/toc.json&bc=/azure/aks/breadcrumb/toc.json
497497
- name: Deploy SpinKube to run serverless Wasm workloads
498498
href: deploy-spinkube.md
499+
- name: Cost optimization
500+
items:
501+
- name: Understand AKS usage and costs
502+
href: understand-aks-costs.md
503+
- name: Enable cost analysis on your cluster
504+
href: cost-analysis.md
499505
- name: Cluster management
500506
items:
501507
- name: Azure portal Kubernetes resource view
@@ -821,6 +827,8 @@
821827
items:
822828
- name: Enable cost analysis on your cluster
823829
href: cost-analysis.md
830+
- name: Monitor and optimize idle costs
831+
href: cost-analysis-idle-costs.md
824832
- name: Network Observability
825833
items:
826834
- name: Container Network Observability
Lines changed: 56 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,56 @@
1+
---
2+
title: Monitor Azure Kubernetes Service (AKS) idle costs
3+
description: Learn how to monitor and optimize idle costs in the Azure Kubernetes Service (AKS) cost analysis add-on.
4+
author: schaffererin
5+
ms.author: schaffererin
6+
ms.service: azure-kubernetes-service
7+
ms.subservice: aks-monitoring
8+
ms.topic: how-to
9+
ms.date: 01/30/2025
10+
---
11+
12+
# Understand Azure Kubernetes Service (AKS) idle costs
13+
14+
If you have the [Azure Kubernetes Service (AKS) cost analysis add-on enabled](./cost-analysis.md#enable-cost-analysis-on-your-aks-cluster) and see high idle costs, then you have an opportunity to optimize your cluster and improve cost efficiency. This article explains what idle costs are, how to monitor them, and how to reduce them.
15+
16+
## What are idle costs?
17+
18+
*Idle costs* are due to idle resources, which come from overprovisioning, low utilization, and resource wastage scenarios. Customers often overprovision resources for various reasons, including:
19+
20+
* Concern about performance issues in the event of running out of memory resources.
21+
* Unpredictable workloads, such as a batch workload that requires a lot of resources at once and then drops for a period of time before spiking again.
22+
* Wanting to ensure that resources are available for buffer or unexpected periods of high resource demand.
23+
24+
If more nodes are provisioned than needed, or if resource requests are much greater than what's actually used, it can lead to idle costs.
25+
26+
## Monitor cluster metrics and cost data
27+
28+
To understand where your cluster idle costs come from, you can monitor the following metrics in [Azure Monitor](/azure/azure-monitor/essentials/monitor-azure-resource), [Managed Prometheus](/azure/azure-monitor/essentials/prometheus-metrics-overview#azure-monitor-managed-service-for-prometheus), or [self-hosted Prometheus](/azure/azure-monitor/essentials/prometheus-metrics-overview#azure-hosted-self-managed-prometheus):
29+
30+
* **Total memory request on the cluster** indicates the total amount of memory requested by all pods in the cluster and ensures the cluster has enough memory to meet the workload demands.
31+
* **Total CPU request on the cluster** indicates total CPU requested by all pods in the cluster and ensures the cluster can handle the computational needs of the workloads.
32+
* **Max memory usage** indicates the peak memory usage in the cluster and helps identify any memory spikes that could lead to performance issues or resource constraints. This is important for preventing `OOMKilled` containers or pod crashes.
33+
* **Max CPU usage** indicates the peak CPU usage observed in the cluster, helps detect CPU-intensive workloads, and ensures the cluster can handle peak loads or spikes in workload demand.
34+
* **Average memory usage** is useful for identifying trends and determining if the cluster is overprovisioned or underutilized. If usage is consistently below the total memory capacity, it means the cluster has more memory resources than needed.
35+
* **Average CPU usage** is useful for understanding the overall CPU demand. If usage is consistently below the total CPU capacity and CPU request, it means the cluster has more CPU resources than needed.
36+
* **Node count** indicates the number of nodes in the cluster and helps identify if there's an excess number of nodes with low utilization.
37+
38+
To improve utilization, it's important to monitor these metrics and [adjust your cluster size and resources](#optimize-your-cluster-size-and-resources) accordingly.
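
As a rough illustration of how these metrics relate to each other, the following Python sketch compares capacity, requests, and usage for one resource. The `ClusterSample` type, the `utilization_summary` function, and the sample numbers are hypothetical and aren't part of the cost analysis add-on; it's only a minimal sketch of the comparison described above, not how AKS calculates idle costs.

```python
from dataclasses import dataclass


@dataclass
class ClusterSample:
    """Hypothetical snapshot of one resource (for example, CPU cores or GiB of memory)."""
    capacity: float   # total capacity across all nodes
    requested: float  # total requested by all pods
    avg_usage: float  # average usage over the observation window
    max_usage: float  # peak usage over the observation window


def utilization_summary(sample: ClusterSample) -> dict:
    """Return simple ratios that hint at overprovisioning."""
    if sample.capacity <= 0:
        raise ValueError("capacity must be positive")
    # Share of capacity that no pod has requested (a hint that there are too many nodes).
    unrequested = (sample.capacity - sample.requested) / sample.capacity
    # Share of requested resources never used, even at peak (a hint that requests are too high).
    slack = 0.0
    if sample.requested > 0:
        slack = (sample.requested - sample.max_usage) / sample.requested
    return {
        "unrequested_fraction": unrequested,
        "request_slack_fraction": slack,
        "avg_utilization": sample.avg_usage / sample.capacity,
    }


# Assumed example values: 32 cores of capacity, 20 requested, 6 used on average, 10 at peak.
summary = utilization_summary(ClusterSample(32.0, 20.0, 6.0, 10.0))
print(summary)
```

In this assumed example, a large unrequested fraction points toward reducing the node count, while a large request slack points toward lowering pod resource requests.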
39+
40+
## Optimize your cluster size and resources
41+
42+
| Recommended action | Description |
43+
|--------------------|-------------|
44+
| [Manual resizing](./resize-cluster.md) | Adjust the number of nodes in your cluster based on resource requirements. |
45+
| [Enable the cluster autoscaler](./cluster-autoscaler.md) | Automatically adjust the number of nodes in your cluster based on the resource requests of your pods. This helps ensure you only pay for the resources you need. |
46+
| Use an [alternative VM SKU type](./best-practices-cost.md#evaluate-sku-family) | Use a compute-optimized or memory-optimized SKU based on underutilized CPU and memory resources. |
47+
| Consider using [Spot VMs](/azure/virtual-machines/spot-vms) | For noncritical workloads, Spot VMs can be a cost-effective option. |
48+
| [Enable node autoprovisioning](./node-autoprovision.md) | Automatically provision the optimal VM configuration to run the workload in the most efficient and cost-effective way. |
49+
| [Enable the Vertical Pod Autoscaler (VPA)](./use-vertical-pod-autoscaler.md) | Automatically adjust the resource requests and limits of your pods based on their actual usage. This helps ensure that your pods are using the right amount of resources and can help reduce idle costs. |
50+
51+
> [!NOTE]
52+
> If a node is ready but has no pods running and is pending cluster autoscaler scale down, most of the node cost will be considered idle. A small amount will be considered system cost.
53+
54+
## Next steps
55+
56+
For more cost optimization tips, see [Best practices for cost optimization in Azure Kubernetes Service (AKS)](./best-practices-cost.md).

articles/aks/cost-analysis.md

Lines changed: 49 additions & 62 deletions
Original file line numberDiff line numberDiff line change
@@ -1,124 +1,111 @@
11
---
2-
title: Azure Kubernetes Service cost analysis
2+
title: Azure Kubernetes Service (AKS) cost analysis
33
description: Learn how to use cost analysis to surface granular cost allocation data for your Azure Kubernetes Service (AKS) cluster.
4-
author: nickomang
5-
ms.author: nickoman
4+
author: schaffererin
5+
ms.author: schaffererin
66
ms.service: azure-kubernetes-service
77
ms.subservice: aks-monitoring
8-
ms.custom: ignite-2023, devx-track-azurecli
98
ms.topic: how-to
109
ms.date: 06/17/2024
1110

1211
#CustomerIntent: As a cluster operator, I want to obtain cost management information, perform cost attribution, and improve my cluster footprint
1312
---
1413

15-
# Azure Kubernetes Service cost analysis
14+
# Azure Kubernetes Service (AKS) cost analysis
1615

17-
An Azure Kubernetes Service (AKS) cluster is reliant on Azure resources like virtual machines, virtual disks, load-balancers, and public IP addresses. Multiple applications can use these resources, which might be maintained by different teams within your organization. Resource consumption patterns for those applications are often variable, so their contribution towards the total cluster resource cost can also vary. Some applications can also have footprints across multiple clusters, which can pose a challenge when performing cost attribution and cost management.
16+
In this article, you learn how to enable cost analysis on Azure Kubernetes Service (AKS) to view detailed cost data for cluster resources.
1817

19-
Previously, [Microsoft Cost Management (MCM)](/azure/cost-management-billing/cost-management-billing-overview) aggregated cluster resource consumption under the cluster resource group. You could use MCM to analyze costs, but there were several challenges:
18+
## About cost analysis
2019

21-
* There was no Azure-native capability to display cluster resource usage at a level more granular than a cluster. There was no breakdown into discrete categories such as compute (including CPU cores and memory), storage, and networking.
20+
AKS clusters rely on Azure resources, such as virtual machines (VMs), virtual disks, load balancers, and public IP addresses. Multiple applications can use these resources. The resource consumption patterns often differ for each application, so their contribution toward the total cluster resource cost might also vary. Some applications might have footprints across multiple clusters, which can pose a challenge when performing cost attribution and cost management.
2221

23-
* There was no Azure-native functionality to distinguish between types of costs, for example between individual application costs and shared costs. MCM reported the cost of resources, but there was no insight into how much of the resource cost was used to run individual applications, how much was reserved for system processes required by the cluster, or what were the idle costs associated with the cluster.
22+
When you enable cost analysis on your AKS cluster, you can view detailed cost allocation scoped to Kubernetes constructs, such as clusters and namespaces, and Azure Compute, Network, and Storage resources. The add-on is built on top of [OpenCost](https://www.opencost.io/), an open-source Cloud Native Computing Foundation Incubating project for usage data collection. Usage data is reconciled with your Azure invoice data to provide a comprehensive view of your AKS cluster costs directly in the Azure portal Cost Management views.
2423

25-
* There was no Azure-native mechanism to analyze costs across multiple clusters in the same subscription scope.
24+
For more information on Microsoft Cost Management, see [Start analyzing costs in Azure](/azure/cost-management-billing/costs/quick-acm-cost-analysis).
2625

27-
As a result, you might have used third-party solutions to gather and analyze resource consumption and costs by Kubernetes-specific levels of granularity, such as by namespace or pod. Third-party solutions, however, require effort to deploy, fine-tune, and maintain for each AKS cluster. In some cases, you even need to pay for advanced features, increasing the cluster's total cost of ownership.
26+
After enabling the cost analysis add-on and allowing time for data to be collected, you can use the information in [Understand AKS usage and costs](./understand-aks-costs.md) to help you understand your data.
2827

29-
To address this challenge, AKS has integrated with MCM to offer detailed cost drill-down scoped to Kubernetes constructs, such as cluster and namespace, in addition to Azure Compute, Network, and Storage categories.
30-
31-
The AKS cost analysis addon is built on top of [OpenCost](https://www.opencost.io/), an open-source Cloud Native Computing Foundation Sandbox project for usage data collection. The cost analysis is reconciled with your Azure invoice data. Post-processed data is visible directly in the [MCM Cost Analysis portal experience](/azure/cost-management-billing/costs/quick-acm-cost-analysis).
32-
33-
## Prerequisites and limitations
34-
35-
* Your cluster must be either `Standard` or `Premium` tier, not the `Free` tier.
36-
37-
* To view cost analysis information, you must have one of the following roles on the subscription hosting the cluster: Owner, Contributor, Reader, Cost management contributor, or Cost management reader.
28+
## Prerequisites
3829

30+
* Your cluster must use the `Standard` or `Premium` tier, not the `Free` tier.
31+
* To view cost analysis information, you must have one of the following roles on the subscription hosting the cluster: `Owner`, `Contributor`, `Reader`, `Cost Management Contributor`, or `Cost Management Reader`.
32+
* [Microsoft Entra Workload ID](./workload-identity-overview.md) configured on your cluster.
33+
* If using the Azure CLI, you need version `2.61.0` or later installed.
3934
* Once you have enabled cost analysis, you can't downgrade your cluster to the `Free` tier without first disabling cost analysis.
40-
41-
* Your cluster must be deployed with a [Microsoft Entra Workload ID](./workload-identity-overview.md) configured.
42-
43-
* Kubernetes cost views are available only for the following Microsoft Azure Offer types. For more information on offer types, see [Supported Microsoft Azure offers](/azure/cost-management-billing/costs/understand-cost-mgt-data#supported-microsoft-azure-offers).
44-
* Enterprise Agreement
45-
* Microsoft Customer Agreement
46-
4735
* Access to the Azure API including Azure Resource Manager (ARM) API. For a list of fully qualified domain names (FQDNs) required, see [AKS Cost Analysis required FQDN](./outbound-rules-control-egress.md#aks-cost-analysis-add-on).
4836

49-
* Virtual nodes aren't supported at this time.
50-
51-
* AKS Automatic is not supported at this time.
52-
53-
* If using the Azure CLI, you must have version `2.61.0` or later installed.
37+
## Limitations
5438

39+
* Kubernetes cost views are only available for the *Enterprise Agreement* and *Microsoft Customer Agreement* Microsoft Azure offer types. For more information, see [Supported Microsoft Azure offers](/azure/cost-management-billing/costs/understand-cost-mgt-data#supported-microsoft-azure-offers).
40+
* Currently, virtual nodes aren't supported.
5541

5642
## Enable cost analysis on your AKS cluster
5743

5844
You can enable the cost analysis with the `--enable-cost-analysis` flag during one of the following operations:
5945

60-
* Create a `Standard` or `Premium` tier AKS cluster.
46+
* Creating a `Standard` or `Premium` tier AKS cluster.
47+
* Updating an existing `Standard` or `Premium` tier AKS cluster.
48+
* Upgrading a `Free` cluster to `Standard` or `Premium`.
49+
* Upgrading a `Standard` cluster to `Premium`.
50+
* Downgrading a `Premium` cluster to `Standard` tier.
6151

62-
* Update an AKS cluster that is already in `Standard` or `Premium` tier.
52+
### Enable cost analysis on a new cluster
6353

64-
* Upgrade a `Free` cluster to `Standard` or `Premium`.
65-
66-
* Upgrade a `Standard` cluster to `Premium`.
67-
68-
* Downgrade a `Premium` cluster to `Standard` tier.
69-
70-
The following example creates a new AKS cluster in the `Standard` tier with cost analysis enabled:
54+
Enable cost analysis on a new cluster using the [`az aks create`][az-aks-create] command with the `--enable-cost-analysis` flag. The following example creates a new AKS cluster in the `Standard` tier with cost analysis enabled:
7155

7256
```azurecli-interactive
7357
az aks create --resource-group <resource-group> --name <cluster-name> --location <location> --enable-managed-identity --generate-ssh-keys --tier standard --enable-cost-analysis
7458
```
7559

76-
The following example updates an existing AKS cluster in the `Standard` tier to enable cost analysis:
60+
### Enable cost analysis on an existing cluster
61+
62+
Enable cost analysis on an existing cluster using the [`az aks update`][az-aks-update] command with the `--enable-cost-analysis` flag. The following example updates an existing AKS cluster in the `Standard` tier to enable cost analysis:
7763

7864
```azurecli-interactive
7965
az aks update --resource-group <resource-group> --name <cluster-name> --enable-cost-analysis
8066
```
8167

68+
> [!NOTE]
69+
> An agent is deployed to the cluster when you enable the add-on. The agent consumes a small amount of CPU and Memory resources.
70+
8271
> [!WARNING]
83-
> The AKS cost analysis add-on Memory usage is dependent on the number of containers deployed. Memory consumption can be roughly approximated by 200 MB + 0.5 MB per container. The current memory limit is set to 4 GB which will support approximately 7000 containers per cluster. These estimates could be more or less depending on various factors and are subject to change.
84-
>
85-
> If you are experiencing issues such as the add-on pod getting `OOMKilled` or stuck in a `Pending` state, refer to the [AKS cost analysis add-on issues](/troubleshoot/azure/azure-kubernetes/aks-cost-analysis-add-on-issues) troubleshooting guide.
72+
> The AKS cost analysis add-on Memory usage is dependent on the number of containers deployed. You can roughly approximate Memory consumption using *200 MB + 0.5 MB per container*. The current Memory limit is set to *4 GB*, which supports approximately *7000 containers per cluster*. These estimates are subject to change.
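
The approximation in the warning above can be worked through with a few lines of Python. This is only a sketch of the stated formula: the function names are hypothetical, and it assumes that 4 GB means 4096 MB, which the source doesn't specify.

```python
BASE_MB = 200.0            # fixed overhead from the approximation
PER_CONTAINER_MB = 0.5     # additional memory per container
LIMIT_MB = 4 * 1024        # assumption: 4 GB interpreted as 4096 MB


def estimated_memory_mb(containers: int) -> float:
    """Approximate add-on memory usage for a given number of containers."""
    return BASE_MB + PER_CONTAINER_MB * containers


def max_containers(limit_mb: float = LIMIT_MB) -> int:
    """Largest container count whose estimate stays within the memory limit."""
    return int((limit_mb - BASE_MB) / PER_CONTAINER_MB)


print(estimated_memory_mb(7000))  # 3700.0
print(max_containers())           # 7792
```

Under these assumptions, 7000 containers need roughly 3700 MB, which leaves some headroom below the limit. That headroom is consistent with the documented figure being an approximation.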
8673
87-
## Disable cost analysis
74+
## Disable cost analysis on your AKS cluster
8875

89-
You can disable cost analysis at any time using `az aks update`.
76+
Disable cost analysis using the [`az aks update`][az-aks-update] command with the `--disable-cost-analysis` flag.
9077

9178
```azurecli-interactive
92-
az aks update --name myAKSCluster --resource-group myResourceGroup --disable-cost-analysis
79+
az aks update --name <cluster-name> --resource-group <resource-group> --disable-cost-analysis
9380
```
9481

9582
> [!NOTE]
96-
> If you intend to downgrade your cluster from the `Standard` or `Premium` tiers to the `Free` tier while cost analysis is enabled, you must first explicitly disable cost analysis.
83+
> If you want to downgrade your cluster from the `Standard` or `Premium` tier to the `Free` tier while cost analysis is enabled, you must first disable cost analysis.
9784
9885
## View the cost data
9986

100-
You can view cost allocation data in the Azure portal. To learn more about how to navigate the cost analysis UI view, see the [Cost Management documentation](/azure/cost-management-billing/costs/view-kubernetes-costs).
87+
You can view cost allocation data in the Azure portal. For more information, see [View AKS costs in Microsoft Cost Management](/azure/cost-management-billing/costs/view-kubernetes-costs).
10188

10289
### Cost definitions
10390

104-
In the Kubernetes namespaces and assets views you'll see the following charges:
91+
In the Kubernetes namespaces and assets views, you might see any of the following charges:
10592

106-
- **Idle charges**: Represents the cost of available resource capacity that wasn't used by any workloads.
107-
- **Service charges**: Represents the charges associated with the service like Uptime SLA, Microsoft Defender for Containers etc.
108-
- **System charges**: Represents the cost of capacity reserved by AKS on each node to run system processes required by the cluster, including the kubelet and container runtime. [Learn more](./concepts-clusters-workloads.md#resource-reservations).
109-
- **Unallocated charges**: Represents the cost of resources that couldn't be allocated to namespaces.
93+
* **Idle charges** represent the cost of available resource capacity that isn't used by any workloads.
94+
* **Service charges** represent the charges associated with the service, such as Uptime SLA and Microsoft Defender for Containers.
95+
* **System charges** represent the cost of capacity reserved by AKS on each node to run system processes required by the cluster, including the kubelet and container runtime. [Learn more](./concepts-clusters-workloads.md#resource-reservations).
96+
* **Unallocated charges** represent the cost of resources that couldn't be allocated to namespaces.
11097

11198
> [!NOTE]
112-
> It might take up to one day for data to finalize. After 24 hours, any fluctuations in costs for the previous day will have stabilized.
99+
> It might take *up to one day* for data to finalize. After 24 hours, any fluctuations in costs for the previous day will have stabilized.
113100
114101
## Troubleshooting
115102

116-
See the following guide to troubleshoot [AKS cost analysis add-on issues](/troubleshoot/azure/azure-kubernetes/aks-cost-analysis-add-on-issues).
103+
If you're experiencing issues, such as the `cost-agent` pod getting `OOMKilled` or stuck in a `Pending` state, see [Troubleshoot AKS cost analysis add-on issues](/troubleshoot/azure/azure-kubernetes/aks-cost-analysis-add-on-issues).
117104

118-
<!-- LINKS -->
119-
[az-extension-add]: /cli/azure/extension#az-extension-add
120-
[az-extension-update]: /cli/azure/extension#az-extension-update
105+
## Next steps
121106

122-
## Learn more
107+
For more information on cost in AKS, see [Understand Azure Kubernetes Service (AKS) usage and costs](./understand-aks-costs.md).
123108

124-
Visibility is one element of cost management. Refer to [Optimize Costs in Azure Kubernetes Service (AKS)](./best-practices-cost.md) for other best practices on how to gain control over your kubernetes cost.
109+
<!-- LINKS -->
110+
[az-aks-create]: /cli/azure/aks#az-aks-create
111+
[az-aks-update]: /cli/azure/aks#az-aks-update
