- AWS offers a free tier of service, that allows very limited usage of resources at no cost. For example, a micro instance and small amount of storage is available for no charge. Many services are only eligible for the free tier for the first twelve months that an account exists, but other services offer a free usage tier indefinitely. (If you have an old account but starting fresh, sign up for a new one to qualify for the free tier.) AWS Activate extends this to tens of thousands of dollars of free credits to startups in certain funds or accelerators.
- You can set billing alerts to be notified of unexpected costs, such as costs exceeding the free tier. You can set these in a granular way.
- AWS offers Cost Explorer, a tool to get better visibility into costs.
- Unfortunately, the AWS console and billing tools are rarely enough to give good visibility into costs. For large accounts, the AWS billing console can time out or be too slow to use.
- Tools:
- 🔹Enable billing reports and install an open source tool to help manage or monitor AWS resource utilization. Teevity Ice (originally written by Netflix) is probably the first one you should try. Check out docker-ice for a Dockerized version that eases installation.
- 🔸One challenge with Ice is that it doesn’t cover amortized cost of reserved instances.
- Other tools include Security Monkey and Cloud Custodian.
- Use AWS Simple Monthly Calculator to get an estimate of usage charges for AWS services based on certain information you provide. Monthly charges will be based on your actual usage of AWS services, and may vary from the estimates the Calculator has provided.
- Third-party services: Several companies offer services designed to help you gain insights into expenses or lower your AWS bill, such as Cloudability, CloudHealth Technologies, and ParkMyCloud. Some of these charge a percentage of your bill, which may be expensive. See the market landscape.
- AWS’s Trusted Advisor is another service that can help with cost concerns.
- Don’t be shy about asking your account manager for guidance in reducing your bill. It’s their job to keep you happily using AWS.
- Tagging for cost visibility: As the infrastructure grows, a key part of managing costs is understanding where they lie. It’s strongly advisable to tag resources, and as complexity grows, group them effectively. If you set up billing allocation appropriately, you can then get visibility into expenses according to organization, product, individual engineer, or any other way that is helpful.
- If you need to do custom analysis of raw billing data or want to feed it to a third party cost analysis service, enable the detailed billing report feature.
- Multiple Amazon accounts can be linked for billing purposes using the Consolidated Billing feature. Large enterprises may need complex billing structures depending on ownership and approval processes.
- Multiple Amazon accounts can be managed centrally using AWS Organizations.
- For deployments that involve significant network traffic, a large fraction of AWS expenses are around data transfer. Furthermore, costs of data transfer, within AZs, within regions, between regions, and into and out of AWS and the internet vary significantly depending on deployment choices.
- Some of the most common gotchas:
- 🔸AZ-to-AZ traffic: Note EC2 traffic between AZs is effectively the same as between regions. For example, deploying a Cassandra cluster across AZs is helpful for high availability, but can hurt on network costs.
- 🔸Using public IPs when not necessary: If you use an Elastic IP or public IP address of an EC2 instance, you will incur network costs, even if it is accessed locally within the AZ.
- 🔸Managed NAT Gateway data processing: Managed NAT Gateways are used to let traffic egress from private subnets--at a cost of 4.5¢ as a data processing fee layered on top of data transfer pricing. Past a certain point, running your own NAT instances becomes far more cost effective.
- 🔸Some services do cross-AZ traffic for free: Many AWS services you'd not consider on their own merits offer a hidden value of free cross-AZ data transfer. EFS, RDS, MSK, and others are examples of this.
- This figure gives an overview:
- With EC2, there is a trade-off between engineering effort (more analysis, more tools, more complex architectures) and spend rate on AWS. If your EC2 costs are small, many of the efforts here are not worth the engineering time required to make them work. But once you know your costs will be growing in excess of an engineer’s salary, serious investment is often worthwhile.
- Larger instances aren’t necessarily priced higher in the spot market – therefore, you should look at the available options and determine which instances will be most cost effective for your jobs. See Bid Advisor.
- 🔹Spot instances:
- EC2 Spot instances are a way to get EC2 resources at significant discount — often many times cheaper than standard on-demand prices — if you’re willing to accept the possibility that they be terminated with little to no warning.
- Use Spot instances for potentially very significant discounts whenever you can use resources that may be restarted and don’t maintain long-term state.
- The huge savings that you can get with Spot come at the cost of a significant increase in complexity when provisioning and reasoning about the availability of compute capacity.
- Amazon maintains Spot prices at a market-driven fluctuating level, based on their inventory of unused capacity. Prices are typically low but can spike very high. See the price history to get a sense for this.
- You set a bid price high to indicate how high you’re willing to pay, but you only pay the going rate, not the bid rate. If the market rate exceeds the bid, your instance may be terminated.
- Prices are per instance type and per availability zone. The same instance type may have wildly different price in different zones at the same time. Different instance types can have very different prices, even for similarly powered instance types in the same zone.
- Compare prices across instance types for better deals.
- Use Spot instances whenever possible. Setting a high bid price will assure your machines stay up the vast majority of the time, at a fraction of the price of normal instances.
- Get notified up to two minutes before price-triggered shutdown by polling your Spot instances’ metadata, or by watching for the termination CloudWatch event.
- Make sure your usage profile works well for Spot before investing heavily in tools to manage a particular configuration.
- Spot fleet:
- You can realize even bigger cost reductions at the same time as improvements to fleet stability relative to regular Spot usage by using Spot fleet to bid on instances across instance types, availability zones, and (through multiple Spot Fleet Requests) regions.
- Spot fleet targets maintaining a specified (and weighted-by-instance-type) total capacity across a cluster of servers. If the Spot price of one instance type and availability zone combination rises above the weighted bid, it will rotate running instances out and bring up new ones of another type and location up in order to maintain the target capacity without going over target cluster cost.
- Spot usage best practices:
- Application profiling:
- Profile your application to figure out its runtime characteristics. That would help give an understanding of the minimum cpu, memory, disk required. Having this information is critical before you try to optimize spot costs.
- Once you know the minimum application requirements, instead of resorting to fixed instance types, you can bid across a variety of instance types (that gives you higher chances of getting a spot instance to run your application).E.g., If you know that 4 cpu cores are enough for your job, you can choose any instance type that is equal or above 4 cores and that has the least Spot price based on history. This helps you bid for instances with greater discount (less demand at that point).
- Spot price monitoring and intelligence:
- Spot Instance prices fluctuate depending on instance types, time of day, region and availability zone. The AWS CLI tools and API allow you to describe Spot price metadata given time, instance type, and region/AZ.
- Based on history of Spot instance prices, you could potentially build a myriad of algorithms that would help you to pick an instance type in a way that optimizes cost, maximizes availability, or offers predictable performance.
- You can also track the number of times an instance of certain type got taken away (out bid) and plot that in graphite to improve your algorithm based on time of day.
- Spot machine resource utilization:
- For running spiky workloads (spark, map reduce jobs) that are schedule based and where failure is non critical, Spot instances become the perfect candidates.
- The time it takes to satisfy a Spot instance could vary between 2-10 mins depending on the type of instance and availability of machines in that AZ.
- If you are running an infrastructure with hundreds of jobs of spiky nature, it is advisable to start pooling instances to optimize for cost, performance and most importantly time to acquire an instance.
- Pooling implies creating and maintaining Spot instances so that they do not get terminated after use. This promotes re-use of Spot instances across jobs. This of course comes with the overhead of lifecycle management.
- Pooling has its own set of metrics that can be tracked to optimize resource utilization, efficiency and cost.
- Typical pooling implementations give anywhere between 45-60% cost optimizations and 40% reduction in spot instance creation time.
- An excellent example of Pooling implementation described by Netflix (part1, part2)
- Application profiling:
- Spot management gotchas
- 🔸Lifetime: There is no guarantee for the lifetime of a Spot instance. It is purely based on bidding. If anyone outbids your price, the instance is taken away. Spot is not suitable for time sensitive jobs that have strong SLA. Instances will fail based on demand for Spot at that time. AWS provides a two-minute warning before Amazon EC2 must terminate your Spot instance.
- 🔹API return data: - The Spot price API returns Spot prices of varying granularity depending on the time range specified in the api call.E.g If the last 10 min worth of history is requested, the data is more fine grained. If the last 2 day worth of history is requested, the data is more coarser. Do not assume you will get all the data points. There will be skipped intervals.
- ❗Lifecycle management: Do not attempt any fancy Spot management unless absolutely necessary. If your entire usage is only a few machines and your cost is acceptable and your failure rate is lower, do not attempt to optimize. The pain for building/maintaining it is not worth just a few hundred dollar savings.
- Reserved Instances: allow you to get significant discounts on EC2 compute hours in return for a commitment to pay for instance hours of a specific instance type in a specific AWS region and availability zone for a pre-established time frame (1 or 3 years). Further discounts can be realized through “partial” or “all upfront” payment options.
- Consider using Reserved Instances when you can predict your longer-term compute needs and need a stronger guarantee of compute availability and continuity than the (typically cheaper) Spot market can provide. However be aware that if your architecture changes your computing needs may change as well so long term contracts can seem attractive but may turn out to be cumbersome.
- There are two types of Reserved Instances - Standard and Convertible. If you purchase excess Standard Reserved Instances, you may offer to “sell back” unused Reserved Instances via the Reserved Instance Marketplace, this allows you to potentially recoup the cost of unused EC2 compute instance hours by selling them to other AWS customers.
- Instance reservations are not tied to specific EC2 instances - they are applied at the billing level to eligible compute hours as they are consumed across all of the instances in an account.
- 📜There have been scattered reports of Convertible RI purchases needing to be exercised in a block-- namely, if you buy five convertible RIs in one purchase, you can't convert just two of them. Reach out to your account manager for clarification if this may impact you.
- If you have multiple AWS accounts and have configured them to roll charges up to one account using the “Consolidated Billing” feature, you can expect unused Reserved Instance hours from one account to be applied to matching (region, availability zone, instance type) compute hours from another account.
- If you have multiple AWS accounts that are linked with Consolidated Billing, plan on using reservations, and want unused reservation capacity to be able to apply to compute hours from other accounts, you’ll need to create your instances in the availability zone with the same name across accounts. Keep in mind that when you have done this, your instances may not end up in the same physical data center across accounts - Amazon shuffles availability zones names across accounts in order to equalize resource utilization.
- Make use of dynamic Auto Scaling, where possible, in order to better match your cluster size (and cost) to the current resource requirements of your service.
- If you use RHEL instances and happen to have existing RHEL on-premise Red Hat subscriptions, then you can leverage Red Hat's Cloud Access program to migrate a portion of your on-premise subscriptions to AWS, and thereby saving on AWS charges for RHEL subscriptions. You can either use your own self-created RHEL AMI's or Red Hat provided Gold Images that will be added to your private AMI's once you sign up for Red Hat Cloud Access.