Whether it’s bleeding slowly or cutting your head off, cloud expenses can easily kill your startup.
Going over the cloud budget happens even to companies with skilled engineers, CFOs and entire teams taking care of cloud costs.
Case in point? Pinterest. During one holiday season, customers spent so much time using Pinterest that the company’s cloud resource usage shot past expectations. Pinterest had prepaid $170 million for AWS services and was then forced to buy additional capacity at a higher rate, spending $20 million more than estimated.
There’s no better place than the cloud for early tech ventures that want to scale fast. But a startup can easily go bankrupt over a single cloud bill.
Why is the cloud dangerous for startups?
Having access to almost limitless resources allows startups to focus on mission-critical tasks related to business growth instead of worrying about hardware maintenance. But the cloud can also be a dangerous place for a startup.
1. Cloud costs that spiral out of control can kill a startup
After a series of missteps, Silicon Valley startup Milkie Way burned through $72k testing Firebase and Cloud Run in a single day. The company faced bankruptcy unless Google agreed to waive the charges.
Unless you invest a lot of time and effort in monitoring and forecasting costs, you won’t know how much you’ll have to pay until the cloud bill arrives at the end of the month and the amount comes as a shock.
A managed service that automatically adjusts resources and constantly looks for potential savings can help solve this problem.
Here’s an example: Let’s say that you want to reduce your AWS bill for Kubernetes clusters.
A good solution gives you universal metrics that work with any cloud provider. This opens the door to cloud cost optimization because you can now forecast expenses for individual projects, clusters, namespaces, and deployments.
You can analyze your costs down to individual microservices and make informed predictions about future expenses.
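As a rough illustration of namespace-level attribution, the Python sketch below splits a cluster’s hourly cost across namespaces in proportion to the CPU and memory their pods request. The `Pod` structure, the 50/50 CPU/memory weighting, and the $4.00/hour figure are hypothetical assumptions, not how any particular cost tool computes its metrics.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Pod:
    namespace: str
    cpu_request: float     # vCPU cores requested
    memory_request: float  # GiB requested


def cost_by_namespace(pods, hourly_cluster_cost, cpu_weight=0.5):
    """Split an hourly cluster cost across namespaces by resource requests.

    cpu_weight is a hypothetical share of the cost attributed to CPU;
    the remainder is attributed to memory.
    """
    total_cpu = sum(p.cpu_request for p in pods)
    total_mem = sum(p.memory_request for p in pods)
    costs = defaultdict(float)
    for p in pods:
        cpu_share = p.cpu_request / total_cpu if total_cpu else 0.0
        mem_share = p.memory_request / total_mem if total_mem else 0.0
        # Blend the pod's CPU and memory shares into one cost share.
        share = cpu_weight * cpu_share + (1 - cpu_weight) * mem_share
        costs[p.namespace] += share * hourly_cluster_cost
    return dict(costs)


if __name__ == "__main__":
    pods = [
        Pod("checkout", cpu_request=2.0, memory_request=4.0),
        Pod("checkout", cpu_request=1.0, memory_request=2.0),
        Pod("search", cpu_request=1.0, memory_request=10.0),
    ]
    # Hypothetical: the cluster's nodes cost $4.00 per hour in total.
    for ns, cost in sorted(cost_by_namespace(pods, 4.00).items()):
        print(f"{ns}: ${cost:.2f}/hour")
```

Because the full cluster cost is divided by each namespace’s share of requests, idle node capacity is spread across teams proportionally; a real tool might instead report idle capacity as its own line item.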
Having a good grasp of your cloud expenses also builds investor trust when you raise your next round.
2. Cloud outages can kill a startup
Cloud vendors seem to have unlimited resources and solid disaster prevention plans in place. But outages still happen. Over the past few months, every major cloud service provider has experienced an outage. Startups got cut off from essential services like Gmail or Slack.
As public cloud infrastructure grows more complex, outages are likely to become more common.
Take a look at the 10 major outages that have happened recently. You can bet that each of them cost the affected companies not only revenue but also reputation. That’s something to think about if you’re a SaaS startup that hosts its product or service in the cloud.
10 Recent Cloud Outages that Shook the Tech Industry
- March 24–26 – Microsoft Azure suffered from a series of outages in Europe.
- April 8 – Google Cloud outage impacted Snapchat, Gmail, and Nest.
- June 9–10 – IBM Cloud users experienced downtime worldwide.
- September 28 – Microsoft 365 and Azure experienced a 5-hour global authentication service outage.
- October 7 – Outage of Microsoft Teams, Outlook, SharePoint, and OneDrive, the third in less than two weeks.
- November 26 – Amazon Web Services outage impacted Roku, Adobe, Glassdoor, Autodesk, The Wall Street Journal, Ring, 1Password, and other services.
- December 14 – Google suffered two outages that impacted Gmail, Nest, YouTube, and other services.
- January 4 – Slack outage partially due to AWS auto-scaling failure.
- January 26 – Users across the US East Coast couldn’t access internet services including Gmail, Slack, and Zoom; the outage was likely caused by a cut fiber line in Brooklyn.
- April 1 – A major Azure outage, caused by a surge in DNS requests combined with a code defect, took down Azure services, Teams, Skype, OneDrive, and even Xbox Live.
3. Lack of skilled DevOps can kill a startup
DevOps experts are a rare breed today, and many startups struggle to get the right talent on board.
Naturally, the job of a DevOps engineer goes way beyond controlling the financial aspects of their deployments.
But if you can’t hire enough DevOps engineers, the ones you have will be forced to juggle many responsibilities, and cost optimization may slip down the priority list. That translates into risks like wasted resources and overprovisioning.
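Overprovisioning is easy to spot once you compare what a workload requests with what it actually uses. This sketch flags workloads whose 95th-percentile CPU usage is below half of their request and suggests a smaller request with 20% headroom. The `Workload` type, the sample data, and both thresholds are hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Workload:
    name: str
    cpu_request: float                              # vCPU cores requested
    cpu_usage: list = field(default_factory=list)   # observed vCPU usage samples


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def rightsizing_suggestions(workloads, min_utilization=0.5, headroom=1.2):
    """Suggest smaller CPU requests for workloads that use too little of theirs.

    A workload is flagged when its p95 usage is below min_utilization of its
    request; the suggested request is p95 usage plus headroom. Both defaults
    are hypothetical.
    """
    suggestions = {}
    for w in workloads:
        if not w.cpu_usage:
            continue  # no usage data, nothing to recommend
        p95 = percentile(w.cpu_usage, 95)
        if p95 / w.cpu_request < min_utilization:
            suggestions[w.name] = round(p95 * headroom, 2)
    return suggestions


if __name__ == "__main__":
    workloads = [
        Workload("api", 4.0, [0.5, 0.6, 0.8, 1.0, 0.7]),
        Workload("worker", 2.0, [1.5, 1.8, 1.9, 1.7]),
    ]
    for name, cpu in rightsizing_suggestions(workloads).items():
        print(f"{name}: lower CPU request to {cpu} cores")
```

A check like this is cheap to run on a schedule, which matters when no one on the team has time to review utilization by hand.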
When nobody is watching, cloud costs can spin out of control. Something simple like data transfer costs can become a major expense item. In 2017, Apple paid AWS nearly $50 million in data transfer fees.
Monitoring, allocating, and optimizing cloud costs is tricky, no matter which cloud service you use.
To see where you stand, let’s play bingo! How many of these issues did you deal with?
Didn’t cross a single tile? Please, share your secret with us!
Otherwise, read on to learn what you can do to solve the cloud cost riddle.
Why is cloud management so difficult?
1. Cloud bills are hard to understand for humans
Any cost optimization effort starts with understanding your cloud bill. But where do you start when a typical cloud bill is so hard to read?
Cloud bills are long, complex, and hard to unpack. Each service is billed by its own metric, and a single service can use several: Amazon S3, for example, charges for some things by the number of requests and for others by the gigabyte.
How can you make sense of your cloud bill?
Start by exploring the billing areas of your cloud service provider’s (CSP) console.
If you use AWS, take a look at AWS Billing and Cost Management Dashboard. For a more granular view, go to the Cost Explorer. This is where you can group and report on costs by certain attributes – for instance, by region or service.
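The same per-service breakdown is available programmatically through the Cost Explorer API. The sketch below assumes boto3 and credentials allowed to call `ce:GetCostAndUsage`; it groups unblended cost by the `SERVICE` dimension and follows `NextPageToken` pagination. The function name and the date range are illustrative.

```python
from collections import defaultdict


def cost_by_service(ce_client, start, end):
    """Sum unblended cost per AWS service between start and end (YYYY-MM-DD).

    ce_client is expected to be a boto3 Cost Explorer client, e.g.
    boto3.client("ce"). The End date is exclusive in Cost Explorer.
    """
    totals = defaultdict(float)
    kwargs = {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "DIMENSION", "Key": "SERVICE"}],
    }
    while True:
        response = ce_client.get_cost_and_usage(**kwargs)
        for period in response["ResultsByTime"]:
            for group in period["Groups"]:
                service = group["Keys"][0]
                amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
                totals[service] += amount
        # Cost Explorer paginates large results with NextPageToken.
        token = response.get("NextPageToken")
        if not token:
            break
        kwargs["NextPageToken"] = token
    # Most expensive services first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    import boto3  # requires AWS credentials with ce:GetCostAndUsage permission

    client = boto3.client("ce")
    for service, cost in cost_by_service(client, "2024-01-01", "2024-04-01"):
        print(f"{service:<50} ${cost:,.2f}")
```

Passing the client in as a parameter keeps the function easy to test with a stub instead of a live AWS account.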
2. Deciphering how teams contribute to your bill is tricky
If you have multiple teams using cloud resources, you need to know how each of them contributes to your cloud bill and identify potential candidates for optimization. To do that, you need to know who is using which resources.
How do you make it work?
Cloud service providers offer tools for categorizing costs by accounts, organizations, or projects to help teams keep within the spending parameters set for them.
AWS
- Organizations – Use this tool to manage and govern your environment when scaling resources. Create new AWS accounts, allocate resources, and then organize everything by grouping accounts. You can apply specific budgets and policies to accounts (or groups).
- Resource tagging – This tool allows you to tag resources directly and then break down data by tags when building reports in the Cost Explorer. Expect some upfront work to tag resources consistently.
Microsoft Azure
- Resource groups – You can create a container that holds resources to be managed as a group. Azure recommends grouping resources that share the same lifecycle so you can deploy, update, and delete them together.
- Resource tagging – This tool lets you apply tags to your resources (as well as to resource groups and subscriptions) and organize them logically. Every tag is a name-value pair; for instance, you can add the name Environment with the value Production to bring together all the resources in production.
Google Cloud Platform
- Projects – A project consists of a set of users, enabled APIs, billing settings, authentication, and monitoring settings for APIs. Create multiple projects and organize your cloud resources into logical groups to understand them better.
- Labels – These key-value tags let you filter and group your billing data, another really helpful feature. Note that some GCP resources don’t support labels yet, but hopefully that gap closes soon.
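Once resources carry tags or labels, allocating costs becomes a grouping problem. This minimal sketch sums cost line items by a single tag key (here a hypothetical `team` tag) and puts everything without that tag into an untagged bucket. The line-item shape is an assumption for illustration, not any provider’s export format.

```python
from collections import defaultdict


def allocate_by_tag(line_items, tag_key, untagged_label="(untagged)"):
    """Group cost line items by the value of one tag or label.

    Each line item is assumed to be a dict with a "cost" (float) and an
    optional "tags" dict; this shape is hypothetical.
    """
    totals = defaultdict(float)
    for item in line_items:
        # Items missing the tag (or tags entirely) land in the untagged bucket.
        owner = item.get("tags", {}).get(tag_key, untagged_label)
        totals[owner] += item["cost"]
    return dict(totals)


if __name__ == "__main__":
    items = [
        {"service": "compute", "cost": 120.0, "tags": {"team": "payments", "env": "production"}},
        {"service": "storage", "cost": 30.0, "tags": {"team": "search"}},
        {"service": "compute", "cost": 45.0, "tags": {"team": "payments", "env": "staging"}},
        {"service": "network", "cost": 25.0, "tags": {}},
    ]
    by_team = allocate_by_tag(items, "team")
    print(by_team)
    total = sum(by_team.values())
    print(f"Untagged share: {by_team.get('(untagged)', 0.0) / total:.0%}")
```

The untagged share is a useful progress metric: it shows how much of your bill still can’t be attributed to a team.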
3. Budgeting for the cloud is no walk in the park
Every cloud provider offers budgeting tools that you can use to set spending limits for a project and get notified as costs approach them.
Setting limits and alerts is a smart move. In 2018, a development team at Adobe accidentally burned through $80,000 a day in unplanned charges for a computing job on Azure. No one discovered this mistake for over a week. As you can imagine, the bill snowballed to over half a million dollars.
How do you avoid overrunning your cloud budget?
Here are the most impactful steps you can take:
- Pay attention to your budget and monitor costs regularly.
- Configure alerts and notifications.
- Identify all the expensive requirements early, during formal discovery.
- Know your system requirements upfront and have accurate assumptions about how its features will work and scale.
- Prepare an autoscaling design for applications.
- Guard against faulty provisioning logic in your infrastructure as code (IaC) that spins up resources out of control.
- When using serverless functions, always account for how far they can scale in parallel.
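To make alerting concrete, here is a sketch of threshold alerts on actual and forecasted monthly spend, similar in spirit to what budget tools offer. The forecast is a naive linear projection of month-to-date spend, and the thresholds and dollar amounts are hypothetical; provider tools use their own forecasting models.

```python
import calendar
from datetime import date


def budget_alerts(month_to_date_cost, budget, today, thresholds=(0.5, 0.8, 1.0)):
    """Return alert messages for a monthly budget.

    The forecast is a naive linear projection (average daily spend so far
    times the days in the month), an assumption chosen for simplicity.
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    forecast = month_to_date_cost / today.day * days_in_month
    alerts = []
    for t in thresholds:
        limit = budget * t
        if month_to_date_cost >= limit:
            alerts.append(f"ACTUAL: ${month_to_date_cost:,.0f} spent, over {t:.0%} of budget")
        elif forecast >= limit:
            alerts.append(f"FORECAST: ${forecast:,.0f} projected, over {t:.0%} of budget")
    return alerts


if __name__ == "__main__":
    # Hypothetical: $12,000 spent by June 10 against a $20,000 monthly budget.
    for alert in budget_alerts(12_000, 20_000, date(2024, 6, 10)):
        print(alert)
```

With $12,000 spent by June 10, the 50% threshold has already been crossed, and the projected $36,000 month-end spend triggers forecast alerts for 80% and 100%. Catching that on day 10 is the point: the Adobe team’s runaway costs went unnoticed for over a week.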
4. Cloud cost forecasting is hard
Cloud bills tend to fluctuate depending on usage. But forecasting still makes sense – it helps you to understand your future resource requirements and keep costs at bay.
Cloud vendors offer various cost management tools that help with forecasting:
AWS
- AWS Budgets – Set custom budgets to track costs and usage, with alerts when actual or forecasted costs exceed your budget.
- AWS Cost Explorer – Create custom reports, analyze cost and usage data at a high level, and identify any cost drivers or anomalies.
- AWS Cost & Usage Report – This report consists of comprehensive cost and usage data with additional metadata about pricing, services, Reserved Instances, and Savings Plans at different levels of granularity.
Microsoft Azure
- Cloud cost management – A dashboard that helps to track and manage costs across Azure and AWS cloud services.
- Pricing calculator – A handy tool for estimating the cost of different Azure configurations by products or ready-made scenarios like Advanced analytics on Big Data or CI/CD for Containers.
Google Cloud Platform
- Google Cloud Billing – Check how your costs are trending and how much you’re projected to spend in a given month for a specific spending group (down to one SKU in a single project).
These tools give you the data you need for three cloud cost forecasting techniques:
- Analyze usage reports – Gain visibility into your expenses by monitoring your resource usage reports on a regular basis. Set up alerts.
- Model cloud costs – Compare pricing models and plan your capacity requirements over time to project costs and build an application-level cost plan. Aggregate all of this data in one place to understand your cost trends better.
- Discover peak usage scenarios – Run periodic analytics and reports on your usage data. Take other data sources into account, such as seasonal patterns in customer demand. If these patterns correlate with peak resource usage, you know what to expect.
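For the cost modeling step, even a simple least-squares trend line over past monthly bills gives you a baseline to plan against. The sketch below fits a straight line, extrapolates it, and then applies a hypothetical multiplier for a known seasonal peak. The history and the 1.3× peak are made-up figures, and a real forecast would need more history and a better model of seasonality.

```python
def linear_forecast(monthly_costs, months_ahead):
    """Fit cost = intercept + slope * month_index by least squares and extrapolate."""
    n = len(monthly_costs)
    if n < 2:
        raise ValueError("need at least two months of history")
    mean_x = (n - 1) / 2
    mean_y = sum(monthly_costs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(monthly_costs))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_x
    # Month indices n, n+1, ... are the future months.
    return [intercept + slope * (n + k) for k in range(months_ahead)]


def apply_peaks(forecast, multipliers):
    """Scale forecasted months by seasonal multipliers (1.0 = no peak)."""
    return [cost * m for cost, m in zip(forecast, multipliers)]


if __name__ == "__main__":
    # Hypothetical monthly bills in USD for the last six months.
    history = [10_200, 10_900, 11_400, 12_100, 12_800, 13_300]
    baseline = linear_forecast(history, months_ahead=3)
    # Hypothetical: past data shows a holiday peak about 30% above trend in month 3.
    adjusted = apply_peaks(baseline, [1.0, 1.0, 1.3])
    for month, (base, adj) in enumerate(zip(baseline, adjusted), start=1):
        print(f"Month +{month}: trend ${base:,.0f}, with peak ${adj:,.0f}")
```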
5. Reserving capacity doesn’t give you control
Many companies reserve capacity in advance, lured by the promise of a lower cloud bill. But reserved capacity doesn’t give you more control over your spending. Pinterest going over budget by $20 million illustrates that perfectly.
Still, companies keep on committing:
- Pinterest’s IPO filing revealed that it had committed to spending at least $750 million on AWS by July 2023.
- Lyft said that it planned to spend at least $300 million on AWS over three years.
- Another startup that went public, Snap, committed to spending $1 billion on AWS and $2 billion on GCP over the course of five years.
Committing to a certain level of spending is tricky. Your requirements might change while your contract is still running, and you may find yourself locked into a single vendor for years.
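A simplified model shows why a commitment doesn’t cap your bill. Assume you commit to a fixed spend per period in exchange for a discount: usage the commitment covers is effectively prepaid, and anything beyond it is billed at on-demand rates, roughly the situation Pinterest ran into. The 25% discount and the dollar amounts are hypothetical, and real Reserved Instance and Savings Plans terms are more detailed than this.

```python
def committed_spend_cost(usage_on_demand, commitment, discount):
    """Total cost for one period under a simplified spend commitment.

    usage_on_demand: what the period's usage would cost at on-demand rates.
    commitment: amount paid for the period whether or not it is used.
    discount: fractional discount the commitment buys, e.g. 0.25 for 25%.
    """
    # On-demand value of the usage the commitment covers.
    covered = commitment / (1 - discount)
    # Anything above that is billed at full on-demand rates.
    overage = max(0.0, usage_on_demand - covered)
    return commitment + overage


if __name__ == "__main__":
    commitment, discount = 75.0, 0.25  # hypothetical: $75 covers $100 of on-demand usage
    for usage in (60.0, 100.0, 130.0):
        cost = committed_spend_cost(usage, commitment, discount)
        print(f"usage ${usage:.0f} on demand -> pay ${cost:.0f} (on-demand would be ${usage:.0f})")
```

At $60 of usage you pay $75 for capacity you didn’t need; at $130 the extra $30 is billed at full price. The commitment only pays off when usage lands close to what you predicted.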
Automation is how you optimize cloud costs without hiring more DevOps engineers
As you can see, keeping track of your resources, usage, and associated costs manually is a very complicated and time-consuming process.
Since practically every startup uses the cloud in one way or another, the industry is responding.
Cost reporting solutions offer visibility, but they don’t help you take automated action. At best, they give teams static recommendations that human DevOps engineers still have to carry out.