How We Reduce Cloud Costs While Optimizing Application Performance (a Four-Step Process)

Jeremy Smart

Solution Architect

Step 1: Gain an understanding of the existing applications and platform, then build a roadmap to identify redundancies and cost optimization opportunities

The first thing we do is create an application roadmap that maps out your entire system architecture, so we can spot unnecessary redundancies and scaling configurations you don't need. We aim to gain a deep understanding of the application stack and of cloud usage by application. Tracking cloud costs per application is critical, because it shows the company what each application actually consumes.

Because there are many ways to execute a workload, and because teams within an organization are spread across multiple functions, we generally find a few big opportunities for cost efficiencies during this phase.

Step 2: Gain a full understanding of technical cost optimizations through monitoring, reporting, and governance

To get a full understanding of cost optimizations, we look at four areas, specifically:

Utilization (Compute, Memory, Disk, Storage). Most organizations believe they are operating at the right level of compute utilization. However, our analysis usually shows that most compute resources are underutilized, even though companies are paying for full capacity. In some cases, instances can be shut down entirely with no impact.
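As a rough illustration of the utilization analysis described above, here is a minimal Python sketch that labels instances from sampled CPU and memory percentages. The `InstanceUsage` shape, the instance names, and both thresholds are assumptions made for this example, not provider guidance or the output of any specific monitoring tool.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class InstanceUsage:
    """Utilization samples (percent, 0-100) for one instance. Hypothetical shape."""
    name: str
    cpu_samples: List[float]
    mem_samples: List[float]


def classify(usage: InstanceUsage, idle_pct: float = 5.0, low_pct: float = 40.0) -> str:
    """Label an instance from its peak and average utilization.

    Both thresholds are illustrative assumptions; tune them per workload.
    """
    peak = max(max(usage.cpu_samples), max(usage.mem_samples))
    avg = max(mean(usage.cpu_samples), mean(usage.mem_samples))
    if peak < idle_pct:
        return "idle: candidate for shutdown"
    if avg < low_pct:
        return "underutilized: candidate for a smaller size"
    return "ok"


# Made-up sample data for three instances.
fleet = [
    InstanceUsage("batch-old", [1.0, 2.0, 1.5], [3.0, 3.5, 3.0]),
    InstanceUsage("web-1", [20.0, 35.0, 25.0], [30.0, 32.0, 31.0]),
    InstanceUsage("db-1", [60.0, 75.0, 70.0], [80.0, 82.0, 85.0]),
]
report = {u.name: classify(u) for u in fleet}
```

In practice the samples would come from your monitoring system over a representative period (for example, a few weeks that include peak load), since a short window can make a busy instance look idle.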

Controlling cloud costs requires the triad of monitoring, reporting, and governance to be properly aligned. Cloud monitoring is the process of using automated software to gain visibility into application, user, and file behavior. Auditing and reporting are close adjuncts of monitoring and help to identify patterns and potential security gaps in the infrastructure.

Some key strategies that can help optimize your cloud utilization costs are:

Caching storage. By serving frequent requests from a low-latency in-memory data store, cloud users can cut repeated reads against the underlying database or storage service, which can lower costs significantly. Amazon ElastiCache is a popular solution that can bring response times down to the sub-millisecond range.
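To show the idea behind caching frequent requests, here is a small read-through cache sketch in Python. It uses an in-process dictionary as a stand-in for a managed in-memory store, and `slow_lookup` is a hypothetical placeholder for an expensive database or storage read.

```python
import time
from typing import Callable, Dict, List, Tuple


class ReadThroughCache:
    """Serve repeated reads from memory; call the backend only on a miss."""

    def __init__(self, loader: Callable[[str], str], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # key -> (expiry, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: fetch from the backend and remember the result.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value


backend_calls: List[str] = []


def slow_lookup(key: str) -> str:
    """Hypothetical stand-in for a billed, slower backend read."""
    backend_calls.append(key)
    return f"value-for-{key}"


cache = ReadThroughCache(slow_lookup, ttl_seconds=60.0)
results = [cache.get("user:42") for _ in range(5)]
```

Five reads of the same key reach the backend once. The time-to-live is the main design choice: a longer value saves more backend calls but serves staler data.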

Data compression. Reducing the size of your stored data with a compression algorithm can also lower the overall cost of storage.
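A quick way to judge whether compression is worth it for a given data set is to measure it on a sample. The sketch below uses Python's standard `gzip` module; the repeated log line is made-up sample data, and real-world ratios depend heavily on how repetitive your data is.

```python
import gzip


def compression_report(payload: bytes) -> dict:
    """Compress a payload with gzip and report sizes plus a round-trip check."""
    if not payload:
        raise ValueError("payload must not be empty")
    compressed = gzip.compress(payload)
    return {
        "original_bytes": len(payload),
        "compressed_bytes": len(compressed),
        "ratio": len(compressed) / len(payload),
        "round_trip_ok": gzip.decompress(compressed) == payload,
    }


# Highly repetitive sample data (illustrative only).
log_lines = b"2024-01-01 INFO request served status=200 path=/health\n" * 1000
report = compression_report(log_lines)
```

Already-compressed formats such as JPEG images or video usually shrink very little, so it helps to run this kind of check per data type before committing to a policy.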

Auto-scaling. Auto-scaling is the ability to automatically scale compute resources up or down when production workloads are variable and unpredictable. Azure Autoscale adjusts capacity to help users get the best performance per unit of cost.
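To make the scaling decision concrete, here is a minimal sketch of a proportional scaling rule: scale the instance count by the ratio of observed load to target load, then clamp it to configured limits. This is an illustrative assumption about how such a rule can work, not the actual algorithm of Azure Autoscale or any other provider.

```python
import math


def desired_capacity(current: int, observed: float, target: float,
                     minimum: int, maximum: int) -> int:
    """Return the instance count that would bring per-instance load near target.

    `observed` and `target` are in the same unit (for example, average CPU
    percent). All parameter names here are hypothetical.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    raw = math.ceil(current * observed / target)
    return max(minimum, min(maximum, raw))


scale_out = desired_capacity(current=4, observed=90.0, target=60.0, minimum=2, maximum=10)
scale_in = desired_capacity(current=4, observed=15.0, target=60.0, minimum=2, maximum=10)
capped = desired_capacity(current=8, observed=120.0, target=60.0, minimum=2, maximum=10)
```

The minimum protects availability when traffic is quiet, and the maximum acts as a cost ceiling if load spikes unexpectedly.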

Optimize API calls. A key goal of development today is to improve functionality while reducing latency. Each additional service or HTTP call within an application can multiply costs. With Amazon API Gateway, developers can address this issue with more efficient, lower-latency API calls.

Optimize storage. You can optimize your cloud-based storage using these characteristics:

Size: How much storage do you really need?
Data Transfer (bandwidth): How often does your data need to move from one location to another?
Retrieval Time: How quickly do you need to access your data?
Retrieval Requests: How often do you need to access your data?
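The four characteristics above can be combined into a simple comparison. The following sketch picks the cheapest storage tier that still meets a retrieval-time requirement. The tier names and every price in `TIERS` are hypothetical; real prices vary by provider and region.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageTier:
    name: str
    per_gb_month: float        # cost driven by size
    per_gb_transferred: float  # cost driven by data transfer
    per_1k_requests: float     # cost driven by retrieval requests
    retrieval_seconds: float   # typical wait before data is available


# Hypothetical tiers and prices, for illustration only.
TIERS = [
    StorageTier("hot", 0.020, 0.00, 0.0004, 0.1),
    StorageTier("cool", 0.010, 0.01, 0.0010, 0.1),
    StorageTier("archive", 0.002, 0.02, 0.0100, 4 * 3600.0),
]


def monthly_cost(tier: StorageTier, stored_gb: float,
                 transferred_gb: float, requests: int) -> float:
    return (tier.per_gb_month * stored_gb
            + tier.per_gb_transferred * transferred_gb
            + tier.per_1k_requests * requests / 1000)


def cheapest_tier(stored_gb: float, transferred_gb: float, requests: int,
                  max_wait_seconds: float) -> Optional[StorageTier]:
    """Cheapest tier whose retrieval time fits the requirement, or None."""
    eligible = [t for t in TIERS if t.retrieval_seconds <= max_wait_seconds]
    if not eligible:
        return None
    return min(eligible, key=lambda t: monthly_cost(t, stored_gb, transferred_gb, requests))


backups = cheapest_tier(10_000, 10, 100, max_wait_seconds=86_400)
backups_fast = cheapest_tier(10_000, 10, 100, max_wait_seconds=1)
busy_assets = cheapest_tier(100, 20_000, 5_000_000, max_wait_seconds=1)
```

With these assumed prices, rarely read backups land in the archive tier, the same backups move to the cool tier once they must be readable within a second, and a small but heavily read data set stays in the hot tier.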

Turn off unused resources and right-size. It’s not unusual for an administrator or developer to “spin up” a temporary server to perform a function and then forget to turn it off when the job is done, or to forget about older snapshots that are no longer used. Automation can solve some of these issues by enforcing defined rules or notifying users. You should turn off idle instances and right-size instances that are either over-provisioned or poorly matched to the workload, based on an analysis of instance performance and of user needs and patterns. You also have the option to “auto park” resources, which means shutting a resource down during non-peak hours and automating the spin-up and spin-down processes.
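The “auto park” idea reduces to a schedule check that an automation job can run periodically. Here is a minimal sketch; the working hours and weekday set are assumptions for illustration, and the actual start and stop calls to your cloud provider are deliberately left out.

```python
from datetime import datetime, time
from typing import FrozenSet


def should_run(now: datetime,
               start: time = time(7, 0),
               stop: time = time(19, 0),
               workdays: FrozenSet[int] = frozenset({0, 1, 2, 3, 4})) -> bool:
    """Return True if a parked resource should be running at `now`.

    Weekdays follow datetime.weekday(): Monday is 0, Sunday is 6.
    """
    return now.weekday() in workdays and start <= now.time() < stop


weekday_morning = should_run(datetime(2024, 1, 3, 10, 0))   # a Wednesday
weekday_night = should_run(datetime(2024, 1, 3, 22, 0))
weekend_morning = should_run(datetime(2024, 1, 6, 10, 0))   # a Saturday
```

With this assumed schedule a resource runs 60 of the 168 hours in a week. How much of that translates into savings depends on how the resource is billed while stopped, for example whether attached disks still incur charges.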

Alerts. Setting thresholds on your AWS and Azure billing data can help you identify when you’re close to exceeding your budget or your average monthly cost. Daily reports can be sent to individuals, to the finance team, or even to the engineers who are provisioning and running services in the cloud.
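A threshold check like the one described can be sketched as follows. It combines a fixed warning ratio with a simple straight-line forecast of month-end spend. The function name, the 80% warning ratio, and the linear forecast are assumptions for this example; the billing tools from AWS and Azure offer their own alerting features that you would normally configure instead.

```python
import calendar
from datetime import date
from typing import Tuple


def budget_status(spend_to_date: float, budget: float, today: date,
                  warn_ratio: float = 0.8) -> Tuple[str, float]:
    """Return an alert level and a naive month-end forecast.

    The forecast assumes spending continues at the month's average daily rate.
    """
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    forecast = spend_to_date / today.day * days_in_month
    if spend_to_date >= budget:
        level = "over budget"
    elif forecast >= budget or spend_to_date >= warn_ratio * budget:
        level = "warning"
    else:
        level = "ok"
    return level, forecast


trending_over = budget_status(400.0, 1000.0, date(2024, 4, 10))
on_track = budget_status(200.0, 1000.0, date(2024, 4, 10))
exceeded = budget_status(1000.0, 1000.0, date(2024, 4, 10))
```

The forecast is what makes the alert useful early in the month: spending 400 of a 1,000 budget by day 10 of a 30-day month is below the warning ratio but still on course to overshoot.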

Identity and Access Management. For enterprises with complex organizational structures, hundreds of workgroups, and many projects, it’s important to have full visibility of who is using what cloud resources and when. Cloud Identity and Access Management (IAM) can give IT admins a centralized view of their cloud environment along with compliance and security controls. IAM is about connecting the right users to the right resources on the cloud in the most secure fashion. IAM must serve as a baseline component of an integrated security and compliance layer across an organization’s entire cloud infrastructure.

Account consolidation. In most of the major cloud platforms, you can create “billing families,” which allow you to associate all of your accounts under one billing profile. We take a look at all of the accounts and recommend an account consolidation plan to help you achieve better rates and a full, streamlined view of your billing.

Step 3: Understand your organization's technology roadmap and how it will impact cloud costs

VM resizing. One of the great benefits of Azure VMs is the ability to change the size of your VM based on the specific needs of CPU, network, or disk performance. This can translate into significant cloud cost savings.

Serverless elasticity management. Serverless enables organizations to build and run applications and services without thinking about servers. Serverless eliminates infrastructure management tasks such as server or cluster provisioning, patching, operating system maintenance, and capacity provisioning. Elasticity reflects the ability of a system to adjust to changing workloads automatically and is a hallmark of cloud computing. Serverless elasticity management, therefore, is a shift in mindset: away from wrangling servers and infrastructure, and toward freeing up time to focus on developing seamless and scalable products.

Under a serverless approach, coders can just write the algorithms and allow the serverless provider to take care of data storage and computing needs.

Partnership discounts. Organizations may find that they qualify for certain benefits and discounts from their cloud providers. If you’re a large enterprise, then purchasing a large amount of cloud storage can come with cost savings. Startup credits or purchase-in-advance agreements may be another option.

Step 4: Plan for long-term cost optimization and build a cloud adoption roadmap

Oftentimes, a lift-and-shift migration to the cloud results in inefficient resource usage. Re-architecting to align better with a cloud environment will often lead to cost savings. Here are four things we evaluate when building a modernization and cloud-adoption roadmap.

Infrastructure as Code (IaC). With Infrastructure as Code, you keep all the infrastructure specifications in configuration files that serve as your single source of truth. Anytime you need to spin up a fully configured and tested environment, you can do so from an IaC configuration file.

Compare this with provisioning new environments and having to retest and reconfigure, and usually ending up with slightly different environments than you originally asked for. Because an infrastructure can be created within minutes to your specific configuration, you don’t have to spend the money on underutilized resources or hardware you haven’t been able to utilize yet.

The approach reflects a progression beyond the initial “lift-and-shift” approach to one that produces greater agility, faster speed to market, and lower long-term costs. The premise here is that as companies avoid manual configurations and run an environment as code in a repeatable and scalable manner, they free up significant time, save the costs of hiring and training engineers and provisioning hardware, and can channel those resources toward innovation and development.
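The “single source of truth” idea can be illustrated without any particular IaC tool. In the sketch below, one specification object is rendered into environment definitions, so staging and production cannot drift apart by accident. `EnvironmentSpec`, its fields, and the JSON output format are all hypothetical; a real setup would use your IaC tool's own configuration language.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EnvironmentSpec:
    """Hypothetical infrastructure specification kept in version control."""
    instance_type: str
    instance_count: int
    disk_gb: int


def render(spec: EnvironmentSpec, env_name: str) -> str:
    """Render a deterministic environment definition from the shared spec."""
    document = {"environment": env_name, "resources": asdict(spec)}
    return json.dumps(document, sort_keys=True, indent=2)


BASE = EnvironmentSpec(instance_type="small", instance_count=2, disk_gb=50)

staging = json.loads(render(BASE, "staging"))
production = json.loads(render(BASE, "production"))
```

Because both environments come from the same `BASE` object, their resource sections are identical by construction, and rendering the same input twice always produces the same text. That repeatability is what removes the retest-and-reconfigure cycle described above.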

Serverless adoption. Since serverless follows a pay-per-use model, where you are billed for actual execution time rather than idle capacity, we’ve been able to modernize applications using a serverless approach and significantly reduce costs.
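Whether pay-per-use comes out cheaper depends on request volume, so it is worth running the numbers before migrating. The sketch below compares a usage-billed function with an always-on instance. Every rate in it is a hypothetical default chosen for illustration; substitute your provider's current pricing.

```python
def serverless_monthly_cost(requests: int, avg_seconds: float, memory_gb: float,
                            gb_second_rate: float = 0.0000167,
                            per_million_requests: float = 0.20) -> float:
    """Cost of a usage-billed function: compute time plus a per-request fee.

    Both rates are hypothetical defaults, not quoted prices.
    """
    compute = requests * avg_seconds * memory_gb * gb_second_rate
    request_fee = requests / 1_000_000 * per_million_requests
    return compute + request_fee


def always_on_monthly_cost(hourly_rate: float = 0.05, hours: float = 730.0,
                           instances: int = 1) -> float:
    """Cost of instances that run all month regardless of traffic."""
    return hourly_rate * hours * instances


low_traffic = serverless_monthly_cost(1_000_000, avg_seconds=0.2, memory_gb=0.5)
high_traffic = serverless_monthly_cost(100_000_000, avg_seconds=0.2, memory_gb=0.5)
baseline = always_on_monthly_cost()
```

With these assumed rates, one million requests a month costs far less than the always-on instance, while one hundred million requests costs more. Spiky or low-volume workloads tend to be the strongest candidates for a serverless approach.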

Ingress/Egress optimization. Ingress traffic is all the data communications and network traffic that originates from external networks and is destined for a node in the host network. Egress traffic is the reverse: traffic that originates inside the host network and is directed toward an external network.

Many cloud providers will charge you for egress traffic, and when this isn’t configured correctly or optimized, you’ll be shocked at how big the cloud bill will be. We spend a good amount of time and effort analyzing this traffic to make sure you’re getting the most out of it, but not spending an unnecessary amount of money on it.
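A first pass at the traffic analysis can be as simple as pricing each flow and ranking services by what they cost. In this sketch the flow records, service names, and per-GB rates are all made up for illustration; real rates depend on the provider, the regions involved, and volume tiers.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical per-GB egress rates by destination type.
RATE_PER_GB: Dict[str, float] = {
    "internet": 0.09,
    "cross-region": 0.02,
    "same-region": 0.0,
}

# (service, destination type, GB sent per month); sample data only.
Flow = Tuple[str, str, float]


def egress_by_service(flows: List[Flow]) -> List[Tuple[str, float]]:
    """Total egress cost per service, most expensive first."""
    totals: Dict[str, float] = defaultdict(float)
    for service, destination, gigabytes in flows:
        totals[service] += gigabytes * RATE_PER_GB[destination]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


flows = [
    ("media-api", "internet", 5000.0),
    ("media-api", "same-region", 9000.0),
    ("reporting", "cross-region", 2000.0),
    ("auth", "internet", 10.0),
]
ranked = egress_by_service(flows)
```

The ranking points to where optimization effort pays off first. In this made-up data, the internet-facing traffic of a single service dominates the bill even though that service sends more data inside its own region.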

Evaluate multi-cloud vs. single cloud. A multi-cloud strategy involves the utilization of two or more public or private cloud platforms within your cloud environment in order to reduce over-reliance on a single provider. For instance, some companies may prefer to integrate Azure, GCP, and AWS into their ecosystem, based on the benefits and customizations of each platform. Some cloud platforms might specialize in large data transfers or integrated machine learning features, whereas others, like Azure, provide more robust VM resizing options.

The idea of adopting multi-cloud cost optimization to save money might be tempting, but it’s important to weigh the costs. In some cases, a so-called multi-cloud “cost savings” could be outweighed by the administrative hassles of switching between platforms, paying for network traffic between clouds, and training staff on multiple clouds.

The cloud is the foundation of innovation, and improving how you manage, measure, and reduce IT costs will help your IT organization move quickly, without restrictions.

There is a wide range of best practices, strategies, and techniques for reducing cloud costs. And because of the growth of cloud computing, this area will become increasingly complex to manage as the number of options for deploying cloud-based infrastructure grows.

By addressing near-term, tactical considerations, and more strategic and structural issues, leadership can help their organizations move confidently into the new digital reality that is increasingly reliant on cloud-based services.