How the Cloud Has Changed the High Availability and Disaster Recovery Game for CIOs

Josh Davidson

Principal and Co-Founder, Prime TSR

CIOs worry about two things:

  1. That their infrastructure doesn’t go down and is always available.
  2. When their infrastructure does go down (let’s be real), that they can recover quickly and without significant business interruption.

If either of these goes wrong, there is a lot to worry about. It’s easy to say that every CIO should implement Disaster Recovery (DR) and High Availability (HA) to eliminate single points of failure and to be able to recover from a disaster at a moment’s notice.

There isn’t a CIO in the world who doesn’t want to implement Disaster Recovery and High Availability. But there are usually major problems when trying to obtain proper budgeting to implement these two things, especially in larger enterprises. 

The problems include:

  1. The cost of implementing DR and HA when they are simply “insurance policies.”
  2. The time and resources it takes to implement and actively test that they are working properly.

However, the good news is that the cloud has enabled enterprises, big and small, to implement enterprise-grade Disaster Recovery and High Availability without significant cost, and with the added assurance that it will work if a disaster or point of failure occurs.

The cloud lets infrastructure span multiple availability zones and regions that are physically located around the world. In terms of high availability, the biggest concern is avoiding an architecture with a single point of failure, and the cloud makes this possible without owning the hardware or bearing the costs of running a data center.

It has also enabled options such as full failover, hot standby, and cold standby for disaster recovery. This allows companies to fail over to a fully functional environment that supports mission-critical applications.

Disaster Recovery and High Availability aren’t theoretical needs anymore.

There’s a great image going around the internet asking, “Who started your digital transformation: The CIO, CTO, or COVID-19?” And of course, COVID-19 is highlighted as the right answer. With COVID-19, DR and HA are no longer theoretical needs.  

Boards and the C-suite are taking an interest in these areas, and are starting to put pressure on CIOs to incorporate them into their plans. Also, not having DR/HA in place will likely become “a resume-generating event” for CIOs, as my old boss used to say. 

If and when an event happens, it better work, and it better work great. There isn’t much slack or support for half-working solutions, or blaming leadership for not funding your needs to protect the infrastructure. You’re either ready or not.

Use this opportunity to modernize your infrastructure while reducing costs.

The financial constraints of implementing a fully working DR/HA solution are still a concern, and the strain on cash flow is very real.

One way to minimize the costs is to leverage the cloud. Specifically, you can use Infrastructure-as-Code, which lets you generate your DR/HA environments from the ground up when needed and avoid paying for those resources when they are not.

Using Infrastructure-as-Code, you can easily define multiple environments, plan for many scenarios, and be up and running with a fully tested, production-ready environment when you need it most. This type of flexibility is unheard of in traditional environments. The flexibility and reliability of Infrastructure-as-Code offer a chance not only to implement your DR/HA solution but also to modernize your infrastructure.
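To make the idea concrete, here is a minimal sketch in plain Python of what "defining environments as code" can look like. It is an illustration only, not a real provisioning tool: the `Environment` class, the `derive_dr_environment` and `render_plan` helpers, and the region names are all hypothetical, and a real setup would use an actual Infrastructure-as-Code tool from your cloud provider or a third party.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Environment:
    """Hypothetical description of an environment, kept in version control."""
    name: str
    region: str
    instance_count: int
    instance_type: str
    multi_zone: bool = True


def derive_dr_environment(primary: Environment, dr_region: str) -> Environment:
    """Build the DR definition from the production one, changing only the region."""
    if dr_region == primary.region:
        raise ValueError("DR region must differ from the primary region")
    return replace(primary, name=f"{primary.name}-dr", region=dr_region)


def render_plan(env: Environment) -> List[str]:
    """Turn a definition into an ordered list of provisioning steps (text only)."""
    steps = [f"create network in {env.region}"]
    for i in range(env.instance_count):
        steps.append(f"launch {env.instance_type} instance {i + 1} for {env.name}")
    if env.multi_zone:
        steps.append("spread instances across availability zones")
    return steps


# Placeholder values; nothing here is created until the plan is actually run.
production = Environment("prod", "region-a", 2, "medium")
dr = derive_dr_environment(production, "region-b")

for step in render_plan(dr):
    print(step)
```

Because the DR environment is derived from the same definition as production, the two cannot silently drift apart, and the DR copy only needs to exist (and be paid for) when the plan is executed.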

Infrastructure-as-Code also goes hand-in-hand with modern DevOps—something every CIO is striving to achieve these days.

Watch this AWS video to learn how to modernize your backup and disaster recovery (DR) architectures by employing hybrid models to the cloud as well as backup and DR in the cloud.

Here are some common questions we get from our clients.

What are the risks of not having a disaster recovery plan?

The obvious risk of not having a disaster recovery plan is, well, a disaster. A disaster can come in many different forms. It could be loss of internet, data loss because of a bad script, or a data center going down completely because of a catastrophic event.

The point is that all of this is possible and can happen at any time. Without a disaster recovery plan, you put the entire business operation at risk. You might save some time in the short term, but in the long term, this can have devastating effects.

How do you test a disaster recovery plan?

The only real way to test a disaster recovery plan is to simulate a disaster on production instances. You can always test disaster recovery in a non-production environment, but you run the risk of not testing every single scenario in a real-world environment. And if a disaster does happen, you’re going to be crossing your fingers that there aren’t any failover problems.

The best way to simulate a disaster is to perform the failover during scheduled downtime. Some companies do this every six months, and some do it quarterly. This allows them to test their disaster recovery plan in production and to change the plan if any problems arise during the switchover. And since the downtime is scheduled, the IT teams can keep it to a minimum and come out better prepared for a real event.

How often should a disaster recovery plan be tested?

How often your disaster recovery plan gets tested depends on your risk profile and the type of applications and data you are hosting. If you’re a healthcare organization and the systems you provide are considered mission-critical, then once a month doesn’t seem too crazy.

I would recommend testing every six months as the bare minimum. I know quite a few organizations that don’t test at all, and that’s a big mistake. Disaster recovery plans should be simulated and executed often. You should also test your data backups and infrastructure failover plan often.

How do Disaster Recovery and High Availability work in AWS or Azure?

I’m going to answer this question from two perspectives: cost and technology.

In terms of cost, cloud-based Disaster Recovery and High Availability in AWS or Azure are simpler and significantly cheaper than traditional hot DR and HA. Doing it the right way in a traditional environment costs a lot of money. And when those costs add up for hardware you’re not actively using on a daily basis, it becomes hard to justify purchasing and maintaining it.

Then there are the costs related to software licensing, support, and technical resources required to maintain the plan. In the cloud, you pay for what you use, and that’s it.

That’s the cost side. Now I’ll talk about the technology side and the benefits of doing this all in the cloud.
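As a rough illustration of the pay-for-what-you-use argument, the sketch below compares a standby environment that runs all year with one that is only spun up for scheduled DR drills. The function names and every number are placeholders chosen for the arithmetic, not real pricing, and the sketch deliberately ignores costs that keep accruing in an on-demand model, such as storage, data replication, and data transfer.

```python
HOURS_PER_YEAR = 24 * 365  # 8760


def always_on_annual_cost(hourly_rate: float) -> float:
    """Cost of a standby environment that runs every hour of the year."""
    return hourly_rate * HOURS_PER_YEAR


def on_demand_annual_cost(
    hourly_rate: float,
    drills_per_year: int,
    hours_per_drill: float,
    disaster_hours: float = 0.0,
) -> float:
    """Cost of an environment that only runs during drills and real disasters."""
    return hourly_rate * (drills_per_year * hours_per_drill + disaster_hours)


# Placeholder inputs: 10 currency units per hour, two 8-hour drills per year
# (the six-month testing cadence), and no real disaster during the year.
rate = 10.0
always_on = always_on_annual_cost(rate)
on_demand = on_demand_annual_cost(rate, drills_per_year=2, hours_per_drill=8)

print(f"always-on: {always_on:.2f}, on-demand: {on_demand:.2f}")
```

The gap narrows as recovery-time requirements tighten, since a hot standby has to keep at least part of the environment running at all times.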

The biggest differentiator is that cloud providers offer DR and HA services natively within their platforms. These range from database recovery to file recovery to high-availability infrastructure. And, as mentioned before, cloud platforms operate in multiple physical regions, so you can plan for physical failures in multiple locations and still recover instantly.

The best part is that most Disaster Recovery and High Availability processes are automated. So, if a disaster does happen, the system will switch over to a hot standby immediately, without manual intervention.
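The logic behind an automated switchover can be sketched in a few lines of Python. This is a simplified illustration, not how AWS or Azure implement it: the `FailoverController` class and the threshold of three consecutive failed health checks are assumptions made for the example, and managed cloud services handle this for you.

```python
class FailoverController:
    """Hypothetical controller that promotes a hot standby after repeated failures."""

    def __init__(self, failure_threshold: int = 3):
        self.failure_threshold = failure_threshold
        self.active = "primary"
        self.consecutive_failures = 0

    def record_health_check(self, primary_healthy: bool) -> str:
        """Record one health-check result and return which side is serving traffic."""
        if self.active != "primary":
            # Already failed over; failing back is left as a deliberate, separate step.
            return self.active
        if primary_healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.active = "standby"
        return self.active


controller = FailoverController(failure_threshold=3)
checks = [True, False, False, False, True]
results = [controller.record_health_check(ok) for ok in checks]
print(results)
```

Requiring several consecutive failures before switching is a common way to avoid failing over on a single dropped health check, at the cost of a slightly slower reaction.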

CIOs should start looking at Disaster Recovery and High Availability solutions in the cloud immediately

One of our healthcare services clients, focused on Health Information Exchange (HIE), had a disaster happen, but because they were already set up with Disaster Recovery and High Availability in the cloud, the switchover happened with zero business downtime.

The CIO gets an A+ in this area and is often congratulated on their success. Now is the time to investigate how the cloud could be a cost-efficient solution while still delivering the enterprise-quality security and reliability of a traditional solution.