AWS Disaster Recovery Strategies

Recovery Time Objective (RTO):

What is Recovery Time Objective (RTO)?

Recovery Time Objective (RTO) is the maximum amount of time a service can remain unavailable before causing damage to the business.

How to Calculate Recovery Time Objective (RTO)?

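RTO is a target that the business sets in advance, not a value measured after an outage. What you can measure is the actual recovery time: the time the system was restored minus the time it went down. For example, if the system went down at 2 pm and was recovered by 6 pm, the actual recovery time is 4 hours. That outage meets the objective only if the RTO was set to 4 hours or more.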
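As a minimal sketch of that arithmetic, the snippet below measures the downtime of an outage and checks it against an RTO. The function names and the calendar date are illustrative assumptions; only the 2 pm and 6 pm times come from the example above.

```python
from datetime import datetime, timedelta


def recovery_time(went_down: datetime, recovered: datetime) -> timedelta:
    """Actual downtime: the recovery time minus the failure time."""
    if recovered < went_down:
        raise ValueError("recovery cannot precede the outage")
    return recovered - went_down


def meets_rto(went_down: datetime, recovered: datetime, rto: timedelta) -> bool:
    """True when the measured downtime stays within the RTO target."""
    return recovery_time(went_down, recovered) <= rto


# The date is arbitrary; the times match the 2 pm / 6 pm example.
outage_start = datetime(2024, 1, 15, 14, 0)
outage_end = datetime(2024, 1, 15, 18, 0)

downtime = recovery_time(outage_start, outage_end)
print(downtime)  # 4:00:00
print(meets_rto(outage_start, outage_end, timedelta(hours=4)))  # True
```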

Recovery Point Objective (RPO):

What is Recovery Point Objective (RPO)?

Recovery Point Objective (RPO) is the maximum amount of data, measured as a span of time, that the business can afford to lose if a system fails.

How to Calculate Recovery Point Objective (RPO)?

Like RTO, RPO is a target set in advance. What you can measure after an incident is the actual data loss window: the time the system went down minus the time of the last backup before the failure. For example, if the last backup was taken at 12 pm and the system went down at 2 pm, up to 2 hours of data is lost. That incident meets the objective only if the RPO was set to 2 hours or more.
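The same check can be sketched for data loss. The function names and the calendar date below are illustrative assumptions; the 12 pm backup and 2 pm failure come from the example above.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, went_down: datetime) -> timedelta:
    """Span of data at risk: the failure time minus the last backup time."""
    if went_down < last_backup:
        raise ValueError("the last backup must precede the failure")
    return went_down - last_backup


def meets_rpo(last_backup: datetime, went_down: datetime, rpo: timedelta) -> bool:
    """True when the data loss window stays within the RPO target."""
    return data_loss_window(last_backup, went_down) <= rpo


# The date is arbitrary; the times match the 12 pm / 2 pm example.
backup_time = datetime(2024, 1, 15, 12, 0)
failure_time = datetime(2024, 1, 15, 14, 0)

print(data_loss_window(backup_time, failure_time))  # 2:00:00
print(meets_rpo(backup_time, failure_time, timedelta(hours=1)))  # False
```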

Choosing the Right Architecture:

How to Choose the Right Architectural Solution for Disaster Recovery?

The right architecture and data backup solution for disaster recovery depends on how much downtime (RTO) and data loss (RPO) the application can tolerate without causing damage to the business. Set both targets from business requirements first, then pick the least expensive strategy that meets them.
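One way to picture this selection is a lookup that walks the strategies from cheapest to most expensive and returns the first one whose worst-case RTO fits the target. This is only a sketch: the hour figures are the illustrative examples used later in this article, not AWS guarantees, and the function name is hypothetical. RPO is left out because, for most of the strategies described here, it depends on backup frequency rather than on the strategy itself.

```python
# (strategy, illustrative worst-case RTO in hours), ordered cheapest first.
# The figures mirror this article's examples and are assumptions, not AWS SLAs.
STRATEGIES = [
    ("Backup and Restore", 24),
    ("Pilot Light", 10),
    ("Warm Standby", 5),
    ("Multi-Site", 1),
]


def cheapest_strategy(rto_target_hours: float) -> str:
    """Return the cheapest strategy whose worst-case RTO fits the target.

    Falls back to the last (most expensive) entry when nothing fits,
    since no cheaper option can recover faster.
    """
    for name, worst_case_rto in STRATEGIES:
        if worst_case_rto <= rto_target_hours:
            return name
    return STRATEGIES[-1][0]


print(cheapest_strategy(24))  # Backup and Restore
print(cheapest_strategy(8))   # Warm Standby
```

A target of 8 hours skips Pilot Light because its illustrative worst case of 10 hours exceeds the target.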

Different disaster recovery strategies

Backup and Restore

Backup and restore is a disaster recovery strategy in which you take frequent snapshots of the data stored in EBS volumes and RDS databases and keep those snapshots in durable storage such as Amazon S3. This strategy can protect both applications running on AWS and on-premises applications. Backup and restore is the slowest of the disaster recovery strategies to recover from, and it is best used in conjunction with other strategies. Moving older backups to Amazon S3 Glacier can reduce the cost of this strategy further.

Recovery Time Objective (RTO): High (e.g. 10-24 hours).

Recovery Point Objective (RPO): Depends on the frequency of the backups, which can be hourly, every 3 hours, every 6 hours, or daily.
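Because a failure can occur just before the next backup is due, the worst-case data loss is roughly one full backup interval. The sketch below picks the least frequent of the schedules listed above that still meets an RPO target. The function name is hypothetical, and the model ignores the time a backup takes to complete.

```python
from typing import Optional

# (schedule, hours between backups), ordered least frequent first.
SCHEDULES = [
    ("daily", 24),
    ("every 6 hours", 6),
    ("every 3 hours", 3),
    ("hourly", 1),
]


def least_frequent_schedule(rpo_hours: float) -> Optional[str]:
    """Return the least frequent schedule whose interval fits the RPO.

    Worst-case data loss is taken to equal the backup interval.
    Returns None when even hourly backups are not frequent enough.
    """
    for name, interval_hours in SCHEDULES:
        if interval_hours <= rpo_hours:
            return name
    return None


print(least_frequent_schedule(4))    # every 3 hours
print(least_frequent_schedule(0.5))  # None
```

An RPO shorter than the most frequent schedule is a sign that periodic backups alone are not enough and some form of continuous replication is needed.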

Pilot Light

Pilot light is a disaster recovery strategy that involves running a minimal version of the production environment on AWS. This does not mean that the entire application is scaled down, but rather that only the core and most critical components of the production environment are configured and running. When disaster strikes, the full-scale application can be brought up around the running core. Pilot light is more costly than backup and restore, as some minimal AWS services are always running. This strategy also involves defining the rest of the infrastructure as code, for example in AWS CloudFormation templates, so that the system can be restored quickly and repeatably.

Recovery Time Objective (RTO): High, but less than backup and restore (e.g. 5-10 hours).

Recovery Point Objective (RPO): Same as RPO for Backup and Restore, i.e. depends on the frequency of the backups.

Warm Standby

The warm standby strategy involves running a heavily scaled-down, yet fully functional, version of the production environment on AWS. As with pilot light, the infrastructure is defined as code, for example in AWS CloudFormation templates, so that the system can be restored quickly and repeatably. When disaster strikes, the warm standby application can be quickly scaled up to serve as the production application. EC2 instances can be kept running at a minimal count and instance size, then scaled up to full production capacity using AWS Auto Scaling. In addition, all DNS records and traffic routing tables must be changed to point to the standby application rather than the production application.

Recovery Time Objective (RTO): Lower than Pilot Light (e.g. < 5 hours).

Recovery Point Objective (RPO): Same as RPO for Backup and Restore, i.e. depends on the frequency of backups.

Multi-Site

The multi-site strategy involves running a fully functional version of the production environment as a backup in the cloud. This is a one-to-one copy of your primary application, typically run in a different Availability Zone or an entirely different Region, for resilience. This is the most expensive of all the disaster recovery options, as it roughly doubles the running cost of the application. However, it offers the lowest RTO and RPO of any of the strategies. As soon as failure strikes, developers only need to change DNS records and routing tables to point to the secondary application.

Recovery Time Objective (RTO): Lowest of all DR strategies (e.g. < 1 hour).

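Recovery Point Objective (RPO): Lowest of all DR strategies. The choice of data replication affects RPO: with synchronous replication, the most recent writes are already present in the secondary database when the primary fails, while asynchronous replication can lose the writes that had not yet been copied.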
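The DNS change at failover can be sketched as a Route 53 change batch that repoints a record at the secondary site. The helper below only builds the request payload, so it runs without AWS credentials. The helper name, record name, standby hostname, and TTL are hypothetical placeholders, and a CNAME record is assumed for simplicity.

```python
def build_failover_change(record_name: str, standby_target: str, ttl: int = 60) -> dict:
    """Build a Route 53 ChangeBatch that repoints a CNAME at the standby site."""
    return {
        "Comment": "Fail over to the secondary site",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or overwrite it if present
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,  # a short TTL lets clients pick up the change sooner
                    "ResourceRecords": [{"Value": standby_target}],
                },
            }
        ],
    }


# Hypothetical names, for illustration only.
change_batch = build_failover_change("app.example.com.", "standby.example.net")

# With boto3 installed and credentials configured, the payload could be applied
# along these lines (the hosted zone ID is a placeholder):
#   import boto3
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="ZONE_ID_PLACEHOLDER", ChangeBatch=change_batch
#   )
```

Route 53 health checks with failover routing policies can make this switch automatic instead of manual; the sketch shows only the manual case described above.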
