Disaster Recovery/Fail-Over Architecture (AWS DR)
There are several ways to use Active Geo-Replication when designing a service for business continuity and implementing AWS DR. The right choice depends mainly on the application pattern you are optimizing for. Other factors include ease of failover management, the service level agreement, traffic latency, and cost.
One option is Active-Passive compute with coupled failover
This option is best suited for applications with a single active deployment per geographic region. The application requires the compute tier and the data tier to be colocated, for example because of a chatty interface between the business logic and the database. It may also be critical to avoid additional traffic cost between the two tiers. For geo-redundancy, the database is geo-replicated to another region and the compute tier is deployed there as well. Connections to the MS SQL database are pre-configured for each location.
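As a rough illustration of what "pre-configured for each location" can look like, the sketch below keeps one static database endpoint per region and lets each compute deployment pick the endpoint for the region it runs in. The region names, hostnames, database name, and connection-string options are hypothetical placeholders, not values taken from this architecture.

```python
# Minimal sketch: one statically configured SQL endpoint per region.
# All region names and hostnames below are hypothetical placeholders.
SQL_ENDPOINTS = {
    "us-east-1": "sql-primary.example.internal",    # primary location
    "us-west-2": "sql-secondary.example.internal",  # secondary (DR) location
}


def connection_string(region: str, database: str = "appdb") -> str:
    """Return the connection string for the compute tier deployed in `region`.

    The compute tier always talks to the database in its own region,
    so nothing here needs to change at failover time.
    """
    host = SQL_ENDPOINTS[region]  # raises KeyError for an unconfigured region
    return f"Server={host},1433;Database={database};Encrypt=yes"
```

Because each deployment resolves only its own region's endpoint, redirecting user traffic to the secondary location does not require reconfiguring the connection.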
The following figure illustrates the service configuration before the failure. In this case all user traffic is directed to the active compute deployment in the primary location.
The following figure illustrates the service configuration after a failure. As shown in the figure, the compute deployment and the back-end database remain co-located, but are now in the secondary location.
This strategy has the following advantages and trade-offs:
The SQL connection can be statically pre-configured in each region.
The entire service (all tiers) is treated as a single failure domain. This means that a failure of any part of the service results in the failover of the entire service.
The failover results in some data loss.
RPO varies with the EC2 instance type, the number of concurrently replicated databases, and other factors. It ranges from 2 seconds to a few minutes, depending on the number of tables, the number of databases replicated per CLOUDBASIC server, and the replication mode. For example, replication latency increases if change-tracking processes are serialized to avoid overloading the server, as is done when a smaller CLOUDBASIC server is used to lower EC2 cost.
RTO = DNS record change + database state change + application verification test
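The RTO formula above is a simple sum of three durations, so it can be estimated with a few lines of Python. This is only a sketch: the class and field names are made up for illustration, and the timings in the example are placeholder values, not measurements.

```python
from dataclasses import dataclass


@dataclass
class FailoverTimings:
    """Durations, in seconds, of the three steps that make up RTO."""
    dns_record_change_s: float          # time for the DNS record change to take effect
    database_state_change_s: float      # time to switch the replica database to active
    application_verification_s: float   # time to run the application verification test

    def rto_seconds(self) -> float:
        # RTO = DNS record change + database state change + application verification test
        return (
            self.dns_record_change_s
            + self.database_state_change_s
            + self.application_verification_s
        )


# Placeholder timings, chosen only to show the arithmetic.
example = FailoverTimings(
    dns_record_change_s=60,
    database_state_change_s=120,
    application_verification_s=300,
)
estimated_rto = example.rto_seconds()
```

With these placeholder inputs the estimate comes to 480 seconds. In practice each term would be measured during a failover drill rather than assumed.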