Automatic RDS SQL Server Cross-Region Fail-Over for HA/DR

Detecting RDS Unhealthy State

Leveraging CloudWatch, Route 53 HealthCheck and CLOUDBASIC API to Achieve Automatic Cross-Region Fail-Over

RDS SQL Server Disaster Recovery (DR) and High Availability (HA) with CLOUDBASIC Multi-AR™

Published: October 7, 2018


The overall strategy in this guide is based on the following high-level workflow:

"An RDS CloudWatch alarm goes into INSUFFICIENT_DATA state" because the monitored RDS instance goes down

-> This condition triggers a "Route53 Health Check to FAIL"

-> This then triggers "A ROUTE 53 Health Check alarm to go into ALARM state"

-> This causes a notification to be sent to an SNS topic

-> This causes the Lambda functions that are subscribed to the SNS topic to execute

-> The Lambda functions call CloudBasic API methods to configure the secondary RDS for Primary duties (activation of constraints, triggers, etc.) and to switch the Route53 record to point to the new Primary RDS instance


Here is how to set up the individual components needed for this workflow:

1. Set up a CloudWatch alarm for your RDS instance

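a. Select the CPUUtilization metric and configure the alarm condition as CPUUtilization > 100 %
b. Configure the alarm to look for "1 out of 1 datapoints"
c. Set the period to "1 minute"
d. Configure Statistic to be "Standard" and select "Average"

Note: The goal is to configure a CloudWatch alarm whose threshold condition can never be met (CPUUtilization cannot exceed 100 %), so the alarm never enters the ALARM state on its own. It leaves the OK state only when the RDS instance stops reporting metrics, which puts the alarm into the INSUFFICIENT_DATA state.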
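The same alarm settings can be expressed through the CloudWatch API instead of the console. The following is a minimal sketch, assuming Python with the boto3 library; the function names, alarm name and DB instance identifier are hypothetical placeholders, not part of the original guide. TreatMissingData is set explicitly to "missing" (the CloudWatch default) so that the alarm moves to INSUFFICIENT_DATA when no datapoints arrive.

```python
def rds_cpu_alarm_params(alarm_name, db_instance_id):
    """Build put_metric_alarm arguments that mirror steps 1a-1d."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/RDS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "DBInstanceIdentifier", "Value": db_instance_id}
        ],
        "Statistic": "Average",        # step d
        "Period": 60,                  # step c: 1 minute, in seconds
        "EvaluationPeriods": 1,        # step b: 1 out of 1 datapoints
        "DatapointsToAlarm": 1,
        "Threshold": 100.0,            # step a: CPUUtilization > 100 %
        "ComparisonOperator": "GreaterThanThreshold",
        # Missing datapoints are left as missing, so an instance that
        # stops reporting drives the alarm to INSUFFICIENT_DATA.
        "TreatMissingData": "missing",
    }


def create_rds_cpu_alarm(alarm_name, db_instance_id, region):
    """Create the alarm in the region of the RDS instance (needs boto3)."""
    import boto3  # imported here so the builder above works without it

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(
        **rds_cpu_alarm_params(alarm_name, db_instance_id)
    )
```

A call such as create_rds_cpu_alarm("rds-primary-heartbeat", "my-sqlserver-primary", "us-west-2") would create the alarm; all three values are illustrative only.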

2. Set up a Route53 Health check to monitor the RDS CloudWatch alarm

a. Under "What to monitor", select "State of CloudWatch alarm"
b. Under the "Monitor CloudWatch alarm" section, select the AWS Region and the name of your RDS CloudWatch alarm
c. VERY IMPORTANT: in the "Health check status" section, for the case when the alarm is in the INSUFFICIENT_DATA state, select the "the status is unhealthy" option
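For readers who prefer to script this step, the health check can be sketched with the Route 53 API as follows. This assumes Python with boto3; the function names and the example region and alarm name are hypothetical. The InsufficientDataHealthStatus value corresponds to the "status is unhealthy" choice in step c.

```python
import uuid


def alarm_health_check_config(alarm_region, alarm_name):
    """Build a HealthCheckConfig that tracks a CloudWatch alarm's state."""
    return {
        "Type": "CLOUDWATCH_METRIC",
        "AlarmIdentifier": {"Region": alarm_region, "Name": alarm_name},
        # Step c: treat INSUFFICIENT_DATA as an unhealthy status.
        "InsufficientDataHealthStatus": "Unhealthy",
    }


def create_alarm_health_check(alarm_region, alarm_name):
    """Create the health check and return its ID (needs boto3)."""
    import boto3  # imported here so the builder above works without it

    route53 = boto3.client("route53")
    response = route53.create_health_check(
        # CallerReference must be unique for each creation request.
        CallerReference=str(uuid.uuid4()),
        HealthCheckConfig=alarm_health_check_config(alarm_region, alarm_name),
    )
    return response["HealthCheck"]["Id"]
```

The returned health check ID is what the Route53 Health check alarm in the next step refers to.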

 

3. Set up a Route53 Health check alarm

a. Select the Route53 Health check you created in the last step
b. In the Alarms tab click on "Create alarm"
c. Under "Send notification" select "Yes"
d. Create a new SNS topic
e. Click on "Confirm"
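A scripted equivalent of this step might look like the sketch below, assuming Python with boto3. The function names, alarm name and topic name are hypothetical, and the statistic and threshold (Minimum of HealthCheckStatus below 1) are an assumption about what a "health check failed" alarm should test, not settings taken from this guide. Route 53 health check metrics are published in the US East (N. Virginia) region, so both the alarm and its SNS topic are created in us-east-1.

```python
ROUTE53_METRICS_REGION = "us-east-1"  # Route 53 metrics live in this region


def health_check_alarm_params(alarm_name, health_check_id, topic_arn):
    """Build put_metric_alarm arguments for a failed Route 53 health check."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/Route53",
        "MetricName": "HealthCheckStatus",   # 1 = healthy, 0 = unhealthy
        "Dimensions": [{"Name": "HealthCheckId", "Value": health_check_id}],
        "Statistic": "Minimum",              # assumption, see lead-in
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [topic_arn],         # notify the SNS topic on ALARM
    }


def create_health_check_alarm(alarm_name, health_check_id, topic_name):
    """Create the SNS topic and the alarm; return the topic ARN (needs boto3)."""
    import boto3  # imported here so the builder above works without it

    sns = boto3.client("sns", region_name=ROUTE53_METRICS_REGION)
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]

    cloudwatch = boto3.client("cloudwatch", region_name=ROUTE53_METRICS_REGION)
    cloudwatch.put_metric_alarm(
        **health_check_alarm_params(alarm_name, health_check_id, topic_arn)
    )
    return topic_arn
```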

 

4. Set up your Lambda functions as subscribers to the SNS topic

a. In the SNS service select the SNS topic you created in the previous step
b. In the "Subscriptions" section click the "Create subscription" button

i. Under "Protocol" select "AWS Lambda"
ii. Under "Endpoint" select the Lambda function you would like to call
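The subscription can also be created programmatically. The sketch below assumes Python with boto3; the function names, ARNs and statement ID are hypothetical. Unlike the console, which normally adds the invoke permission for you, an API-based setup has to grant SNS permission to invoke the function explicitly. The region of each client is derived from the corresponding ARN, since the topic and the function may live in different regions.

```python
def region_of(arn):
    """Return the region field of an ARN (arn:partition:service:region:...)."""
    return arn.split(":")[3]


def subscription_params(topic_arn, function_arn):
    """Arguments for sns.subscribe: protocol 'lambda', endpoint = function ARN."""
    return {
        "TopicArn": topic_arn,
        "Protocol": "lambda",
        "Endpoint": function_arn,
    }


def invoke_permission_params(topic_arn, function_arn, statement_id):
    """Arguments for lambda.add_permission allowing this topic to invoke."""
    return {
        "FunctionName": function_arn,
        "StatementId": statement_id,
        "Action": "lambda:InvokeFunction",
        "Principal": "sns.amazonaws.com",
        "SourceArn": topic_arn,
    }


def subscribe_lambda(topic_arn, function_arn, statement_id="sns-failover-invoke"):
    """Grant the permission, then subscribe the function (needs boto3)."""
    import boto3  # imported here so the builders above work without it

    lambda_client = boto3.client("lambda", region_name=region_of(function_arn))
    lambda_client.add_permission(
        **invoke_permission_params(topic_arn, function_arn, statement_id)
    )

    sns = boto3.client("sns", region_name=region_of(topic_arn))
    sns.subscribe(**subscription_params(topic_arn, function_arn))
```

Calling subscribe_lambda once per function covers the case where several Lambda functions subscribe to the same topic, provided each call uses a distinct statement ID.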

 

In CloudBasic's GitHub repository (https://github.com/cloudbasic), you can find sample code of a Lambda function that calls the CloudBasic API to promote a read-replica DB to primary:

https://github.com/cloudbasic/Lambda-Promote-to-Primary

In CloudBasic's GitHub repository (https://github.com/cloudbasic), you can also find sample code of a Lambda function that calls Amazon Route 53's API to switch DNS records:

https://github.com/cloudbasic/Lambda-Update-Route53
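To illustrate the general shape of such a function, here is a minimal sketch of an SNS-triggered Lambda handler that repoints a CNAME record, assuming Python with boto3. It is not the code from the repository above: the helper names, the environment variable names (HOSTED_ZONE_ID, RECORD_NAME, SECONDARY_ENDPOINT) and the use of a CNAME record are assumptions made for illustration.

```python
import json
import os


def alarm_states(event):
    """Return the NewStateValue of each CloudWatch alarm message in an SNS event."""
    states = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        states.append(message.get("NewStateValue"))
    return states


def cname_upsert_batch(record_name, target, ttl=60):
    """Build a ChangeBatch that points record_name at target."""
    return {
        "Comment": "Fail over to the secondary RDS endpoint",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": target}],
                },
            }
        ],
    }


def lambda_handler(event, context):
    """Switch the DNS record only when the notification reports ALARM."""
    if "ALARM" not in alarm_states(event):
        return {"switched": False}

    import boto3  # available by default in the AWS Lambda Python runtime

    route53 = boto3.client("route53")
    route53.change_resource_record_sets(
        HostedZoneId=os.environ["HOSTED_ZONE_ID"],          # hypothetical name
        ChangeBatch=cname_upsert_batch(
            os.environ["RECORD_NAME"],                      # hypothetical name
            os.environ["SECONDARY_ENDPOINT"],               # hypothetical name
        ),
    )
    return {"switched": True}
```

Checking the alarm state before acting keeps the function from switching DNS again when the same topic later delivers the notification that the alarm has returned to OK. A short TTL on the record helps clients pick up the new endpoint quickly.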