Automated Cross-Region Disaster Recovery on Amazon RDS for SQL Server STD, ENT & Web
Using CLOUDBASIC for RDS SQL Server Read Replicas and AWS Lambda
When the primary-region RDS SQL Server instance is detected as unhealthy, a Lambda function is triggered. The function calls the CLOUDBASIC HA Cluster API to promote the DR replica databases to primary, and the Route53 API to reroute web traffic to the standby DR clone of the web applications.
The overall strategy in this guide is based on the following high-level workflow:
"An RDS CloudWatch alarm goes into INSUFFICIENT_DATA state" because the monitored RDS instance goes down
-> This condition triggers a "Route53 Health Check to FAIL"
-> This then triggers "A ROUTE 53 Health Check alarm to go into ALARM state"
-> This causes a notification to be sent to an SNS topic
-> This triggers LAMBDA functions that are subscribers to the SNS topic to execute
-> The LAMBDA functions call CloudBasic API methods to configure the secondary RDS instance for primary duties (activation of constraints, triggers, etc.) and to switch the Route53 record to point to the new primary RDS instance
Here is how to set up the individual components needed for this workflow:
1. Set up a CloudWatch alarm for your RDS instance
Select the RDS instance, go to the Logs & events section, and click Create Alarm:
a. Select the CPUUtilization metric and set the condition to CPUUtilization > 100 %.
b. Configure to look for "1 out of 1 datapoints"
c. Set the period to "1 minute"
d. Configure Statistic to be "Standard" and select "Average"
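Note: The goal is to configure a CloudWatch alarm whose threshold condition can never be met, since CPU utilization cannot exceed 100 %. The alarm therefore never enters the ALARM state on its own; it changes state only when the instance stops reporting metrics, at which point it goes into the INSUFFICIENT_DATA state.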
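For readers who prefer to script this step, the console settings above could be expressed roughly as follows with boto3. This is a minimal sketch, not part of the original guide: the function names and the alarm and instance identifiers are hypothetical, and it assumes the default missing-data treatment ("missing"), under which an alarm with no datapoints moves to INSUFFICIENT_DATA.

```python
def rds_heartbeat_alarm_params(db_instance_id, alarm_name):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    Mirrors the console steps: CPUUtilization > 100 %, 1 out of 1
    datapoints, 1-minute period, Average statistic.
    """
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/RDS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "DBInstanceIdentifier", "Value": db_instance_id}
        ],
        "Statistic": "Average",
        "Period": 60,                # seconds, i.e. "1 minute"
        "EvaluationPeriods": 1,      # "1 out of 1 datapoints"
        "DatapointsToAlarm": 1,
        "Threshold": 100.0,          # can never be exceeded
        "ComparisonOperator": "GreaterThanThreshold",
        # Assumption: keep the default so that missing data is reported
        # as INSUFFICIENT_DATA rather than being hidden.
        "TreatMissingData": "missing",
    }


def create_rds_heartbeat_alarm(db_instance_id, alarm_name, region):
    """Create the alarm in the given region (requires AWS credentials)."""
    import boto3  # imported lazily so the builder above needs no SDK

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(
        **rds_heartbeat_alarm_params(db_instance_id, alarm_name)
    )
```

A call such as create_rds_heartbeat_alarm("my-sqlserver-primary", "rds-primary-heartbeat", "us-east-1") would create the alarm; all three values are placeholders.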
2. Set up a Route53 Health check to monitor the RDS CloudWatch alarm
a. In the "What to monitor" section select "State of CloudWatch alarm"
b. Under the "Monitor CloudWatch alarm" section select the AWS Region and the name of your RDS CloudWatch alarm
c. VERY IMPORTANT: in the "Health check status" section, for "When the alarm is in the INSUFFICIENT state", select the "the status is unhealthy" option. Without this setting the health check does not fail when the RDS instance goes down.
d. Click Next and proceed to step 2 of 2 of the wizard (described in section 3 below) to create an alarm for this Route53 health check.
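The same health check could be created through the Route 53 API. The sketch below is an illustration under stated assumptions rather than the guide's own method: the function names are hypothetical, and the key point it shows is the InsufficientDataHealthStatus setting, which corresponds to the "status is unhealthy" option in step c.

```python
import uuid


def alarm_health_check_config(alarm_region, alarm_name):
    """Build the HealthCheckConfig for a CloudWatch-alarm-based check."""
    return {
        "Type": "CLOUDWATCH_METRIC",
        "AlarmIdentifier": {"Region": alarm_region, "Name": alarm_name},
        # Treat an alarm with no data as a failed health check.
        "InsufficientDataHealthStatus": "Unhealthy",
    }


def create_alarm_health_check(alarm_region, alarm_name):
    """Create the health check and return its ID (requires credentials)."""
    import boto3  # imported lazily so the builder above needs no SDK

    route53 = boto3.client("route53")
    response = route53.create_health_check(
        CallerReference=str(uuid.uuid4()),  # must be unique per request
        HealthCheckConfig=alarm_health_check_config(alarm_region, alarm_name),
    )
    return response["HealthCheck"]["Id"]
```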
3. Set up a Route53 Health check alarm
a. In step 2 of 2 of the health check wizard, create an alarm for the Route53 health check.
b. Select "Create alarm", "Yes"
c. Under "Send notification to" select "Existing SNS topic" and choose the previously created topic, or create a new one.
d. Click on "Create health check"
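As a rough scripted equivalent of this step, an alarm on the health check's status metric could be defined as below. This is a sketch with hypothetical function names. It assumes the standard HealthCheckStatus metric in the AWS/Route53 namespace (1 for healthy, 0 for unhealthy), and relies on the fact that Route 53 health check metrics are published in the us-east-1 region.

```python
def health_check_alarm_params(health_check_id, alarm_name, sns_topic_arn):
    """Build put_metric_alarm arguments for a Route 53 health check alarm."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/Route53",
        "MetricName": "HealthCheckStatus",
        "Dimensions": [{"Name": "HealthCheckId", "Value": health_check_id}],
        "Statistic": "Minimum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        # Status drops below 1 when the health check fails.
        "ComparisonOperator": "LessThanThreshold",
        # Notify the SNS topic that the Lambda functions subscribe to.
        "AlarmActions": [sns_topic_arn],
    }


def create_health_check_alarm(health_check_id, alarm_name, sns_topic_arn):
    """Create the alarm (requires AWS credentials)."""
    import boto3  # imported lazily so the builder above needs no SDK

    # Route 53 health check metrics live in us-east-1.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    cloudwatch.put_metric_alarm(
        **health_check_alarm_params(health_check_id, alarm_name, sns_topic_arn)
    )
```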
4. Set up your Lambda functions as subscribers to the SNS topic
a. In the SNS service select the SNS topic you created in the previous step
b. In the "Subscriptions" section click the "Create subscription" button
i. Under "Protocol" select "AWS Lambda"
ii. Under "Endpoint" select the Lambda function you would like to call
Alternatively, in the Lambda function, under SNS, click Add Trigger.
Create the trigger by selecting SNS, then the respective topic.
The new trigger will be listed under the SNS section.
You can monitor the Lambda function's invocations on the function's dashboard.
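Once subscribed, the Lambda function receives the CloudWatch alarm notification wrapped in an SNS event. The following is a minimal sketch of a handler that unpacks that envelope and acts only on a transition to ALARM. The helper name and the return value are illustrative, and the actual fail-over calls are left as a placeholder.

```python
import json


def alarm_states_from_sns_event(event):
    """Return (alarm name, new state) pairs from an SNS-delivered event.

    CloudWatch publishes the alarm details as a JSON string in the
    "Message" field of each SNS record.
    """
    states = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        states.append((message.get("AlarmName"), message.get("NewStateValue")))
    return states


def lambda_handler(event, context):
    """Entry point: decide whether a fail-over should be started."""
    triggered = [
        name
        for name, state in alarm_states_from_sns_event(event)
        if state == "ALARM"
    ]
    if not triggered:
        return {"failover": False, "alarms": []}

    # Placeholder: promote the DR replica and switch the DNS record here.
    return {"failover": True, "alarms": triggered}
```

Checking the new state explicitly guards against running the fail-over for other notifications on the same topic, for example if OK actions are later added to the alarm.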
5. Test the fail-over scenario:
The easiest way to test is to shut down the RDS instance.
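The test can also be driven from a script. The sketch below assumes boto3 clients are passed in by the caller: it stops the instance and then polls the CloudWatch alarm until it reaches the expected state. The function names, attempt count and delay are arbitrary illustrative choices.

```python
import time


def stop_instance(rds, db_instance_id):
    """Stop the RDS instance; `rds` is a boto3 RDS client."""
    rds.stop_db_instance(DBInstanceIdentifier=db_instance_id)


def wait_for_alarm_state(cloudwatch, alarm_name, target_state,
                         attempts=30, delay_seconds=20, sleep=time.sleep):
    """Poll an alarm until it reaches target_state.

    `cloudwatch` is a boto3 CloudWatch client. Returns True if the state
    was observed within the given number of attempts, otherwise False.
    """
    for attempt in range(attempts):
        response = cloudwatch.describe_alarms(AlarmNames=[alarm_name])
        alarms = response.get("MetricAlarms", [])
        if alarms and alarms[0]["StateValue"] == target_state:
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)  # wait before the next poll
    return False
```

For example, after stopping the instance one could wait for the RDS alarm to report "INSUFFICIENT_DATA", then check the Lambda function's dashboard for an invocation.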
In CloudBasic's GitHub repository (https://github.com/cloudbasic), you can find sample code of a Lambda function that calls the CloudBasic API to promote a read-replica DB to primary:
This is a code snippet of a sample Lambda function that calls CLOUDBASIC's API and Amazon Route53's API to change DNS records and promote RDS SQL Server DR replicas to primary:
In CloudBasic's GitHub repository (https://github.com/cloudbasic), you can find sample code of a Lambda function that calls Amazon Route 53's API to switch DNS records:
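For orientation only, the DNS switch could look roughly like the sketch below. It is not the sample from the CloudBasic repository: the function names are hypothetical, and it assumes the database is reached through a CNAME record that is simply repointed to the DR instance's endpoint. The CloudBasic API call that promotes the replica is omitted because its interface is not described in this guide.

```python
def cname_switch_change_batch(record_name, new_target, ttl=60):
    """Build a Route 53 ChangeBatch that repoints a CNAME record."""
    return {
        "Comment": "Fail over to the DR RDS endpoint",
        "Changes": [
            {
                # UPSERT creates the record or overwrites the existing one.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": new_target}],
                },
            }
        ],
    }


def switch_dns_record(hosted_zone_id, record_name, new_target):
    """Apply the change (requires AWS credentials)."""
    import boto3  # imported lazily so the builder above needs no SDK

    route53 = boto3.client("route53")
    route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=cname_switch_change_batch(record_name, new_target),
    )
```

A short TTL on the record keeps the switch-over time low, since clients cache the old target until the TTL expires.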