Fault-Tolerant VPN Solution on AWS


I worked with a project team to help them improve their VPN infrastructure on AWS. They have three VPN EC2 instances; let's call them VPN01, VPN02 and VPN03. All three run OpenVPN Access Server. VPN01 and VPN02 each have a 10-concurrent-session license and are in availability zones a and b respectively. VPN03 has only the 2 complimentary concurrent-session licenses and is in availability zone c (it is mostly for emergency use, e.g. when both AZ-a and AZ-b go down). A DNS round-robin record points to all three instances, and they share the same configuration, so end users can connect to any of them. The configuration is stored in these files:

/usr/local/openvpn_as/etc/db/certs.db
/usr/local/openvpn_as/etc/db/config.db
/usr/local/openvpn_as/etc/db/userprop.db

They had just renewed the license, so I had to stick with the current license-based AMI; otherwise I would have used the hourly-billed OpenVPN AMI with an ELB and an Auto Scaling group. Because VPN01 and VPN02 have more licensed sessions, the solution needs to steer most users to those two instances. And if the VPN service stops working properly on one instance, the solution needs to divert users to a healthy instance.

With the requirements in mind, here is my design:

[Figure: Fault-tolerant VPN solution on AWS, architecture diagram]

The architecture diagram should be largely self-explanatory. Below is a brief description of how I implemented it:

  1. Set up weighted DNS CNAME records for vpn.mydomain.com pointing to vpn01.mydomain.local (weight 45), vpn02.mydomain.local (weight 45) and vpn03.mydomain.local (weight 10). Each record is returned in proportion to its share of the total weight (100), so vpn01 and vpn02 each receive about 45% of the traffic and vpn03 only about 10%.
  2. Set up a DNS health check for each of vpn01.mydomain.local, vpn02.mydomain.local and vpn03.mydomain.local. As OpenVPN AS is an SSL VPN, we only need to monitor port 443.
  3. Create a new SNS topic; let's name it vpn_healthcheck.
  4. Configure each health check's alarm notification to target the new SNS topic, so a notification is sent to SNS whenever a health check fails.
  5. Now for the Lambda function. First, set up an IAM role that allows the function to perform the start and reboot operations. Here is the sample code. Second, create a Lambda function with an SNS trigger. I used Python, and here is the source code.
  6. Go back to the SNS topic created in step 3 and subscribe your email address to it. Subscribe the Lambda function to it as well.
  7. Testing time: stop the openvpnas service on one of the VPN instances and wait 1-2 minutes; the Lambda function will reboot the instance. You can confirm this in the Lambda function's log.
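
Here is a minimal boto3 sketch of step 1, one way to create the weighted records rather than a definitive implementation. The hosted zone ID and health check IDs are hypothetical placeholders, and the functions take a client object such as boto3.client("route53") so the request can be built and inspected without AWS credentials. Route 53 answers with each healthy record with probability weight / sum of weights, so 45/45/10 works out to 45%, 45% and 10% of DNS answers; because resolvers and clients cache answers for the TTL, the split of actual VPN sessions is only approximate.

```python
"""Sketch: upsert weighted CNAME records for vpn.mydomain.com (step 1)."""

# Weights from step 1.
WEIGHTS = {
    "vpn01.mydomain.local": 45,
    "vpn02.mydomain.local": 45,
    "vpn03.mydomain.local": 10,
}


def traffic_share(weights):
    """Expected fraction of DNS answers per target: weight / sum(weights)."""
    total = sum(weights.values())
    return {target: w / total for target, w in weights.items()}


def build_change_batch(name, weights, health_check_ids, ttl=60):
    """Build a ChangeBatch for route53.change_resource_record_sets().

    health_check_ids maps each target to its (hypothetical) Route 53 health
    check ID, so a record whose check fails is left out of the answers.
    """
    changes = []
    for target, weight in weights.items():
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "CNAME",
                # SetIdentifier must be unique among the weighted records.
                "SetIdentifier": target.split(".")[0],
                "Weight": weight,
                "TTL": ttl,
                "ResourceRecords": [{"Value": target}],
                "HealthCheckId": health_check_ids[target],
            },
        })
    return {"Comment": "Weighted VPN records", "Changes": changes}


def apply_records(route53_client, hosted_zone_id, change_batch):
    """Submit the batch with a client such as boto3.client("route53")."""
    return route53_client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=change_batch
    )


# Usage (placeholder IDs):
# import boto3
# batch = build_change_batch("vpn.mydomain.com", WEIGHTS,
#                            {"vpn01.mydomain.local": "hc-id-1", ...})
# apply_records(boto3.client("route53"), "Z0000000EXAMPLE", batch)
```

Attaching a HealthCheckId to each record is what lets Route 53 stop answering with an instance whose check is failing.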
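
A boto3 sketch of step 2, assuming each instance has a public (for example Elastic) IP address: Route 53 health checkers probe from the internet, so they cannot reach private addresses. The IP addresses (from the documentation range) and the instance names are hypothetical. A TCP check only confirms that something accepts connections on port 443, which matches the port-443 check described in step 2.

```python
import uuid


def tcp_health_check_config(ip_address, port=443, interval=30, failure_threshold=3):
    """HealthCheckConfig for a plain TCP health check."""
    return {
        "IPAddress": ip_address,
        "Port": port,
        "Type": "TCP",
        # Seconds between checks: 30 (standard) or 10 (fast).
        "RequestInterval": interval,
        # Consecutive failures before the endpoint is marked unhealthy.
        "FailureThreshold": failure_threshold,
    }


def create_health_checks(route53_client, endpoints):
    """Create one TCP:443 check per instance and return {name: health_check_id}."""
    ids = {}
    for name, ip in endpoints.items():
        resp = route53_client.create_health_check(
            # CallerReference must be unique per request; a UUID is simplest.
            CallerReference=str(uuid.uuid4()),
            HealthCheckConfig=tcp_health_check_config(ip),
        )
        check_id = resp["HealthCheck"]["Id"]
        # A Name tag makes the check easy to recognise in the console.
        route53_client.change_tags_for_resource(
            ResourceType="healthcheck",
            ResourceId=check_id,
            AddTags=[{"Key": "Name", "Value": name}],
        )
        ids[name] = check_id
    return ids


# Usage (hypothetical addresses):
# import boto3
# ids = create_health_checks(boto3.client("route53"), {
#     "vpn01": "203.0.113.11", "vpn02": "203.0.113.12", "vpn03": "203.0.113.13"})
```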
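
Steps 3 and 4 can be scripted along these lines. Route 53 publishes the HealthCheckStatus metric (1 = healthy, 0 = unhealthy) to CloudWatch only in us-east-1, so this sketch assumes the alarms and the vpn_healthcheck topic are both created there. The alarm naming scheme (vpn01-healthcheck and so on) is a hypothetical convention, and the one-minute, one-period alarm is just one reasonable setting.

```python
def create_topic(sns_client, name="vpn_healthcheck"):
    """Step 3: create the topic (create_topic returns the existing ARN if it exists)."""
    return sns_client.create_topic(Name=name)["TopicArn"]


def health_alarm_params(vpn_name, health_check_id, topic_arn):
    """Step 4: keyword arguments for cloudwatch.put_metric_alarm()."""
    return {
        "AlarmName": f"{vpn_name}-healthcheck",
        "Namespace": "AWS/Route53",
        "MetricName": "HealthCheckStatus",
        "Dimensions": [{"Name": "HealthCheckId", "Value": health_check_id}],
        # Minimum < 1 means the check reported unhealthy during the period.
        "Statistic": "Minimum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        # Notify the SNS topic when the alarm enters the ALARM state.
        "AlarmActions": [topic_arn],
    }


def create_alarms(cloudwatch_client, topic_arn, health_check_ids):
    """Create one alarm per health check; health_check_ids is {vpn_name: id}."""
    for vpn_name, check_id in health_check_ids.items():
        cloudwatch_client.put_metric_alarm(
            **health_alarm_params(vpn_name, check_id, topic_arn)
        )


# Usage (placeholder IDs):
# import boto3
# topic_arn = create_topic(boto3.client("sns", region_name="us-east-1"))
# create_alarms(boto3.client("cloudwatch", region_name="us-east-1"),
#               topic_arn, {"vpn01": "hc-id-1", "vpn02": "hc-id-2", "vpn03": "hc-id-3"})
```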
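
For the IAM role in step 5, here is a minimal sketch of a role the Lambda function could use; the role and policy names are hypothetical. It grants only what a start-or-reboot function needs: describing, starting and rebooting instances, plus writing its own CloudWatch Logs.

```python
import json

# Trust policy: allows the Lambda service to assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Permissions: inspect, start and reboot instances; write CloudWatch Logs.
PERMISSIONS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ec2:DescribeInstances", "ec2:StartInstances", "ec2:RebootInstances"],
            "Resource": "*",
        },
        {
            "Effect": "Allow",
            "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "arn:aws:logs:*:*:*",
        },
    ],
}


def create_lambda_role(iam_client, role_name="vpn-healthcheck-lambda"):
    """Create the role, attach the inline policy, and return the role ARN."""
    role = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(TRUST_POLICY),
    )
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName="vpn-recover-instances",
        PolicyDocument=json.dumps(PERMISSIONS_POLICY),
    )
    return role["Role"]["Arn"]


# Usage:
# import boto3
# role_arn = create_lambda_role(boto3.client("iam"))
```

ec2:DescribeInstances does not support resource-level permissions, so it needs "Resource": "*"; the start and reboot actions could be narrowed to the three instance ARNs.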
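
For the function itself, here is one possible handler, a sketch rather than the source code referenced in step 5. It assumes the SNS message is a standard CloudWatch alarm notification (a JSON string with AlarmName and NewStateValue) and uses a hypothetical INSTANCE_BY_ALARM table, with placeholder instance IDs, to find the instance behind each alarm. A stopped instance is started, a running one is rebooted, and anything in a transitional state is left alone.

```python
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Hypothetical mapping: CloudWatch alarm name -> EC2 instance ID to recover.
INSTANCE_BY_ALARM = {
    "vpn01-healthcheck": "i-0123456789abcdef0",
    "vpn02-healthcheck": "i-0123456789abcdef1",
    "vpn03-healthcheck": "i-0123456789abcdef2",
}


def choose_action(state_name):
    """Map an EC2 instance state to a recovery action, or None."""
    if state_name == "stopped":
        return "start"
    if state_name == "running":
        return "reboot"
    # pending, stopping, shutting-down, terminated: do nothing.
    return None


def handle_alarm(message, ec2):
    """Process one alarm notification and return the action taken (or None)."""
    if message.get("NewStateValue") != "ALARM":
        return None  # OK / INSUFFICIENT_DATA transitions need no action
    alarm_name = message.get("AlarmName")
    instance_id = INSTANCE_BY_ALARM.get(alarm_name)
    if instance_id is None:
        logger.warning("No instance mapped for alarm %s", alarm_name)
        return None
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    state = reservations[0]["Instances"][0]["State"]["Name"]
    action = choose_action(state)
    if action == "start":
        ec2.start_instances(InstanceIds=[instance_id])
    elif action == "reboot":
        ec2.reboot_instances(InstanceIds=[instance_id])
    logger.info("Alarm %s: %s is %s, action=%s", alarm_name, instance_id, state, action)
    return action


def lambda_handler(event, context):
    # boto3 ships with the Lambda Python runtime; importing it here keeps the
    # helpers above importable (and testable) where boto3 is not installed.
    import boto3

    ec2 = boto3.client("ec2")
    results = []
    for record in event["Records"]:
        # The SNS message body is the alarm notification serialized as JSON.
        message = json.loads(record["Sns"]["Message"])
        results.append(handle_alarm(message, ec2))
    return results
```

CloudWatch notifies only when an alarm changes state, so one outage produces one reboot rather than a reboot loop; if the instance is still unhealthy afterwards, the alarm simply stays in ALARM.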
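
Step 6 as a boto3 sketch; the email address, function ARN and statement ID are placeholders. When the Lambda subscription is created through the API rather than the console, SNS also needs a resource-based permission to invoke the function, which add_permission grants. Email subscriptions remain pending until the recipient clicks the confirmation link.

```python
def subscribe_targets(sns_client, lambda_client, topic_arn, email, function_arn):
    """Subscribe an email address and the Lambda function to the topic.

    Returns the SubscriptionArn values reported by SNS; for email this is a
    "pending confirmation" placeholder until the link is clicked.
    """
    results = []
    results.append(
        sns_client.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
        ["SubscriptionArn"]
    )
    # Allow this SNS topic (and only this topic) to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId="sns-vpn-healthcheck",
        Action="lambda:InvokeFunction",
        Principal="sns.amazonaws.com",
        SourceArn=topic_arn,
    )
    results.append(
        sns_client.subscribe(TopicArn=topic_arn, Protocol="lambda", Endpoint=function_arn)
        ["SubscriptionArn"]
    )
    return results


# Usage (placeholders):
# import boto3
# subscribe_targets(boto3.client("sns", region_name="us-east-1"),
#                   boto3.client("lambda", region_name="us-east-1"),
#                   topic_arn, "ops@example.com", function_arn)
```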
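
To watch the test in step 7 from the outside, a small probe can open a TCP connection to port 443 the same way a TCP health check does and print a timestamped UP/DOWN line at a fixed interval. The hostname in the usage comment is a placeholder; run it from a machine that can reach the instance.

```python
import socket
import time


def port_open(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False


def watch(host, port=443, interval=10, checks=30):
    """Print an UP/DOWN line every `interval` seconds, `checks` times."""
    for _ in range(checks):
        state = "UP" if port_open(host, port) else "DOWN"
        print(time.strftime("%H:%M:%S"), host, port, state)
        time.sleep(interval)


# Usage: watch("vpn01.mydomain.local")
```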

I hope you find this useful. All the sample code can be found in my GitHub repo.
