BIND, dynamic DNS, failover A records

From FreeBSDwiki

The problem: inexpensive but unreliable ISPs

If you've got a multi-homed network with IP addresses from different ISPs, but you aren't a big enough organization to convince those ISPs to set up BGP routing for your network, you will probably find it really handy to have a single DNS record that automatically points to the best way into your network from the outside world.

In this example, "BSDcompany" runs a small office network (office.bsdcompany.com) and a server in a colocated network facility (coloserver.bsdcompany.com). Frequently, they need to access network resources inside the office from the internet. Since neither of the two ISPs available at BSDcompany's office is particularly reliable, BSDcompany has a cable modem from one of them, a DSL modem from the other, and a dual-WAN router. Both the cable and the DSL use dynamic IP addresses, and the company already has a server in the office doing dynamic DNS updates to cable-ip.office.bsdcompany.com and dsl-ip.office.bsdcompany.com.

BSDcompany's dual-WAN router provides load balancing and automatic failover redundancy for internet access from within the office. But BSDcompany wants similar redundancy and balancing from the outside coming in as well. So instead of randomly trying cable-ip.office.bsdcompany.com and dsl-ip.office.bsdcompany.com to see which (if either) is working at any particular time, they just want to be able to use a single name all the time and have it automatically take them to whichever ISP is up and/or faster at the moment.

The solution: ddns-failover.pl (another freebsdwiki.net original)

BSDcompany decides to set up a cron job on their colo server to check the status and latency of each of their office WAN IPs. That script will then automatically update a third A record, office.bsdcompany.com, with whichever is currently the quicker of the two office WANs to respond - and if both WANs are down, it will delete the record entirely until one or the other of them comes back up.

(As with the set-ddns.pl script in the previous dynamic DNS article, the variables in ddns-failover.pl in UPPERCASE are things you should set to match your own situation, while the ones in lower or mixed case are generally things you shouldn't need to mess with.)

#!/usr/bin/perl

# ddns-failover.pl
#
# Copyright (c) 05-20-2006, JRS System Solutions
# All rights reserved under standard BSD license
# details: http://www.opensource.org/licenses/bsd-license.php
#
# Check each of two public IPs for the same multi-homed host,
# and set a dynamic DNS A record to point to the lower latency
# of the two.  If both routes are down, delete the hostname
# entirely until one or both IPs come back up.

$WANDNS1 = 'cable-ip.office.bsdcompany.com';
$WANDNS2 = 'dsl-ip.office.bsdcompany.com';
$HOSTNAME = 'office.bsdcompany.com';
$NAMESERVER = 'coloserver.bsdcompany.com';
$KEYFILE = 'Koffice.bsdcompany.com.+157+15661.private';
$KEYDIR = '/usr/home/ddns';
$TTL = '10';

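# Ping each WAN once (-c 1), quietly (-q), with a one-second timeout (-t 1).
@wan1 = split(/\n/,`/sbin/ping -qc 1 -t 1 $WANDNS1`);
@wan2 = split(/\n/,`/sbin/ping -qc 1 -t 1 $WANDNS2`);

# Capture in list context: a failed match then yields undef instead of
# leaving a stale $1 behind from an earlier successful match.  When the
# name does not resolve, ping prints nothing on stdout, so guard with ''.
($wan1_ip) = ($wan1[0] || '') =~ /\((\d+\.\d+\.\d+\.\d+)\)/;
($wan2_ip) = ($wan2[0] || '') =~ /\((\d+\.\d+\.\d+\.\d+)\)/;
$wan1_ip = 'NO HOST FOUND' unless defined $wan1_ip;
$wan2_ip = 'NO HOST FOUND' unless defined $wan2_ip;

# Packets received out of the one sent (0 if the line is missing).
($wan1_rcvd) = ($wan1[3] || '') =~ /(\d+) packets received/;
($wan2_rcvd) = ($wan2[3] || '') =~ /(\d+) packets received/;
$wan1_rcvd = 0 unless defined $wan1_rcvd;
$wan2_rcvd = 0 unless defined $wan2_rcvd;

# Average round-trip time: the second field of min/avg/max/stddev.
($wan1_time) = ($wan1[4] || '') =~ /\/(\d+\.\d+)\//;
($wan2_time) = ($wan2[4] || '') =~ /\/(\d+\.\d+)\//;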

if ($wan1_rcvd != 1 && $wan2_rcvd == 1) {
        print "WAN1 [$wan1_ip]: NO RESPONSE\nWAN2 [$wan2_ip]: $wan2_time" . "ms\nSET $HOSTNAME: WAN2\n";
        $dnsip=$wan2_ip;
} elsif ($wan1_rcvd == 1 && $wan2_rcvd != 1) {
        print "WAN1 [$wan1_ip]: $wan1_time" . "ms\nWAN2 [$wan2_ip]: NO RESPONSE\nSET $HOSTNAME: WAN1\n";
        $dnsip=$wan1_ip;
} elsif ($wan1_rcvd != 1 && $wan2_rcvd !=1) {
        print "WAN1 [$wan1_ip]: NO RESPONSE\nWAN2 [$wan2_ip]: NO RESPONSE\nDELETE $HOSTNAME\n";
        $dnsip='NO';
} elsif ($wan1_time <= $wan2_time) {
        print "WAN1 [$wan1_ip]: $wan1_time" . "ms\nWAN2 [$wan2_ip]: $wan2_time" . "ms\nSET $HOSTNAME: WAN1\n";
        $dnsip=$wan1_ip;
} else {
        print "WAN1 [$wan1_ip]: $wan1_time" . "ms\nWAN2 [$wan2_ip]: $wan2_time" . "ms\nSET $HOSTNAME: WAN2\n";
        $dnsip=$wan2_ip;
}

chdir ($KEYDIR) or die "cannot chdir to $KEYDIR: $!\n";
open (NSUPDATE, "| /usr/sbin/nsupdate -k $KEYFILE") or die "cannot start nsupdate: $!\n";
print NSUPDATE "server $NAMESERVER\n";
print NSUPDATE "update delete $HOSTNAME A\n";
if ($dnsip ne 'NO') {
        print NSUPDATE "update add $HOSTNAME $TTL A $dnsip\n";
}
# print NSUPDATE "show\n";
print NSUPDATE "send\n";
close (NSUPDATE);
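
The parsing and decision steps of the script can also be pulled out into two small functions, which makes them easy to try against saved ping output without touching the network or DNS. The following is only a sketch, not part of the original script: the names parse_ping and choose_target are made up for illustration, and it assumes the FreeBSD `ping -qc 1` output format shown in this article.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse the stdout of `ping -qc 1` into
# (ip, packets_received, avg_ms).  Missing fields come back as undef,
# except packets_received, which defaults to 0.
sub parse_ping {
    my ($output) = @_;
    my ($ip)   = $output =~ /\((\d+\.\d+\.\d+\.\d+)\)/;
    my ($rcvd) = $output =~ /(\d+) packets received/;
    # The round-trip line reads min/avg/max/stddev; take the second field.
    my ($avg)  = $output =~ m{=\s*[\d.]+/([\d.]+)/};
    return ($ip, defined $rcvd ? $rcvd : 0, $avg);
}

# Hypothetical helper: return the IP to publish, or undef when neither
# WAN answered.  Each argument is an array ref [ip, received, avg_ms].
# On a latency tie WAN1 wins, matching the <= test in the script.
sub choose_target {
    my ($wan1, $wan2) = @_;
    my $up1 = $wan1->[1] == 1;
    my $up2 = $wan2->[1] == 1;
    return undef      unless $up1 || $up2;
    return $wan2->[0] unless $up1;
    return $wan1->[0] unless $up2;
    return $wan1->[2] <= $wan2->[2] ? $wan1->[0] : $wan2->[0];
}

# Sample ping output in the format used throughout this article.
my $sample = <<'END';
PING office.bsdcompany.com (144.69.42.18): 56 data bytes

--- office.bsdcompany.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 85.038/85.038/85.038/0.000 ms
END

my @parsed = parse_ping($sample);
print join(', ', @parsed), "\n";
```

Passing an empty string to parse_ping (which is what the backticks return when the name does not resolve) gives an undefined IP and a received count of 0, so choose_target treats that WAN as down.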

Setting up permissions

To minimize security risks, the gurus at BSDcompany create a new user named "ddns", put this script and the copies of the key files for the zone (which they already had, when they set up their dynamic DNS earlier) in the "ddns" user's home directory, and make sure to set the permissions on everything as restrictively as possible before setting up the cron job to actually run it.

coloserver# pw useradd ddns -s /sbin/nologin -d /usr/home/ddns
coloserver# mkdir /usr/home/ddns
coloserver# cd /usr/home/ddns
coloserver# cp /etc/namedb/zones/keys/Koffice.bsdcompany.com.+157+15661.private .
coloserver# cp /etc/namedb/zones/keys/Koffice.bsdcompany.com.+157+15661.key .
coloserver# chown ddns Koffice.bsdcompany.com.+157+15661.* ddns-failover.pl
coloserver# chmod 400 Koffice.bsdcompany.com.+157+15661.*
coloserver# chmod 500 ddns-failover.pl
coloserver# ls -l
-r--------  1 ddns  wheel   130 May 20 12:22 Koffice.bsdcompany.com.+157+15661.key
-r--------  1 ddns  wheel   145 May 20 13:17 Koffice.bsdcompany.com.+157+15661.private
-r-x------  1 ddns  wheel  3108 May 23 01:27 ddns-failover.pl
coloserver# su ddns
This account is currently not available.

Excellent: the ddns account is present but cannot be interactively logged into, the key files are readable (but not writeable or executable) only to it, and the script is executable (but not writeable) only to it. Now that the permissions are correct, it's time to do a test run - we'll run the script manually (using sudo to do so as the user ddns, just like the cron job will) before we set it up to run automatically.
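
If you would rather have a program confirm the permissions than eyeball `ls -l`, a check along these lines could work. It is a minimal sketch under stated assumptions: owner_only is a hypothetical helper, not something from the original script, and it only tests that group and other have no access at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: true when $path exists and its mode grants nothing
# to group or other (the low six permission bits are all clear).
sub owner_only {
    my ($path) = @_;
    my @st = stat($path) or return 0;   # missing or inaccessible path
    return (($st[2] & 0077) == 0) ? 1 : 0;
}

# Demonstrate on a scratch file rather than on the real key.
my ($fh, $scratch) = tempfile(UNLINK => 1);
chmod 0400, $scratch or die "chmod failed: $!\n";
print "0400: ", (owner_only($scratch) ? "owner only" : "too permissive"), "\n";
chmod 0644, $scratch or die "chmod failed: $!\n";
print "0644: ", (owner_only($scratch) ? "owner only" : "too permissive"), "\n";
```

The same function could be pointed at the key files and the script in /usr/home/ddns; modes 400 and 500 both pass, while anything readable by group or other does not.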

Testing the script manually

coloserver# sudo -u ddns /usr/bin/perl /usr/home/ddns/ddns-failover.pl
WAN1 [128.32.64.5]: 94.302ms
WAN2 [144.69.42.18]: 85.341ms
SET office.bsdcompany.com: WAN2
coloserver# ping -qc 1 office.bsdcompany.com
PING office.bsdcompany.com (144.69.42.18): 56 data bytes

--- office.bsdcompany.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 85.038/85.038/85.038/0.000 ms

Perfect! Now, the BSDcompany folks force an apparent fail condition on WAN2 to make sure it fails over properly:

coloserver# nsupdate -k Koffice.bsdcompany.com.+157+15661.private
> update delete dsl-ip.office.bsdcompany.com
> send
> quit
coloserver# sudo -u ddns /usr/bin/perl /usr/home/ddns/ddns-failover.pl
WAN1 [128.32.64.5]: 98.213ms
WAN2 [NO HOST FOUND]: NO RESPONSE
SET office.bsdcompany.com: WAN1
coloserver# ping -qc 1 office.bsdcompany.com
PING office.bsdcompany.com (128.32.64.5): 56 data bytes

--- office.bsdcompany.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 97.188/97.188/97.188/0.000 ms

Good! Now they make WAN1 appear to fail as well, to ensure the record is deleted entirely when both WANs are down. That way a client gets an immediate failure response instead of having to wait out a pointless and possibly very lengthy network timeout first:

coloserver# nsupdate -k Koffice.bsdcompany.com.+157+15661.private
> update delete cable-ip.office.bsdcompany.com
> update delete dsl-ip.office.bsdcompany.com
> send
> quit
coloserver# sudo -u ddns /usr/bin/perl /usr/home/ddns/ddns-failover.pl
WAN1 [NO HOST FOUND]: NO RESPONSE
WAN2 [NO HOST FOUND]: NO RESPONSE
DELETE office.bsdcompany.com
coloserver# ping -qc 1 office.bsdcompany.com
ping: cannot resolve office.bsdcompany.com: Unknown host

Outstanding.
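
For reference, here is what the script hands to nsupdate in the cases tested above. This is a sketch only: nsupdate_commands is a hypothetical function that mirrors the print statements at the end of ddns-failover.pl, using an undefined IP where the script uses the string 'NO'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the command lines that would be piped to
# nsupdate.  $ip is the address to publish, or undef to leave the name
# deleted because both WANs are down.
sub nsupdate_commands {
    my ($server, $hostname, $ttl, $ip) = @_;
    my @cmds = ("server $server", "update delete $hostname A");
    push @cmds, "update add $hostname $ttl A $ip" if defined $ip;
    push @cmds, "send";
    return @cmds;
}

my @base = ('coloserver.bsdcompany.com', 'office.bsdcompany.com', 10);

print "One WAN up:\n";
print "  $_\n" for nsupdate_commands(@base, '128.32.64.5');

print "Both WANs down:\n";
print "  $_\n" for nsupdate_commands(@base, undef);
```

In both cases the old A record is deleted first; the only difference is whether an "update add" line follows before "send".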

Installing and running the crontab

Now that we've thoroughly tested our script, we can set it up as a crontab to run once per minute, just like the crontab that runs set-ddns.pl on the server inside the office to update cable-ip and dsl-ip.

coloserver# crontab -u ddns -e
* * * * * /usr/bin/perl /usr/home/ddns/ddns-failover.pl > /dev/null

To be completely thorough, now we'll break the record one last time and let the cron job fix it behind us:

coloserver# nsupdate -k Koffice.bsdcompany.com.+157+15661.private
> update delete office.bsdcompany.com A
> send
> quit
coloserver# date
Tue May 23 04:17:48 EDT 2006
coloserver# ping -qc 1 office.bsdcompany.com
ping: cannot resolve office.bsdcompany.com: Unknown host
coloserver# date
Tue May 23 04:18:03 EDT 2006
coloserver# ping -qc 1 office.bsdcompany.com
PING office.bsdcompany.com (144.69.42.18): 56 data bytes

--- office.bsdcompany.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 86.338/86.338/86.338/0.000 ms

As soon as the minute ticked over, the cron job fired and did what it's supposed to. So now that we've thoroughly tested both the script and the crontab entry that runs it, we can forget about it and just use our failover A record without having to think about it anymore.
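
It can be useful to have a rough idea of how long a failover might take with these settings. The arithmetic below is a back-of-the-envelope estimate, not a measurement: it assumes the two pings run one after the other and each is bounded by its -t timeout, that resolvers honor the record's TTL, and it ignores the time spent on name lookups and on nsupdate itself. The function name worst_case_failover is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rough upper bound, in seconds, between a WAN going
# down and clients seeing the updated A record.
#   $cron_interval  seconds between runs of the script
#   $ping_timeout   per-ping timeout (the script pings two WANs in turn)
#   $ttl            TTL on the failover record
sub worst_case_failover {
    my ($cron_interval, $ping_timeout, $ttl) = @_;
    return $cron_interval + 2 * $ping_timeout + $ttl;
}

# Values from this article: cron every minute, ping -t 1, $TTL = '10'.
print worst_case_failover(60, 1, 10), " seconds\n";
```

With the values used in this article that comes to 72 seconds in the worst case, where the link fails just after a run has finished.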
