I've been experimenting with multiple A records for both load-distributing AND high availability.
Up until this point I was always told that round-robin is for load-distributing ONLY and should not be used for high availability failover. But in practice this is not proving to be true. I'm beginning to think that was just FUD.
Do a lookup on roundrobintest8.strangled.net and roundrobintest9.strangled.net. Notice the A records:
roundrobintest8.strangled.net. 3600 IN A 127.0.0.1
roundrobintest8.strangled.net. 3600 IN A 63.95.68.129 # Real server
roundrobintest9.strangled.net. 3600 IN A 10.69.96.69 # Bogus IP
roundrobintest9.strangled.net. 3600 IN A 63.95.68.129 # Real server
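If you'd rather script the lookup than use dig or nslookup, here is a minimal Python sketch. The function name `list_a_records` is made up for illustration, and it asks the system resolver, so a local cache may return the addresses in a different order or only some of them.

```python
import socket

def list_a_records(hostname, port=443):
    """Return the distinct IPv4 addresses the system resolver reports for hostname.

    list_a_records is a hypothetical helper name, not part of any library.
    """
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses = []
    for _family, _socktype, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:  # keep resolver order, drop repeats
            addresses.append(ip)
    return addresses
```

Calling `list_a_records("roundrobintest9.strangled.net")` should list both A records if the test zone is still published; a failed lookup raises `socket.gaierror`.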
Now, disable anything running on localhost:443 and make sure you do *not* have a host at 10.69.96.69.
Browse to both hostnames over HTTPS. You should never get a DNS error. It should always give you an SSL warning (hostname mismatch) first and then a login prompt. Oh, it'll pause while it tries the bad IP, but after about 5 seconds it flips to the real server.
Now load up an SSL web server on localhost. I used Apache+mod_ssl on Linux and TinySSL on Windows. Set up an index page with links to several other pages.
(Sorry to require SSL, it was the only web server I have control over that no one is using at the moment, so I can kill the web service any time I want... You could also load up an FTP or SSH server on localhost instead of SSL. My server has all three.)
Flush your cache (e.g. ipconfig /flushdns) and reload the website. Sometimes you will get localhost, sometimes my server. That's the load-distributing action we all know and love.
If you don't get localhost, keep flushing your cache until you do. Then kill your local server and click on a link in the web page that is still up on your screen. It will fail over to my server and generate a 404. That's high availability! Even though it generates an error, it's coming from my server nonetheless!
-No- client I've tried (browser, FTP client, MySQL, SSH etc.) fails on the bad IP (10.69.96.69). It thinks for a few seconds and then tries the good IP.
Nor does it fail when the IP is reachable but no service is listening on that port, as in the localhost case.
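The behavior I'm seeing looks like a "try each address in turn" loop on the client side. Here is a rough Python sketch of that loop, not what any particular browser or FTP client actually runs. The name `connect_first_reachable` and the 5 second default timeout are my assumptions, the latter picked to match the pause I observed.

```python
import socket

def connect_first_reachable(candidates, timeout=5.0):
    """Try each (ip, port) pair in order and return (sock, (ip, port)) for the
    first one that accepts a TCP connection.

    connect_first_reachable is a hypothetical helper; timeout is an assumed value.
    """
    last_error = None
    for ip, port in candidates:
        try:
            sock = socket.create_connection((ip, port), timeout=timeout)
            return sock, (ip, port)
        except OSError as exc:
            # Covers both cases above: a dead IP (timeout) and a live IP
            # with nothing listening (connection refused).
            last_error = exc
    raise ConnectionError(f"no candidate accepted a connection: {last_error}")
```

Python's own `socket.create_connection` already does this when you hand it a hostname: it tries each address from `getaddrinfo` until one connects. That may be why so many clients behave this way without doing anything special.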
I've tried this on:
Windows 95
Windows 98
Windows 2000
Windows XP
Ubuntu 6.06
Debian 3.1
CentOS 3
CentOS 4
With these clients:
Netscape 4.5 (Nice and old!!!)
IE 5.5
IE 6
Firefox 1.0
Firefox 1.5
DOS FTP
Linux FTP
Linux NcFTP
MySQL client
OpenSSH client
My idea is to set up a live server running web/mail/DNS/DB/FTP and a warm standby, such as:
3600 IN A 1.1.1.1
3600 IN A 2.2.2.2
The warm standby is powered on but no services are started. Live is synchronized to warm standby. If the live fails I bring up the standby. Bing bang boom, the client automatically goes to the standby.
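To sanity-check the plan on paper, here is a toy Python model of it. It assumes each client gets the two records in random order and takes the first address with a listening service. Real resolvers and clients vary, and the model ignores the few seconds of delay on a dead address. The function names and the 1000-client figure are made up for illustration.

```python
import random

def pick_server(records, listening, rng):
    """Model one client: shuffle the record order, then take the first
    address that has a service listening. Returns None if none do."""
    order = list(records)
    rng.shuffle(order)
    for ip in order:
        if ip in listening:
            return ip
    return None

def simulate(records, listening, clients=1000, seed=0):
    """Count where `clients` simulated connections land (toy model only)."""
    rng = random.Random(seed)
    counts = {ip: 0 for ip in records}
    failed = 0
    for _ in range(clients):
        ip = pick_server(records, listening, rng)
        if ip is None:
            failed += 1
        else:
            counts[ip] += 1
    return counts, failed
```

With `listening={"1.1.1.1"}` every simulated client ends up on the live server; switch it to `{"2.2.2.2"}` (live dead, standby services started) and they all land on the standby. Only with both sets of services running do you get the load split.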
It'll be just web/POP/SSH/FTP because DNS and SMTP already have built-in load-distributing and high availability capabilities. No database ports will be exposed to the outside world, but if I do expose them they should work.
If this works, so cool! A replacement for expen$ive and complicated HA solutions.
Was clued into this by Mr. Tenereillo:
What am I missing? Do I need to do more testing?
Am I crazy? Or crazy like a fox? ;-)
Someone check me on this because I'm not sure I'm testing it right...
CD
R U good enough?