-Delay Part: Does the application have to support HACMP, or
-is this done all at the OS level? Also, how does the
-application know where the system went down? How does it
-pick up from that?
HACMP is a set of scripts that runs on top of the OS and pretends to be you. What happens is this. When your machine goes down, HACMP signals the backup machine that it's going down and to prepare to take over. Then it runs your application shutdown script (user written, by you) to stop the application, unmounts the filesystems, and varies off the vg on the crashing machine.
It then swaps IPs, varies on the vg, mounts the filesystems and starts up the application (with the application startup script, user written, by you) on the standby machine. The application has nothing to do with any of this, any more than it would if you decided to shut it down on one box and start it up on the other.
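To make the ordering concrete, here is a rough sketch of that sequence in Python. It only models the order of the steps described above and runs nothing on a real system. The node names and the step wording are made up for illustration and are not HACMP's own event names.

```python
# Steps carried out on the machine that is going down, in order.
PRIMARY_STEPS = [
    "notify standby to prepare for takeover",
    "run application stop script",
    "unmount filesystems",
    "vary off volume group",
]

# Steps carried out on the standby once the primary has released everything.
STANDBY_STEPS = [
    "take over service IP",
    "vary on volume group",
    "mount filesystems",
    "run application start script",
]


def failover_plan(primary, standby):
    """Return the takeover as an ordered list of (node, step) pairs."""
    plan = [(primary, step) for step in PRIMARY_STEPS]
    plan += [(standby, step) for step in STANDBY_STEPS]
    return plan


if __name__ == "__main__":
    # "nodeA" and "nodeB" are hypothetical node names.
    for node, step in failover_plan("nodeA", "nodeB"):
        print(f"{node}: {step}")
```

Note that the application only shows up in two places, the stop script and the start script, which is the whole point.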
However, there are lots of application-related things that might prevent a good failover. Most of those are related to how the startup and shutdown scripts for the application are written. A lot of times, filesystems won't unmount (for example) because something is still running from inside one of them instead of cd'ing out of it. fuser on AIX will NOT show you everything holding open a filesystem. You need to get a copy of lsof and use that.
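As an illustration of the kind of check a stop script could make before the unmount, here is a minimal Python sketch that asks lsof which processes still have files open on a filesystem. It assumes lsof is installed and on the PATH, and it relies on lsof's `-t` option (terse output, PIDs only). The function name, the `run` parameter and the `/appfs` mount point are made up for the example.

```python
import subprocess


def pids_holding(mount_point, run=subprocess.run):
    """Return the PIDs that lsof reports as having files open on mount_point."""
    # Assumption: lsof is installed. "-t" asks for terse output, PIDs only.
    # lsof exits non-zero when it finds nothing, so the return code is
    # deliberately not treated as an error here.
    result = run(["lsof", "-t", mount_point], capture_output=True, text=True)
    # Deduplicate and sort so the result is stable.
    return sorted({int(pid) for pid in result.stdout.split()})


if __name__ == "__main__":
    # "/appfs" is a hypothetical mount point.
    try:
        print(pids_holding("/appfs"))
    except FileNotFoundError:
        print("lsof is not installed on this machine")
```

An empty list means nothing is holding the filesystem. The script doing the check should itself cd to / first, or it becomes one of the things keeping the filesystem busy.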
So good, careful planning in your implementation will go a long way toward making everything work.
-Shutdown/DASD: When you issue a shutdown from your primary
-node, do you just run "shutdown" or is there a HACMP
-shutdown you run?
You tell HACMP what shutdown scripts to use and it runs them. After the application is down, HACMP runs its own shutdown procedure and then AIX finishes going down.
Remember that AIX doesn't just turn off when a machine crashes, unless you have an extremely old version. It does all sorts of things, including writing out a dump file and syncing the drives. Testing failover by yanking the power cord out of the wall not only does NOT simulate a normal crash (unless you lose power to the data center, and in that case you don't have any machines), it can also corrupt your filesystems on the machine you unplugged.
-Regarding the DASD/volume groups, how does it automatically
-varyon the volumegroups? is there a heartbeat from the
-rs232 cable, that says, "hey system going down...varyon
-here"
You have the same vg imported to both machines, the primary and the standby, but varied off on the standby. When failover occurs, the HACMP daemon runs a number of scripts, one of which reads through the resource groups you have configured, makes a note of each vg listed in the resource groups, and then runs the varyonvg command on each one of them.
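The idea of walking the resource groups and varying on each vg can be sketched like this in Python. The dictionary layout, the resource group names and the vg names are invented for the example and are not HACMP's real configuration format. Skipping a vg that appears in more than one group is also my assumption. The sketch builds the varyonvg commands without running them.

```python
def volume_groups(resource_groups):
    """Collect each volume group once, in the order it is first seen."""
    seen = []
    for group in resource_groups.values():
        for vg in group.get("volume_groups", []):
            if vg not in seen:  # assumption: vary on a shared vg only once
                seen.append(vg)
    return seen


def varyon_commands(resource_groups):
    """Build one 'varyonvg <vg>' command per volume group, without running it."""
    return [["varyonvg", vg] for vg in volume_groups(resource_groups)]


if __name__ == "__main__":
    # Hypothetical resource groups and volume group names.
    groups = {
        "app_rg": {"volume_groups": ["appvg", "datavg"]},
        "db_rg": {"volume_groups": ["datavg", "dbvg"]},
    }
    for cmd in varyon_commands(groups):
        print(" ".join(cmd))
```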
You can set heartbeat up several ways, and if the primary machine stops sending heartbeat, eventually the secondary machine will, yes, decide the primary is dead and start running its startup routines. However, in the case of a crash, the HACMP daemon on the primary tells the HACMP daemon on the secondary "I'm going down, take over", and the swap begins immediately without waiting for heartbeat.
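Those two triggers boil down to a small decision, sketched here in Python. The function name, its parameters and the timeout value are made up for illustration, since HACMP's actual failure-detection settings are not covered in this post.

```python
def should_take_over(now, last_heartbeat, timeout, takeover_requested=False):
    """Decide whether the standby should start its takeover.

    Times are in seconds. 'timeout' is a hypothetical value for how long
    the primary may stay silent before it is considered dead.
    """
    # An explicit "I'm going down, take over" message wins immediately.
    if takeover_requested:
        return True
    # Otherwise wait until the primary has been silent longer than the timeout.
    return (now - last_heartbeat) > timeout


if __name__ == "__main__":
    print(should_take_over(now=100, last_heartbeat=99, timeout=10))
    print(should_take_over(now=100, last_heartbeat=80, timeout=10))
    print(should_take_over(now=100, last_heartbeat=99, timeout=10,
                           takeover_requested=True))
```

The first case is a healthy primary, the second is a primary that has gone silent, and the third is a crash where the primary announced it was going down.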
-Thanks for your post again!
-Much Much cleaner than
-Redbooks
You're welcome
