So you don't have any heartbeat path other than the network? I think this is what caused the nodes to go down: the cluster was experiencing a split-brain! When something goes wrong in the network (sometimes even a brief glitch), each node concludes that the other node has failed, so each one tries to grab the disks (I assume you are using external disks, like a SAN?). This puts the cluster into a split-brain situation! And if, by any chance, you have mirroring on the disks, each node might take its own copy of the mirror, which can lead to data corruption!
You've got to have another means of heartbeating! Think about using the current shared disks for that, so you avoid going through all of this in the future!
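To illustrate why a second heartbeat path helps, here is a minimal sketch in Python. It is purely hypothetical and simplified, not how any particular cluster product implements it. The names `PeerStatus`, `decide` and the 10-second `HEARTBEAT_TIMEOUT` are made up for the example. The idea is that a node only declares its peer dead (and grabs the disks) when BOTH the network heartbeat and the shared-disk heartbeat have gone silent. If only the network is silent, the peer is evidently still alive, so the node treats it as a network fault instead of taking over.

```python
from dataclasses import dataclass

# Hypothetical timeout; real cluster software has its own configurable values.
HEARTBEAT_TIMEOUT = 10.0  # seconds


@dataclass
class PeerStatus:
    last_network_beat: float  # time the last heartbeat arrived over the network
    last_disk_beat: float     # time the peer last wrote its heartbeat to the shared disk


def decide(peer: PeerStatus, now: float, timeout: float = HEARTBEAT_TIMEOUT) -> str:
    """Hypothetical decision logic for a two-path heartbeat (network + shared disk)."""
    network_ok = now - peer.last_network_beat <= timeout
    disk_ok = now - peer.last_disk_beat <= timeout

    if network_ok:
        return "peer alive"
    if disk_ok:
        # The peer is still updating the shared disk, so it is alive:
        # only the network path failed. Do NOT take over the disks.
        return "network fault: keep disks, raise alarm"
    # Both paths are silent: the peer is most likely really down.
    return "peer down: take over disks"


if __name__ == "__main__":
    # Network heartbeat stale, disk heartbeat fresh -> not a real node failure.
    print(decide(PeerStatus(last_network_beat=80.0, last_disk_beat=104.0), now=105.0))
```

With only the network path (your current setup), the middle branch does not exist, so any network glitch falls straight through to "take over disks" on both nodes, which is exactly the split-brain described above.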
Regards,
Khalid