Opinion of DoubleTake or Veritas Cluster with Exchange 2003?

Status
Not open for further replies.

PScottC

MIS
Mar 16, 2003
1,285
US
Has anyone on this forum used a bit-level clustering application like DoubleTake with Exchange?

1) If yes, what was your opinion of the product?

2) Was it easy to install / configure?

3) Does it fail-over properly?

4) Why did you choose this type of cluster over Microsoft Clustering?

PSC

Governments and corporations need people like you and me. We are samurai. The keyboard cowboys. And all those other people out there who have no idea what's going on are the cattle. Mooo! --Mr. The Plague, from the movie "Hackers"
 
Doubletake is the replication piece. Veritas provides both the replication piece and the clustering piece. Microsoft (an MNS cluster) provides the clustering piece.


If you're looking at a product or combination of products in this area, I'm assuming that you are trying to build a geographically dispersed cluster. In building a geographically dispersed cluster, realize that the hardware is certified as a complete system by Microsoft, and published in the HCL. This means servers, switches, replication software, clustering software, etc. If you change even one component, it's no longer HCL certified. If you're looking at a piecemeal system, then you need to first decide if supportability by Microsoft is important to you or not. If it is, choose a system off the HCL. If it is not, then I think the place to start would be a long talk with your storage vendor.

 
I've built several MSCS based exchange and SQL clusters and I started this thread to get opinions on other options.

A company that I do consulting for has asked me to suggest a disaster recovery solution. Simply put, they want to have all of their mailboxes immediately available at another physical location in case their data center is destroyed. They are currently configured with a centralized Exchange supporting about 20 sites.

I've done some research on DoubleTake and also Legato's CoStandby Server and I want to know if anyone's had experience with any products like them. These products claim that they can mirror servers (including Exchange) on dissimilar hardware in geographically dispersed locations. Is it true? Do they work?

PSC

 
I've used doubletake. It works, however it's only replication, not a geocluster. NSI sells geocluster, and I've used that as well. It works. Both of these products, like any geographically dispersed solution, depend on replication over a WAN link. I've also used perabit here. I've done a half dozen clusters with the veritas solution, and it does work, albeit with a somewhat cumbersome failover/failback.

Recently, I implemented yet another geographically dispersed Exchange cluster. This time, the hardware vendor (Netapp) handled the replication piece (syncmirror) and I used a Windows 2003 Majority Node Set cluster. I used iSCSI and the MS initiator on all four nodes to connect to the storage. The failover behavior in MNS takes a bit of getting used to, but that solution worked just fine.

For the most part, the systems were not on the Microsoft cluster HCL as geographically dispersed clusters. In most instances it was due to swapping server brands or HBAs. It's not that the mix of products can't work, it's just that unless the exact configuration as a whole is on the HCL, support from MS could be problematic.

In your case, it looks like the customer doesn't really want a geocluster, they just want replication to a remote site with a cold standby server. You could use doubletake, or volume replicator for that matter, but the issue will be the size of the change delta. That's basically the amount of data you'll have to ship across the WAN. You might want to look at WAN acceleration technologies like perabit also.

BTW, who is the storage vendor? If you do go Netapp, then the answer would be to use snapmirror to storage in the remote location. In a failure, you break the replication links, attach the LUNs to your cold standby and off you go. It really works well if you boot from SAN also. If you want to do this with iSCSI, you will need a TOE card and not the MS software initiator.

 
Sorry to take so long getting back to this thread.

I've finally got more details on how this will be used...

It will be something like what xmsre stated. The live servers will be in one location and a cold backup will be offsite. In the instance of a disaster, the DR site would be brought online. As of today, the plan is to have a 10MB pipe between the current Data Center and the DR site. I expect that this would be sufficient connectivity to satisfy an application like doubletake. This company is using a VOIP solution with unified messaging attached to the Exchange system. The company feels that it is critical to have Voice / Voicemail services available immediately if a disaster were to happen at the data center.

Now my understanding is that Doubletake and some of these other mirroring/replication solutions do bit-level copies of the changes from the hot to the cold server. Is that correct? Are there any automated fail-over capabilities? If fail-over is manual, what is the time frame? 10 minutes? 45 minutes?

If I wanted to do a High Availability cluster, like MSCS, in the data center can I also use one of these other products for a standby server?

Thanks for your thoughts.

PSC

 
No, no failover with doubletake. There's a separate NSI product called geocluster for the failover piece.

Arizona State University has a distance cluster solution in place that supports 10,000 staff and faculty users. Their storage is NetApp, so they actually use the syncmirror product to replicate the data between volumes on two clustered filer heads (it's two buildings on the same campus; if the distance were greater they would have to use a product like Metrocluster). On top of that are two 4 node Majority Node Set clusters, each running Exchange with two active and two passive nodes. The connection between the hosts and the filers is iSCSI. Failover is not automatic: in an MNS cluster with an even number of nodes split across two sites, a majority is never reached, so you have to start the cluster services in the DR site manually. Even so, the time to fail over manually is on the order of two minutes. That's one end of the spectrum.
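To illustrate why an evenly split MNS cluster can't fail over on its own, here is a minimal Python sketch of the majority rule. The function name is made up for illustration and this is not how the cluster service is actually implemented; it only assumes that a node set needs more than half of its configured nodes to form a majority.

```python
def has_majority(total_nodes: int, reachable_nodes: int) -> bool:
    """Return True if the reachable nodes are a strict majority of the cluster.

    Hypothetical helper for illustration only: a majority is assumed to mean
    more than half of the configured nodes, i.e. floor(n / 2) + 1.
    """
    return reachable_nodes >= total_nodes // 2 + 1


# A 4 node cluster split evenly across two sites: if the primary site is
# lost, the DR site can only see 2 of 4 nodes, which is not a majority.
print(has_majority(4, 2))  # False
# With a third node reachable, the same cluster would reach a majority.
print(has_majority(4, 3))  # True
```

With an even split, neither site alone ever satisfies the check, which matches the manual start of the cluster services in the DR site described above.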

With Veritas, you could do something similar with veritas cluster and volume replicator. It's been a couple of years since I've done one like this, but the failover times were certainly in the sub 10 minute range and I would expect them to have improved over the years. Likewise, doubletake could do the replication and you could use geocluster for the failover.


The problem with all of these will be the replication traffic. A 10m pipe is pretty small. You absolutely want synchronous replication for the logs, and if possible for the databases. If you go asynch for the databases, this increases your failover time due to log replay. You'll definitely want to define what your change deltas are, and do the math to determine if the pipe will work. You may also want to consider WAN acceleration products.
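As a rough sketch of "doing the math", the Python below estimates how long a given change delta takes to cross the link. It is only an illustration under stated assumptions: the pipe is read as 10 megabits per second, the 0.7 efficiency factor is a guess at protocol overhead and other traffic sharing the link, and the function name is made up.

```python
def transfer_hours(change_mb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Hours needed to ship change_mb megabytes over a link_mbps megabit/s link.

    efficiency is an assumed fraction of the raw link rate that replication
    can actually use; measure your own link rather than trusting the default.
    """
    usable_mbps = link_mbps * efficiency   # megabits per second actually available
    seconds = change_mb * 8 / usable_mbps  # 8 bits per byte
    return seconds / 3600


# Example: a hypothetical 2,000 MB daily change delta over a 10 Mbit/s pipe.
hours = transfer_hours(2000, 10)
print(f"{hours:.2f} hours of link time per day")
```

If the result gets anywhere near the replication window you have in mind, that is the point where a bigger pipe or WAN acceleration starts to matter.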



If you go the cold standby route, you will risk losing some data. I guess it depends on what your SLA will tolerate. If we stick with Netapp, the answer would be to set up snapmirror relationships to a filer in the remote data center. After the initial baseline copy, only changes are replicated. To get an idea of how much data you would be pushing in a 24 hour period, check the number of logs you generate in 24 hours * 5 MB * 2. With snapmirror you can mirror the database changes on say a 4 hour schedule and the logs on a different schedule, say every 30 minutes. In the event of a failure of the primary site, you could lose up to the last 30 minutes of mail in this scenario. You would need a live DC/GC/DNS located at the DR site, and would have to figure in the AD replication traffic on the connection. You would also need your cold standby up and loaded with W2K3 and SPs/hotfixes. In the event of a failure of the primary site you would:

1. Break the snapmirror replication links.
2. Change the name of the standby server to the same as the primary.
3. Reset the computer account of the primary server in AD.
4. Join the cold standby to the domain.
5. Attach to the LUNs on the filer in the alt datacenter.
6. Do a /disasterrecovery install of exchange on the cold standby.
7. If you redirect ports at the firewall by IP address, change firewall rules.

My experience has been that, with practice, this process takes about an hour.
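The sizing rule of thumb from earlier in this post (logs per 24 hours * 5 MB * 2) can be sketched in Python as below. The function names and the example log count are made up for illustration, and the per-sync figure is only an average that assumes the changes are spread evenly over the day.

```python
LOG_FILE_MB = 5  # the 5 MB log size used in the rule of thumb above


def daily_replication_mb(logs_per_day: int) -> int:
    """Estimated MB replicated per 24 hours: log count * 5 MB * 2."""
    return logs_per_day * LOG_FILE_MB * 2


def average_mb_per_sync(daily_mb: float, sync_interval_min: int) -> float:
    """Average MB shipped per scheduled update, assuming an even spread."""
    syncs_per_day = 24 * 60 / sync_interval_min
    return daily_mb / syncs_per_day


# Example with a hypothetical 20 logs generated per day and a 30 minute
# log schedule, as in the scenario above.
daily = daily_replication_mb(20)
print(daily, "MB per day")
print(round(average_mb_per_sync(daily, 30), 2), "MB per 30 minute update on average")
```

The log schedule also sets the exposure: with a 30 minute interval, up to the last 30 minutes of mail can be lost, as noted above.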

Hope this helps

XMSRE



 
Thanks for the comments. I haven't been asked to do this type of disaster recovery before, so I'm on a learning curve here. I'll start researching some of the products you are using and have used.

This is a rather small company at about 400 users and I think that the total mail passed in a single day would be <100MB. But I expect that in the next 5 years the number of users will double or more.

PSC

 
I'm also looking at these scenarios. Why do you have to do a disaster recovery install? Why not just have an exchange server running in the DR site, then restore mailboxes using the recovery storage group?
 
1. You'll have profile redirection issues.
2. Offline stores will break, requiring a full resync (may not want to do that over WAN to your DR site).
3. You have to create a storage group with the same name as the SG to be recovered on the recovery server. This effectively limits the number of SGs possible on that server.
4. If you're not at SP1, then you have to deal with the Exmerge limits.


The /disasterrecovery route avoids these problems at the cost of the time it takes to install exchange. In many cases, it's the shortest path.

In a small organization, you may be able to live with that. In the event of a DR scenario, someone will run around repointing profiles. With a lower number of users, the impact of resyncing offline folders will be less.


One note: on a cluster, you can't use the /disasterrecovery switch. However, the recovery path is actually easier. You simply reset the computer account for one of the nodes, and join a new box to the domain with the old node name. Do a normal cluster install, connecting to the same drive letters. After installing exchange on the cluster node, when you go to create the SA, specify the existing virtual server name. Exchange will pull the configuration information for the virtual server from AD.


 