
Cluster.log errors - ServeRAID

Status
Not open for further replies.
Mar 18, 2004
Hi,

I have built a cluster using:

2 x IBM X345
2 x ServeRAID 6M cards

I followed IBM and Microsoft instructions for this implementation. I am now testing and have run into a problem.

When I Move Group from Node B to Node A, the failover takes approximately 15-20 seconds. When I Move Group from Node A to Node B, the failover takes 6 minutes. No application or system log entries appear to explain this problem. I have found errors in my cluster.log that correspond to the 6-minute gap, but I have been unable to locate any documentation that describes the errors or explains how to correct them.

Here are the two errors that I receive repeatedly over the span of 6 minutes:

000005bc.00000c54::2006/05/01-17:27:08.424 ERR ServeRAID Logical Disk: IPSHAReadAdapterConfiguration FAILED by Windows NT. LastError = 1117.
000005bc.00000c54::2006/05/01-17:27:08.424 ERR ServeRAID Logical Disk: IPSHAReadUserPage4 FAILED by Windows NT. LastError = 1117.
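To measure the gap from cluster.log instead of by eye, a small parser can pull out the timestamp, level, failing function and LastError value from lines like the two above. This is a minimal Python sketch: the line layout (process ID.thread ID::timestamp, level, component, message) is inferred only from these two sample lines and is an assumption, not a documented cluster.log format, and `parse_line` and `error_span_seconds` are hypothetical helper names. For reference, Win32 error 1117 is ERROR_IO_DEVICE ("The request could not be performed because of an I/O device error").

```python
import re
from datetime import datetime

# ASSUMPTION: layout inferred from the two sample lines in this thread,
# not from any official cluster.log specification:
#   <pid>.<tid>::<YYYY/MM/DD-HH:MM:SS.mmm> <LEVEL> <component>: <function> FAILED ... LastError = <n>.
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-fA-F]+)\.(?P<tid>[0-9a-fA-F]+)::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>\w+)\s+(?P<rest>.*)$"
)
LAST_ERROR_RE = re.compile(r"LastError\s*=\s*(\d+)")
FUNC_RE = re.compile(r":\s*(\w+)\s+FAILED")


def parse_line(line):
    """Hypothetical helper: return a dict for a matching line, or None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # %f accepts the 3-digit millisecond field and treats it as a fraction.
    ts = datetime.strptime(m.group("ts"), "%Y/%m/%d-%H:%M:%S.%f")
    rest = m.group("rest")
    err = LAST_ERROR_RE.search(rest)
    func = FUNC_RE.search(rest)
    return {
        "time": ts,
        "level": m.group("level"),
        "function": func.group(1) if func else None,
        "last_error": int(err.group(1)) if err else None,
    }


def error_span_seconds(lines, last_error=1117):
    """Seconds between the first and last ERR line carrying `last_error`."""
    times = [
        rec["time"]
        for rec in map(parse_line, lines)
        if rec and rec["level"] == "ERR" and rec["last_error"] == last_error
    ]
    if not times:
        return 0.0
    return (max(times) - min(times)).total_seconds()


if __name__ == "__main__":
    sample = [
        "000005bc.00000c54::2006/05/01-17:27:08.424 ERR ServeRAID Logical Disk: "
        "IPSHAReadAdapterConfiguration FAILED by Windows NT. LastError = 1117.",
        "000005bc.00000c54::2006/05/01-17:27:08.424 ERR ServeRAID Logical Disk: "
        "IPSHAReadUserPage4 FAILED by Windows NT. LastError = 1117.",
    ]
    for line in sample:
        print(parse_line(line))
    print(error_span_seconds(sample))
```

To use it on a real log, read the lines of cluster.log and pass them to `error_span_seconds`; if the span during a slow Move Group comes out near 360 seconds, the errors cover the whole 6-minute pause.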

Would someone be able to provide some insight into these errors and perhaps point me towards some documentation?

TIA,

Arrow12man
 
Check your ServeRAID driver versions. IPSHA is an IBM component, so you'll likely find more information on the IBM web site.
 
This is an issue I experienced in the past with IBM hardware and clustering. Because I have not worked with them in a long time (3 years), this may or may not relate to your issue.

Have you set up and tested clustering at the hardware level? Launch ServeRAID Manager and verify that both RAID cards are set up for clustering and are communicating with their cluster partner. There should be a way to test communication between the two cards over the shared bus.
 
The cluster validation tool runs successfully without error. This issue appears to be related to the IPSHA disks: IBM uses its own DLL to manage the disks in the cluster at the OS level. Failover works in both directions; the problem is that in one direction it takes 6 minutes and in the other it takes 20 seconds.

Interesting note: I wiped everything out and reinstalled clustering from the ground up, this time with the nodes in the opposite order from the IBM and Microsoft instructions (i.e., previously Node1 = ServerA and Node2 = ServerB; this time Node1 = ServerB and Node2 = ServerA). After doing this, the 6-minute pause occurred in the opposite direction (i.e., previously a failover from Node1 to Node2 took 6 minutes; this time a failover from Node2 to Node1 took 6 minutes).

Microsoft says that the pause is related to a 180-second timeout (I have 2 logical drives failing over) and recommends that I use a physical disk resource (instead of IPSHA) or switch to fibre "as soon as possible". Both recommendations mean that I would have to walk away from my current solution and acquire additional hardware. No thanks - the current hardware solution is on the MS HCL.
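Reading Microsoft's explanation literally, and assuming each of the 2 logical drives waits out the full 180-second timeout one after the other (an assumption; the thread does not say whether the waits are serial), the arithmetic matches the observed pause:

$$
t_{\text{pause}} \approx n_{\text{drives}} \times t_{\text{timeout}} = 2 \times 180\ \text{s} = 360\ \text{s} = 6\ \text{min}
$$

Under the same assumption, a group containing only one IPSHA logical drive would be expected to pause for roughly 180 s instead.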

Cheers,

Arrow12man
 
I thought that might happen... IBM told us they were no longer supporting clustering via SCSI and ServeRAID.
 
