Cluster.log errors - ServeRAID

arrow12man · May 4, 2006

Hi,

I have built a cluster using:

2 x IBM X345
2 x ServeRAID 6M cards

I followed IBM and Microsoft instructions for this implementation. I am now testing and have run into a problem.

When I Move Group from Node B to Node A, the failover takes approximately 15-20 seconds. When I Move Group from Node A to Node B, the failover takes 6 minutes. No application or system log entries appear to explain this problem. I have found errors in my cluster.log that correspond to the 6 minute gap but I have been unable to locate any documentation describing the errors or providing information about how to correct them.

Here are the two errors that I receive repeatedly over the span of 6 minutes:

000005bc.00000c54::2006/05/01-17:27:08.424 ERR ServeRAID Logical Disk: IPSHAReadAdapterConfiguration FAILED by Windows NT. LastError = 1117.
000005bc.00000c54::2006/05/01-17:27:08.424 ERR ServeRAID Logical Disk: IPSHAReadUserPage4 FAILED by Windows NT. LastError = 1117.
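
For reference, Win32 error 1117 is ERROR_IO_DEVICE ("The request could not be performed because of an I/O device error"). A minimal Python sketch for pulling these entries out of cluster.log and measuring how long they persist might look like the following. The line layout (process ID, thread ID, timestamp, level, message) is an assumption inferred from the two entries above, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the two entries quoted above:
# <process id>.<thread id>::<timestamp> <level> <message>
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-fA-F]{8})\.(?P<tid>[0-9a-fA-F]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)
LAST_ERROR_RE = re.compile(r"LastError = (\d+)")

# Win32 system error codes; only the one seen in this thread is listed.
WIN32_ERRORS = {1117: "ERROR_IO_DEVICE"}

SAMPLE = [
    "000005bc.00000c54::2006/05/01-17:27:08.424 ERR ServeRAID Logical Disk: "
    "IPSHAReadAdapterConfiguration FAILED by Windows NT. LastError = 1117.",
    "000005bc.00000c54::2006/05/01-17:27:08.424 ERR ServeRAID Logical Disk: "
    "IPSHAReadUserPage4 FAILED by Windows NT. LastError = 1117.",
]


def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    err = LAST_ERROR_RE.search(m.group("msg"))
    return {
        "timestamp": datetime.strptime(m.group("ts"), "%Y/%m/%d-%H:%M:%S.%f"),
        "level": m.group("level"),
        "message": m.group("msg"),
        "last_error": int(err.group(1)) if err else None,
    }


def error_window(lines, code=1117):
    """Return (first, last, count) for ERR entries carrying the given code."""
    hits = [
        e for e in map(parse_line, lines)
        if e and e["level"] == "ERR" and e["last_error"] == code
    ]
    if not hits:
        return None
    times = [e["timestamp"] for e in hits]
    return min(times), max(times), len(hits)


if __name__ == "__main__":
    first, last, count = error_window(SAMPLE)
    print(WIN32_ERRORS[1117], count, "entries spanning", last - first)
```

Running `error_window` over the full log (for example `open("cluster.log")`) should show whether the error burst lines up with the 6 minute gap.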

Would someone be able to provide some insight into these errors and perhaps point me towards some documentation?

TIA,

Arrow12man

xmsre · May 9, 2006

Check your ServeRAID driver versions. IPSHA is an IBM component, so you'll likely find more information on the IBM web site.

arrow12man · May 10, 2006

Thanks for your reply.

Our versions (on both servers) are:

BIOS: 7.12.02
Driver: 7.10.18
Firmware: 7.12.02

This matches the Microsoft HCL tested solution:

http://www.microsoft.com/windows/ca...tail&pgn=859EC384-E686-B2D2-B8E3-A1F2AB465030

Arrow12man

WhoKilledKenny · May 10, 2006

This is an issue I experienced in the past with IBM hardware and clustering. Because I have not used them in a long time (3 years), this may or may not relate to your issue.

Have you set up and tested clustering at the hardware level? Launch ServeRAID Manager and verify that both RAID cards are set up for clustering and are communicating with their cluster partner. There should be a way to test communication via the bus between the two cards.

arrow12man · May 11, 2006

The cluster validation tool runs successfully without error. This issue appears to be a result of the IPSHA disks. IBM uses its own DLL to manage the disks in the cluster at the OS level. Failover works in both directions; the problem is that it takes 6 minutes in one direction and 20 seconds in the other.

Interesting note: I wiped everything out and installed clustering (from the ground up) opposite from the IBM and Microsoft instructions (i.e., previously Node1 = ServerA and Node2 = ServerB; this time Node1 = ServerB and Node2 = ServerA). After doing this, the 6 minute pause occurred in the opposite direction (i.e., previously a failover from Node1 to Node2 took six minutes; this time a failover from Node2 to Node1 took six minutes).

Microsoft says that the pause is related to a 180 second timeout (I have 2 logical drives failing over) and recommends that I use a physical disk resource (instead of IPSHA) or switch to fibre "as soon as possible". Both recommendations mean that I would have to walk away from my current solution and acquire additional hardware. No thanks - the current hardware solution is on the MS HCL.
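
The arithmetic behind that explanation, as a small Python sketch. It assumes the 180 second timeout is hit once per logical drive and that the drives time out one after the other; that is my reading of Microsoft's statement, not something they confirmed. The function name is hypothetical.

```python
PER_DRIVE_TIMEOUT_S = 180  # timeout quoted by Microsoft
LOGICAL_DRIVES = 2         # logical drives failing over in this cluster


def sequential_timeout_seconds(drives, timeout_s):
    """Total delay if each drive waits out its full timeout in turn (assumed)."""
    return drives * timeout_s


total = sequential_timeout_seconds(LOGICAL_DRIVES, PER_DRIVE_TIMEOUT_S)
print(total, "seconds =", total / 60, "minutes")
```

Two drives at 180 seconds each comes to 360 seconds, which matches the 6 minutes observed.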

Cheers,

Arrow12man

WhoKilledKenny · May 11, 2006

Check with IBM: is your setup on their supported clustering solutions web page?

arrow12man · May 12, 2006

All of the links on IBM's website pertaining to xSeries and Clustering are dead.

http://www.pc.ibm.com/ww/eserver/xseries/clustering/index.html

Arrow12man

WhoKilledKenny · May 12, 2006

I thought that might happen... IBM told us they were no longer supporting clustering via SCSI and ServeRAID.
