IBM xSeries Bad Stripe

(OP)
Hi,

I have an IBM xSeries 226 server that looks to have a bad stripe on its array. It's configured as RAID 5 using 3 disks. The system usually hangs overnight. A reboot brings the box back online, but I am having to do this more and more frequently.

The system is running ServeRAID Manager 7.10.18, and there are no errors in the ServeRAID Manager log. However, I am getting event ID 215 twice a day, which you can see in the attachment.

Could anyone shine a light on a possible solution?

No2broady

RE: IBM xSeries Bad Stripe

(OP)
Sorry, I should have said I cannot identify the faulty disk.

No2broady

RE: IBM xSeries Bad Stripe

Well, you are downlevel on the code if you are using 7.10.18; the latest is 7.12.14:
http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=MIGR-60624

Also look here about bad stripes:
http://ask.adaptec.com/scripts/adaptec_tic.cfg/php.exe/enduser/std_adp.php?p_faqid=14947

But it boils down to this: "No, there is no procedure or tool available for clearing or repairing a bad stripe while maintaining the existing array. In the instances when a bad stripe has occurred, the data contained within that stripe is incomplete, invalid, or inconsistent between the data and parity and a Bad Stripe Table entry is created to block that stripe to prevent hidden data corruption."

So any time a bad stripe is encountered, the array must be deleted and recreated, and the data restored from a backup. Also, if you have IBM maintenance, you can open a ticket and send them the ServeRAID log; they can parse it for errors that may point to a failing drive. But it may just be related to downlevel code and drivers.
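To make the "inconsistent between the data and parity" part concrete: RAID 5 stores, for each stripe, a parity block that is the XOR of that stripe's data blocks, so any one missing block can be rebuilt from the others. Here is a minimal Python sketch, assuming a made-up 3-disk stripe of 4-byte blocks (real stripe units are far larger, and this is not how the ServeRAID firmware is written). It shows a normal rebuild and what happens when the parity is stale.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 3-disk RAID 5: two data blocks plus one parity block.
data = [b"\x10\x20\x30\x40", b"\x0f\x0e\x0d\x0c"]
parity = xor_blocks(data)

# If the disk holding data[0] fails, its block is rebuilt from the survivors.
rebuilt = xor_blocks([data[1], parity])
assert rebuilt == data[0]

# If the parity is stale (e.g. it was computed for an older data[0] of all zeros),
# the same rebuild silently returns the wrong bytes.
stale_parity = xor_blocks([b"\x00\x00\x00\x00", data[1]])
wrong = xor_blocks([data[1], stale_parity])
print(wrong.hex())
```

In this toy model, once data and parity disagree there is nothing in the stripe that says which block is the correct one. That fits the quote: the controller blocks the stripe rather than hand back possibly wrong data.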

RE: IBM xSeries Bad Stripe

Man, you just beat me to the punch. This could be caused by A) never running an array synchronization, B) a faulty disk drive, or C) an out-of-date driver or out-of-date controller firmware.
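Roughly, an array synchronization reads each stripe, recomputes parity from the data blocks, and compares it with, or rewrites, the stored parity. Below is a toy version, assuming stripes held in memory as (data blocks, stored parity) pairs. The `synchronize` name, the `repair` flag and the returned list of stripe indexes are illustrative stand-ins, not the ServeRAID implementation or its Bad Stripe Table format.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def synchronize(stripes, repair=False):
    """Check every stripe's stored parity against parity recomputed from its data.

    `stripes` is a list of (data_blocks, stored_parity) tuples (a hypothetical layout).
    Returns the indexes of inconsistent stripes. With repair=True, the stored
    parity of each inconsistent stripe is overwritten with the recomputed value.
    """
    mismatched = []
    for index, (data_blocks, stored_parity) in enumerate(stripes):
        expected = xor_blocks(data_blocks)
        if expected != stored_parity:
            mismatched.append(index)
            if repair:
                # Trusts the data blocks: parity is regenerated from them.
                stripes[index] = (data_blocks, expected)
    return mismatched


stripes = [
    ([b"\x01\x02", b"\x04\x08"], b"\x05\x0a"),  # consistent stripe
    ([b"\xff\x00", b"\x00\xff"], b"\x00\x00"),  # stale parity
]
print(synchronize(stripes))  # [1]
```

Note that the `repair` path simply trusts the data blocks. If a data block is itself wrong, regenerating parity from it makes the stripe consistent but keeps the bad data, which is the kind of hidden corruption the Adaptec note says blocking the stripe is meant to prevent.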

What IBM says:
http://publib.boulder.ibm.com/infocenter/eserver/v1r2/index.jsp?topic=%2Fdiricinfo%2Ffqy0_tbs_86.html

I would still use the bootable CD (http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=MIGR-60624#DOCPRODUCTS) to check the status of all the drives and maybe even test each one to see if anything needs to be replaced.  You want to make sure all the disks are okay before you nuke/reload.

Plus, you should check and update the controller firmware as well as the driver for the operating system. Probably do the firmware after you wipe out the array; there's nothing to worry about at that point. Have the latest driver ready for the OS installation.

Wow, I've never had one of those happen to me. Thank goodness.

RE: IBM xSeries Bad Stripe

(OP)
Thanks for your reply.

I read the part about deleting the array. The problem is that I don't have a spare disk, nor a way of telling which disk is faulty so that I can swap bad for good. I don't have IBM maintenance on that system, so unfortunately that option is out.

If I do delete the array and restore from a backup, will the bad stripe be cleared?

No2broady  

RE: IBM xSeries Bad Stripe

To be clear, deleting the array and recreating will fix everything, provided you don't have a bad disk.

Follow my last post: boot from the ServeRAID CD and then check each disk. There's no sense trying to fix the array issue with an OS reload if bad hard drives are still present.

Optimally, you should always have a spare drive in stock.

RE: IBM xSeries Bad Stripe

(OP)
Thanks for the info. I am currently burning a copy of the ServeRAID CD. You'll have to forgive my ignorance; I haven't run this before. So I can basically boot from that CD and run a diagnostic routine from there? Does it update the firmware in the process?

RE: IBM xSeries Bad Stripe

(OP)
Another thing: I ran CHKDSK, and the results are below. I haven't run it in fix mode. As you can see, it found errors, but I haven't fixed them yet because the system is still in use today. Could this do anything to work around the bad stripe, or am I being wishful!?!


C:\>chkdsk
The type of the file system is NTFS.

WARNING!  F parameter not specified.
Running CHKDSK in read-only mode.

CHKDSK is verifying files (stage 1 of 3)...
259584 file records processed.
File verification completed.
2998 large file records processed.
0 bad file records processed.
0 EA records processed.
4 reparse records processed.
CHKDSK is verifying indexes (stage 2 of 3)...
895133 index entries processed.
Index verification completed.
5 unindexed files processed.
CHKDSK is verifying security descriptors (stage 3 of 3)...
259584 security descriptors processed.
Security descriptor verification completed.
21287 data files processed.
CHKDSK is verifying Usn Journal...
537530128 USN bytes processed.
Usn Journal verification completed.
Windows found problems with the file system.
Run CHKDSK with the /F (fix) option to correct these.

 286746607 KB total disk space.
 131644778 KB in 233928 files.
     92288 KB in 21288 indexes.
         0 KB in bad sectors.
    922678 KB in use by the system.
     65536 KB occupied by the log file.
 154086862 KB available on disk.

       512 bytes in each allocation unit.
 573493215 total allocation units on disk.
 308173725 allocation units available on disk.

C:\>

RE: IBM xSeries Bad Stripe

It is a bootable CD. It should be able to upgrade the controller BIOS and firmware. What controller do you have?
http://delivery04.dhe.ibm.com/sar/CMA/XSA/ibm_sw_srsupp_7.12.14_anyos_32-64.txt


You'll have to look at the documentation, but you should be able to see the status of the logical drives, the arrays and each individual disk (online, offline, etc.), and be able to test each drive.
http://publib.boulder.ibm.com/infocenter/eserver/v1r2/index.jsp?topic=%2Fdiricinfo%2Ffqy0_cstartcd.html


Updating individual hard drive firmware: this is just an example of what you need. It may not be the latest for your server (I leave that to you), but it's the type of CD you need.
http://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=MIGR-60843&brandind=5000008

RE: IBM xSeries Bad Stripe

(OP)
Hi goombawaho,

I have a ServeRAID-6i controller. I am just working out a step-by-step plan for how I'm going to attack this.

First, going by the IBM guide, I have to update the device driver via the OS, then boot from the ServeRAID firmware/diagnostics CD and follow the wizard prompts.

I have one question, do I still have to update the hard drive firmware too?

I am trying to get to a point where I can get the system to tell me exactly which disk needs replacing and obviously not follow the nuke/reload road!

Thanks again for your help.

RE: IBM xSeries Bad Stripe

I guess the hard drive firmware update is optional at this point. I would just tell you to do it AFTER you have nuked the array and BEFORE you recreate it. At that point you have nothing to lose.

And YES, I am a believer in checking for HDD firmware updates. But read the TXT file included with the update; it tells you what is fixed. The fix might be critical, it might be minor, and it might not even affect any of the drives you have.

How it works is that these update CDs can update multiple hard drives. IBM sources its hard drives from multiple vendors, so you may have an IBM drive model XYZ123 (1TB SATA, 10,000 rpm, whatever), but it could be built by any of several manufacturers. So you might have 2 Seagates and 1 Western Digital in your server. Your particular hard drives might not need updating, and then again they might.

I've seen new servers with 6 hard drives purchased at the same time containing 3 different brands of drives.  You might think they were all exactly the same.  Not.
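Here is one way to picture the per-drive decision a multi-drive update CD makes: look up each drive by the vendor's own model string rather than by the IBM part number. In this sketch the model strings, the firmware levels and the `UPDATES` table are all hypothetical, and the comparison only checks whether the level differs. The real update tools have their own rules, so check the TXT file that ships with the CD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    bay: int
    vendor_model: str  # hypothetical vendor model string, not the IBM part number
    firmware: str


# Hypothetical contents of an update CD: vendor model -> firmware level it installs.
UPDATES = {
    "VENDOR-A-MODEL-1": "B3",
    "VENDOR-B-MODEL-7": "0045",
}


def drives_needing_update(drives, updates):
    """Return (bay, current, target) for covered drives whose firmware level differs."""
    result = []
    for drive in drives:
        target = updates.get(drive.vendor_model)
        if target is not None and drive.firmware != target:
            result.append((drive.bay, drive.firmware, target))
    return result


# Three drives sold under the same IBM part number, built by two different vendors.
installed = [
    Drive(0, "VENDOR-A-MODEL-1", "B1"),
    Drive(1, "VENDOR-A-MODEL-1", "B3"),
    Drive(2, "VENDOR-C-MODEL-2", "A0"),
]
print(drives_needing_update(installed, UPDATES))  # [(0, 'B1', 'B3')]
```

Only bay 0 is flagged: bay 1 is already at the target level, and the CD has nothing for bay 2's vendor model. That is the "might not need updating, and then again they might" situation.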
