getting disk errors again

Status
Not open for further replies.

stevenriz

Hello, we are getting errors on a disk again, thankfully not a critical disk... nevertheless I am trying to form an opinion on what the actual issue is here....

Here are the errors, and this only happens when there is a lot of traffic on the disk. As you can see, it seems to be the disk, but then I see the fibre channel go offline, then online, so I am looking at bus issues as well. What do you think? This is an E3500 machine and it's been up for 448 days... Maybe a reboot is in order? Thanks!

Aug 18 11:29:40 application unix: ID[SUNWssa.socal.link.5010] socal0: port 0: Fibre Channel is OFFLINE
Aug 18 11:29:40 application unix: ID[SUNWssa.socal.link.6010] socal0: port 0: Fibre Channel Loop is ONLINE
Aug 18 11:29:47 application unix: WARNING: /sbus@2,0/SUNW,socal@d,10000/sf@0,0/ssd@w22000020379c358b,0 (ssd0):
Aug 18 11:29:47 application unix: Error for Command: read(10) Error Level: Fatal
Aug 18 11:29:47 application unix: Requested Block: 35225344 Error Block: 35225837
Aug 18 11:29:47 application unix: Vendor: SEAGATE Serial Number: 0031J80486
Aug 18 11:29:47 application unix: Sense Key: Media Error
Aug 18 11:29:47 application unix: ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0xe4
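
For readers following along: the warning above names a specific device instance and block range, which is what points at the media rather than only the link. Here is a minimal Python sketch that pulls those fields out of the pasted warning. The `summarize` helper is a made-up name for illustration, not a Sun tool, and the log text is copied from the post with the timestamp prefix trimmed.

```python
import re

# Warning block copied from the post above (timestamp/host prefix trimmed).
LOG = """\
WARNING: /sbus@2,0/SUNW,socal@d,10000/sf@0,0/ssd@w22000020379c358b,0 (ssd0):
Error for Command: read(10) Error Level: Fatal
Requested Block: 35225344 Error Block: 35225837
Sense Key: Media Error
ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0xe4
"""


def summarize(log):
    """Extract the instance name, block numbers and sense key (hypothetical helper)."""
    instance = re.search(r"\((ssd\d+)\)", log).group(1)
    requested = int(re.search(r"Requested Block: (\d+)", log).group(1))
    error = int(re.search(r"Error Block: (\d+)", log).group(1))
    sense = re.search(r"Sense Key: (.+)", log).group(1).strip()
    return {
        "instance": instance,
        "requested_block": requested,
        "error_block": error,
        # How far into the requested read the failing block sits.
        "offset_into_request": error - requested,
        "sense_key": sense,
    }


summary = summarize(LOG)
print(summary)
```

For this excerpt the failing block is 493 blocks past the start of the request, on instance ssd0, with a Media Error sense key.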
 
The uptime is unlikely to have anything to do with it; I would say check the condition of your fibre cables and look for any kinks or bends. Try swapping the cables around perhaps and see if the error moves with the cable; if so... replace it!

Annihilannic.
 
I know this machine hasn't moved since it was installed about 5 years ago.... I will check the cables as you suggest regardless, you never know, maybe something fell behind the system. So you don't think a reboot will clear out any unknown bugs in the software relating to the backplane?
 
Unlikely IMHO, but I wouldn't discount anything. :-)

Annihilannic.
 
What is the model of the drive? I am aware of many FCOs pertaining to fibre drives.

This drive is internal to your system and is the boot drive, correct?



Thanks

CA
 
They are all Seagate 18 GB drives and luckily no, it isn't the boot drive. Model is ST318203FC.
 
I would take a tar copy of the affected partition to save any data. Perhaps run an fsck and then reboot.
Running iostat -En before and after the reboot will confirm the disk errors.
 
Where is this drive located? Internal or in an A5200?

I can tell that it is connected to the board in I/O slot 1, socal port A, on your E3500; at least that is what gave the offline error. Usually this board is used to run the internal drives by connecting it to your fibre channel interface board.
Can you get me the part# on the i/o board in slot one?
It is the first seven digits of the serial# on the outside of the board itself, begins with 501.

The errors you posted contain only 1 read error. Are there more errors before or after this one?

What is the first error you see?


Thanks

CA


 
Hi, these drives are all internal. I will try to find the part # for you. Everything was fine all night and this morning; then I tried to access drive /dev/dsk/c1t6d0s6, which is a newfs, and the telnet window hung and finally returned this error: the fibre channel is offline, then online. I no longer think this is a disk issue.

Aug 19 11:19:16 application unix: sf1: sf1: Target 0x6 Reset Failed. Ret=105
Aug 19 11:19:18 application unix: sf1: sf:Target driver initiated lip
Aug 19 11:19:18 application unix: ID[SUNWssa.socal.link.5010] socal0: port 1: Fibre Channel is OFFLINE
Aug 19 11:19:18 application unix: ID[SUNWssa.socal.link.6010] socal0: port 1: Fibre Channel Loop is ONLINE
Aug 19 11:19:18 application unix: sf1: target 0x6 al_pa 0xdc offlined
Aug 19 11:19:18 application unix: WARNING: /sbus@2,0/SUNW,socal@d,10000/sf@1,0/ssd@w22000020379c45f6,0 (ssd3):
Aug 19 11:19:18 application unix: requeue of command fails (fffffffe)
Aug 19 11:19:18 application unix:
Aug 19 11:19:18 application unix: WARNING: /sbus@2,0/SUNW,socal@d,10000/sf@1,0/ssd@w22000020379c45f6,0 (ssd3):
Aug 19 11:19:18 application unix: requeue of command fails (fffffffe)
Aug 19 11:19:18 application unix:
Aug 19 11:19:18 application unix: WARNING: /sbus@2,0/SUNW,socal@d,10000/sf@1,0/ssd@w22000020379c45f6,0 (ssd3):
Aug 19 11:19:18 application unix: transport rejected (-2)
Aug 19 11:19:18 application unix:
Aug 19 11:19:21 application unix: ID[SUNWssa.socal.link.5010] socal0: port 1: Fibre Channel is OFFLINE
Aug 19 11:19:21 application unix: ID[SUNWssa.socal.link.6010] socal0: port 1: Fibre Channel Loop is ONLINE
 
These errors point to the fibre hanging off I/O board slot 1, port 1, which is the B fibre channel.

So you have received errors on both paths.

Can you give me your format output?

There are many patches for fibre-related issues: drive firmware, sf patches, etc.

Have you ever updated any of your patches for fibre? If not, do a backup and schedule some time to get the fibre-related patches up to date.

Thanks

CA
 
No, this system has been 100% static for close to 5 years, therefore we never wanted to muck with it, if you know what I mean. The machine is still on 2.6 with the recommended patches at the time, all this time. All has been superb up until now.... Why would we want to patch something at this point? Unless it is time and/or date related... Do you agree? I still have yet to check the cables....
 
A bug is a bug is a bug, anytime anywhere. You could have a hundred of these same systems running fine for years and one day one catches the bug.

You have 2 different paths/two different cables giving errors, Port A and Port B.
2 different drives giving errors.
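
To make that tally concrete, here is a rough Python sketch that counts OFFLINE events per socal port and WARNING lines per ssd instance across the two excerpts posted in this thread. The lines are copied from those excerpts with the timestamp prefix trimmed; the variable names are illustrative only. Reading port 0 and port 1 as the A and B paths is the interpretation given in this thread, not something the script checks.

```python
import re
from collections import Counter

# OFFLINE and WARNING lines from the Aug 18 and Aug 19 excerpts (prefix trimmed).
LINES = [
    "ID[SUNWssa.socal.link.5010] socal0: port 0: Fibre Channel is OFFLINE",
    "WARNING: /sbus@2,0/SUNW,socal@d,10000/sf@0,0/ssd@w22000020379c358b,0 (ssd0):",
    "ID[SUNWssa.socal.link.5010] socal0: port 1: Fibre Channel is OFFLINE",
    "WARNING: /sbus@2,0/SUNW,socal@d,10000/sf@1,0/ssd@w22000020379c45f6,0 (ssd3):",
    "WARNING: /sbus@2,0/SUNW,socal@d,10000/sf@1,0/ssd@w22000020379c45f6,0 (ssd3):",
    "WARNING: /sbus@2,0/SUNW,socal@d,10000/sf@1,0/ssd@w22000020379c45f6,0 (ssd3):",
    "ID[SUNWssa.socal.link.5010] socal0: port 1: Fibre Channel is OFFLINE",
]

offline = Counter()   # (controller, port) -> number of OFFLINE events
warnings = Counter()  # ssd instance -> number of WARNING lines

for line in LINES:
    m = re.search(r"(socal\d+): port (\d+): Fibre Channel is OFFLINE", line)
    if m:
        offline[(m.group(1), int(m.group(2)))] += 1
        continue
    m = re.search(r"WARNING: .*\((ssd\d+)\):", line)
    if m:
        warnings[m.group(1)] += 1

print(dict(offline))
print(dict(warnings))
```

Both ports of socal0 and two separate ssd instances show up in the counts, which is the pattern described above.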

If you give me a format output and a copy of iostat -E | grep Revision, I can do some more research.
I have already found a firmware patch for your drive; I just want to check what firmware level you are at.

Thanks

CA


 
I know I know about the bugs...... So cool, doing the same, we can compare. I really appreciate all your help... just don't go too much out of your way!!
steve

# iostat -E | grep Revision
Vendor: TOSHIBA Product: XM6201TASUN32XCD Revision: 1103 Serial No: 12/12/97
Vendor: Symbios Product: StorEDGE A1000 Revision: 0301 Serial No:
Vendor: Symbios Product: StorEDGE A1000 Revision: 0301 Serial No:
Vendor: SEAGATE Product: ST318203FSUN18G Revision: 034A Serial No: 0031J80486
Vendor: SEAGATE Product: ST318203FSUN18G Revision: 034A Serial No: 0031J95276
Vendor: SEAGATE Product: ST318203FSUN18G Revision: 034A Serial No:
Vendor: SEAGATE Product: ST318304FSUN18G Revision: 042D Serial No:
Vendor: SEAGATE Product: ST318203FSUN18G Revision: 114A Serial No:
Vendor: SEAGATE Product: ST39103FCSUN9.0G Revision: 114B Serial No:
Vendor: EXABYTE Product: EXB-8505SMBANSH2 Revision: 0098 Serial No:

#
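
A short Python sketch for reading that output: it splits each line into vendor, product, revision and serial, then lists the firmware revisions of the ST318203F drives. The regular expression is an assumption based only on the lines pasted above, so other iostat layouts may need adjusting.

```python
import re

# Output pasted above, unchanged.
IOSTAT = """\
Vendor: TOSHIBA Product: XM6201TASUN32XCD Revision: 1103 Serial No: 12/12/97
Vendor: Symbios Product: StorEDGE A1000 Revision: 0301 Serial No:
Vendor: Symbios Product: StorEDGE A1000 Revision: 0301 Serial No:
Vendor: SEAGATE Product: ST318203FSUN18G Revision: 034A Serial No: 0031J80486
Vendor: SEAGATE Product: ST318203FSUN18G Revision: 034A Serial No: 0031J95276
Vendor: SEAGATE Product: ST318203FSUN18G Revision: 034A Serial No:
Vendor: SEAGATE Product: ST318304FSUN18G Revision: 042D Serial No:
Vendor: SEAGATE Product: ST318203FSUN18G Revision: 114A Serial No:
Vendor: SEAGATE Product: ST39103FCSUN9.0G Revision: 114B Serial No:
Vendor: EXABYTE Product: EXB-8505SMBANSH2 Revision: 0098 Serial No:
"""

# Product may contain a space ("StorEDGE A1000"); the serial may be empty.
ROW = re.compile(
    r"Vendor: (\S+)\s+Product: (.+?)\s+Revision: (\S+)\s+Serial No:\s*(\S*)"
)


def parse(text):
    """Return one dict per device line (illustrative helper)."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            vendor, product, revision, serial = m.groups()
            rows.append(
                {"vendor": vendor, "product": product,
                 "revision": revision, "serial": serial}
            )
    return rows


rows = parse(IOSTAT)
st318203 = [r for r in rows if r["product"].startswith("ST318203F")]
for r in st318203:
    print(r["product"], r["revision"], r["serial"] or "(no serial reported)")
```

The drive with serial 0031J80486, the one named in the Aug 18 media error, reports revision 034A.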
 
Sorry it took me so long

The drive model ST318203FC needs patch 111535-03.

There is also an sf/socal patch for Solaris 2.6 which takes care of LIP resets: patch 105375-29.

Thanks

CA
 
Well, thank you very much. I should definitely do these, as we are currently at 105375-16. But since I removed the drives and put in a whole other drive, I have not seen these messages, so I am leaning towards it being drive related as well. Thanks!
Steve
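
As a small illustration of the revision gap mentioned above, this Python sketch compares two patch IDs of the form base-revision. The helper names are hypothetical, and it assumes the part after the hyphen is a plain integer revision, as in the IDs quoted in this thread.

```python
def parse_patch(patch_id):
    """Split an ID such as '105375-16' into (base, revision)."""
    base, rev = patch_id.split("-")
    return base, int(rev)


def is_older(installed, target):
    """True if 'installed' is an earlier revision of the same patch as 'target'."""
    installed_base, installed_rev = parse_patch(installed)
    target_base, target_rev = parse_patch(target)
    if installed_base != target_base:
        raise ValueError("patch IDs refer to different patches")
    return installed_rev < target_rev


needs_update = is_older("105375-16", "105375-29")
print(needs_update)  # True
```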
 