Error After Upgrading SAN Firmware 3

khalidaaa · Jul 4, 2006

Hi Folks,

After upgrading the SAN controller firmware, NVRAM, and drivers, I started to get these errors when I booted the machine. Any idea what this is?

Code:

#  errpt -a -l 190
---------------------------------------------------------------------------
LABEL:          FCP_ARRAY_ERR6
IDENTIFIER:     B9735AF4

Date/Time:       Tue Jul  4 17:31:35 SAUS
Sequence Number: 190
Machine Id:      00C5C1EB4C00
Node Id:         s2edms
Class:           H
Type:            PERM
Resource Name:   hdisk2          
Resource Class:  disk
Resource Type:   array
Location:        U7879.001.DQDNZFK-P1-C6-T1-W200300A0B817BAC1-L0

Description
SUBSYSTEM COMPONENT FAILURE

Probable Causes
ARRAY DASD MEDIA
POWER OR FAN COMPONENT

Failure Causes
ARRAY DASD MEDIA
POWER OR FAN COMPONENT

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0600 0308 0000 FF00 0000 0004 0000 0000 0000 0000 0000 0000 0000 0000 7000 0600 
0000 0098 0000 0000 A100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 8000 
0008 3000 0000 0000 0000 0000 0000 0000 0000 3154 3530 3232 3336 3732 2020 2020 
2020 0612 2700 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
0000 0000 0001 6022 3037 3034 3036 2F30 3930 3833 3900 0000 0000 0000 0000 0000 
0000 0000 32F1 0000 F605 2606 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
0000 0000

regards,
Khalid
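
Editor's sketch: some of the SENSE DATA words above are plain ASCII, and decoding them can make the entry easier to correlate with other logs. The helper name `decode_sense_ascii` is hypothetical, and reading the decoded strings as a serial number and a date/time stamp is only an assumption, because the sense data layout is vendor-specific.

```shell
# Decode the printable ASCII bytes in errpt SENSE DATA.
# Input ($1): whitespace-separated 4-digit hex words, as printed by `errpt -a`.
# Output: the decoded text, with every non-printable byte shown as '.'.
decode_sense_ascii() {
    for word in $1; do
        # Split each 4-digit word into its two bytes.
        hi=${word%??}
        lo=${word#??}
        for byte in "$hi" "$lo"; do
            val=$((0x$byte))
            if [ "$val" -ge 32 ] && [ "$val" -le 126 ]; then
                # %b expands the octal escape \0ddd into one character.
                printf '%b' "\\0$(printf '%o' "$val")"
            else
                printf '.'
            fi
        done
    done
    printf '\n'
}

# Words copied from the third and sixth rows of the sense data above.
decode_sense_ascii "3154 3530 3232 3336 3732"            # prints 1T50223672
decode_sense_ascii "3037 3034 3036 2F30 3930 3833 3900"  # prints 070406/090839.
```

The second string starts with 070406, which lines up with the Jul 4, 2006 date of the error entry if it is read as mm/dd/yy; that reading is a guess, not something the log states.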

DukeSSD · Jul 4, 2006

From: IBM TotalStorage FAStT Storage Manager Version 8.4x Installation and Support Guide for AIX, HP-UX, and Solaris

FCP_ARRAY_ERR6: SUBSYSTEM COMPONENT FAILURE. A degradation condition has occurred other than a disk drive.

What did you upgrade to, and from?
What switches?

AIX thinks something is different. You may need to rmdev and recreate some or all of the fibre-attached devices.

You may need to check your adapter device drivers and firmware for compatibility with the new firmware at both the AIX end and the disk end of the SAN.

From the error description it looks like your disks are OK; you just have some sort of enclosure or communication issue.

khalidaaa · Jul 4, 2006

Thanks DukeSSD.

Well, I can't do anything right now on the disks or the fibres because the system is in production and users are accessing it with no problems.

As you said, it might be a communication issue.

But when you say to check the adapter device drivers, what do you mean by that?

Regards
Khalid

Breslau · Jul 5, 2006

1. If you are using a switch between the disk and the host, see if the switch is logging any port errors for the particular ports they are using.

2. Access your disk controller with whatever management software you have and see if it thinks it's healthy or has an issue with any disk/device.

3. If your host is using a non-IBM HBA, see if it has utilities available to check its status and the devices it can see.

4. Is hdisk2 in a VG, varied on, and in use?
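
Editor's sketch for point 4: `lspv` is the AIX command that lists each disk with its PVID, volume group, and state. The filter below (the name `disk_vg_state` is hypothetical) pulls out the line for one disk. It assumes the four-column `lspv` layout and treats an empty fourth column as "not varied on", which may not hold on AIX levels that omit the state column.

```shell
# Report the volume group and state that `lspv` shows for one disk.
# Reads `lspv` output on stdin; assumed columns: name, PVID, VG, state.
disk_vg_state() {
    awk -v disk="$1" '
        $1 == disk {
            found = 1
            if ($3 == "None" || $3 == "") {
                print disk ": not in a volume group"
            } else {
                # An empty state column is taken to mean the VG is not varied on.
                state = ($4 == "" ? "not varied on" : $4)
                print disk ": " $3 " (" state ")"
            }
        }
        END { if (!found) print disk ": not listed by lspv" }
    '
}

# On the AIX host, with hdisk2 being the disk named in the error entry:
#   lspv | disk_vg_state hdisk2
```

Whether the disk is actually busy is a separate question; `lsvg -o` lists the volume groups that are currently varied on.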

khalidaaa · Jul 5, 2006

I'm using an IBM 4500 SAN. The status shows as optimal in the Storage Manager client.

hdisk2 is in use right now and errpt hasn't shown any problems for it since I did the firmware upgrade, but I got paranoid when I got this message when I rebooted the server:

Code:

-----------------------------------------------------------------------------
        A degradation condition has been detected on device  hdisk2
        Either the disk has become degraded or the disk 
        controller has detected an error.
        To examine the error entry, execute the following command:
                errpt -a -l 528
        If problem persists run diagnostics on the device.
-----------------------------------------------------------------------------


-----------------------------------------------------------------------------
        A degradation condition has been detected on device  hdisk1
        Either the disk has become degraded or the disk 
        controller has detected an error.
        To examine the error entry, execute the following command:
                errpt -a -l 526
        If problem persists run diagnostics on the device.
-----------------------------------------------------------------------------

How can I make sure the condition behind this message is not actually degrading performance somehow?

It's a good idea to check the switch!

Thanks Breslau

Breslau · Jul 5, 2006

Did you run diags on it as suggested?

It's possible you just had a momentary glitch that was enough to show up in your errpt. If diags comes back clean and the controller is not showing a failed drive or some other component, then I wouldn't worry about it.

Note what happened and monitor the system for the same error in the future.

Call IBM support if you want to be really sure about it.
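
Editor's sketch for the "monitor the system" advice: the summary form of `errpt` prints one line per entry, starting with the identifier, and `errpt -j B9735AF4` restricts the listing to the identifier from the original post. The filter below (hypothetical name `errors_by_resource`) counts entries per resource, assuming the usual summary columns IDENTIFIER, TIMESTAMP, T, C, RESOURCE_NAME, DESCRIPTION.

```shell
# Count errpt summary entries for one error identifier, per resource.
# Reads `errpt` summary output on stdin; $1 is the identifier to match.
errors_by_resource() {
    awk -v id="$1" '
        $1 == id { count[$5]++ }                 # column 5 is RESOURCE_NAME
        END { for (r in count) print r, count[r] }
    ' | sort
}

# On the AIX host:
#   errpt | errors_by_resource B9735AF4
```

Running this after each reboot and comparing the counts would show whether the error is recurring or was logged only around the firmware change.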

khalidaaa · Jul 7, 2006

Thanks Breslau,

I haven't run diag yet, but as far as errpt is concerned I'm not getting any errors there, and the status of the SAN controller is optimal.

I don't think it's a glitch, though, because I got this message on almost all the LPARs when I rebooted them after the controller upgrade!

I will run diag later and see what happens.

Thanks again

Regards,
Khalid

plamb · Jul 8, 2006

Check the adapter code. You might need to update it, since you changed the controller code.

khalidaaa · Jul 8, 2006

I'm sorry plamb, but I just got confused! I didn't change the controller itself, just the firmware of the controller.

Could you please give me an example of what you mean? Thanks

Regards,
Khalid

plamb · Jul 9, 2006

The controller "talks" by using an HBA (fibre card), correct? In the past I have upgraded controller microcode, but the microcode on the fibre card was not compatible with the new controller code. Examples: timing, wait states, buffer sizes, etc.

khalidaaa · Jul 10, 2006

How can I find out this information?

"The microcode located on the fibre card is not compatible with the new controller code. Examples: timing, wait states, buffer sizes, etc."

Is there a command to show the timing, wait states, and buffer sizes?

plamb · Jul 10, 2006

If it is an IBM-supported fibre card, go to the IBM microcode site (http://www14.software.ibm.com/webapp/set2/firmware/gjsn) and it will tell you how to check the version and how to update it.
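
Editor's sketch of checking the adapter microcode level from the AIX side: `lscfg -vl fcs0` prints the adapter's vital product data, and `lsdev -C -c adapter -F name` lists adapter names. That the firmware level sits in the "Device Specific.(Z9)" field is an assumption that holds for many IBM Fibre Channel adapters but should be confirmed against the microcode site's instructions for the specific card. The function names are hypothetical.

```shell
# Extract the value of the "(Z9)" VPD field from `lscfg -vl <adapter>` output.
# ASSUMPTION: the adapter reports its firmware level in the Z9 field.
fc_firmware_field() {
    # Keep only what follows the run of dots after "(Z9)".
    sed -n 's/.*(Z9)\.\.*//p'
}

# Print "<adapter> <Z9 value>" for every fcs adapter known to the system.
list_fc_firmware() {
    for fcs in $(lsdev -C -c adapter -F name | grep '^fcs'); do
        printf '%s %s\n' "$fcs" "$(lscfg -vl "$fcs" | fc_firmware_field)"
    done
}

# On the AIX host:
#   list_fc_firmware
```

The printed level can then be compared with the level the microcode site lists as compatible with the new controller firmware.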

khalidaaa · Jul 11, 2006

Thank you very very much plamb

What a nice piece of information

I read all of that and tried to implement it on one of the LPARs that had this problem. But strangely enough, when I restarted this LPAR to see the error again, it didn't show me the error message! Do you think the system is back to normal for now? Is it stable? Or do I still need to apply the microcode upgrade for the fcs adapters?

Regards,
Khalid

khalidaaa · Jul 11, 2006

Guys, I decided to use the diagnostics!

It says: "Trouble was found. However, the resource was not tested because the device driver indicated that the resource was in use"

The problem is that this LPAR boots from the SAN, so the resource will always be in use when I try to run diag on it.

Any other suggestions?

Regards
Khalid

cspilman · Jul 11, 2006

Use the diag CD?

Regards,
Chuck

khalidaaa · Jul 11, 2006

Good idea cspilman

I just found out that I do have one more LPAR that does not boot from the SAN, so I will try this one first and see.

Thanks

Regards,
Khalid

J1gh2 · Jul 31, 2006

Guys,

I have been following this discussion with keen interest because I also started getting this error (FCP_ARRAY_ERR6) after updating the effective configuration on the SAN switch (IBM 2109 B32), and I am not sure how to get rid of it. Essentially, port zoning is in place, but one particular port was not included in the effective zone, so the partition was not seeing the disk. I created a new zone with the missing port in it and enabled the config. The LPAR can now see the SAN disk, but I am getting all sorts of errors (mostly FCP_ARRAY_ERR6).

I have also found that hdisk1, which dropped out of rootvg, is Defined but not Available. When I attempt to configure it so that it is available, I get the message:

I also get the following with fcs0:
TESTING ADVANCED MODE
fcs0 U7311.D20.6502EBB-CB1-C07-T1

The following test requires a fibre connector
wrap plug with one of the following Part Numbers:
11P3847.

Do you have this wrap plug?

I do not have the wrap plug so I select no and testing finishes with
Method error (/etc/methods/cfgscdisk):
0514-077 Cannot perform the requested function because none of the
specified paths match those for the specified device.

A degradation condition has been detected on device hdiskX (I get this for all the disks except hdisk0)
Either the disk has become degraded or the disk
controller has detected an error.
To examine the error entry, execute the following command:
errpt -a -l [error code for all the SAN disks]
If problem persists run diagnostics on the device.
-----------------------------------------------------------

Diagnostics find no problems, and when I run shutdown -Fr it hangs for about 20 minutes with
SHUTDOWN PROGRAM
Mon Jul 31 09:16:49 CDT 2006

Wait for 'Rebooting...' before stopping.

Then it takes about another 10-15 minutes to reboot.

Any ideas please?

Thanks

khalidaaa · Jul 31, 2006

Oh, I'm sorry to hear that this annoying problem is happening to you. All I did was contact an IBM engineer about it, and he said this might happen because the LPARs (or something else, I didn't get exactly what he meant) were trying to detect the change that happened on the SAN controller. He said to let him know if it happened again, but it didn't, so it was only during that time and then it was gone. The users didn't experience any degradation, so I just left it as it is.

But I've never faced the errors you have now. Why don't you try calling IBM and see what they say?

Regards,
Khalid

J1gh2 · Aug 1, 2006

Thanks Khalid

I realised that a device had gone into an incorrect state, which was affecting all the others. I wasn't sure which device, so I decided to remove all the devices (dac0, dar0, fcs0, fcnet0, etc.) and their child devices, and ran cfgmgr. (I had done this several times before, but I must have missed the misbehaving device.) This time they came back and seem to be in a consistent state, and the box only took about 5 minutes to reboot afterwards.

The only problem now is that I have not been able to get the other internal disk to show up. hdisk1 is now a "fcparray Disk Array Device". The SCSI slot appears to be enabled, but the disk has not shown up, so I am looking into how to get the other internal disk configured so that I can add it to rootvg.

Best regards

Jisco

khalidaaa · Aug 1, 2006

Are you sure that this disk is assigned to this LPAR?

What does lspv show?

Why don't you try this: find the parent PCI device, run rmdev -R -l on it, then run cfgmgr -s.

This is the way I did it on my host:

# lsdev -C -F parent -l hdisk0
scsi0

# lsdev -C -F parent -l scsi0
sisscsia0

# lsdev -C -F parent -l sisscsia0
pci3

# rmdev -R -l pci3

# cfgmgr -s

Then see whether this shows hdisk1 as Available!

Regards,
Khalid
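
Editor's sketch: the three `lsdev -C -F parent -l` lookups above can be folded into a loop that walks up the device tree until it reaches the first pci ancestor. The function name `find_pci_parent` is hypothetical. It only reports the bus and deliberately does not run `rmdev`, since removing a bus whose children are in use is a decision to make by hand.

```shell
# Walk up the AIX device tree from a device to its first pci* ancestor.
# Uses the same `lsdev -C -F parent -l <device>` call as the steps above.
find_pci_parent() {
    dev=$1
    while [ -n "$dev" ]; do
        case $dev in
            pci*) printf '%s\n' "$dev"; return 0 ;;
        esac
        # Ask the ODM for the parent of the current device.
        dev=$(lsdev -C -F parent -l "$dev")
    done
    # Ran out of parents without meeting a pci device.
    return 1
}

# On the AIX host:
#   bus=$(find_pci_parent hdisk0) && echo "parent bus: $bus"
# and then, if appropriate, the manual steps from the post:
#   rmdev -R -l "$bus"
#   cfgmgr -s
```

With the parent chain shown in the post (hdisk0, scsi0, sisscsia0, pci3), the function would print pci3.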
