Error After Upgrading SAN Firmware

Status
Not open for further replies.

khalidaaa

Technical User
Jan 19, 2006
2,323
BH
Hi Folks,

After upgrading the SAN controller firmware, NVRAM, and drivers, I started to get these errors when I booted the machine. Any idea what this is?

Code:
#  errpt -a -l 190
---------------------------------------------------------------------------
LABEL:          FCP_ARRAY_ERR6
IDENTIFIER:     B9735AF4

Date/Time:       Tue Jul  4 17:31:35 SAUS
Sequence Number: 190
Machine Id:      00C5C1EB4C00
Node Id:         s2edms
Class:           H
Type:            PERM
Resource Name:   hdisk2          
Resource Class:  disk
Resource Type:   array
Location:        U7879.001.DQDNZFK-P1-C6-T1-W200300A0B817BAC1-L0

Description
SUBSYSTEM COMPONENT FAILURE

Probable Causes
ARRAY DASD MEDIA
POWER OR FAN COMPONENT

Failure Causes
ARRAY DASD MEDIA
POWER OR FAN COMPONENT

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0600 0308 0000 FF00 0000 0004 0000 0000 0000 0000 0000 0000 0000 0000 7000 0600 
0000 0098 0000 0000 A100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 8000 
0008 3000 0000 0000 0000 0000 0000 0000 0000 3154 3530 3232 3336 3732 2020 2020 
2020 0612 2700 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
0001 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
0000 0000 0001 6022 3037 3034 3036 2F30 3930 3833 3900 0000 0000 0000 0000 0000 
0000 0000 32F1 0000 F605 2606 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 
0000 0000
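The SENSE DATA block above is raw hex, and a few of its words are plain ASCII. For example, `3154 3530 3232 3336 3732` reads as `1T50223672`. The following is a minimal sketch for pulling such readable strings out of a pasted dump. The helper name `sense_ascii` is hypothetical. It only extracts printable runs: it does not decode the SCSI sense key, and it makes no claim about what the strings mean. That still needs the storage vendor's documentation.

```shell
# sense_ascii (hypothetical helper): read the hex words of an errpt SENSE DATA
# block on stdin and print every run of 4+ printable, non-space ASCII characters.
# It only extracts text; it does not decode the SCSI sense key or ASC/ASCQ.
sense_ascii() {
  awk '
    BEGIN { hex = "0123456789ABCDEF"; run = "" }
    {
      for (f = 1; f <= NF; f++) {
        w = toupper($f)
        if (w !~ /^[0-9A-F]+$/) continue        # skip non-hex tokens such as "SENSE"
        for (i = 1; i < length(w); i += 2) {   # each pair of hex digits is one byte
          b = (index(hex, substr(w, i, 1)) - 1) * 16 + index(hex, substr(w, i + 1, 1)) - 1
          if (b > 32 && b < 127) {
            run = run sprintf("%c", b)         # printable, non-space: extend the run
          } else {
            if (length(run) >= 4) print run    # run ended: print it if long enough
            run = ""
          }
        }
      }
    }
    END { if (length(run) >= 4) print run }'
}
```

Fed the eight SENSE DATA lines above, it prints `1T50223672` and a second string ending in `070406/090839`. Runs are carried across line breaks, so a string split over two dump lines still comes out whole.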

regards,
Khalid
 
Thanks for your time.

Hmmmm, wouldn't that procedure end up removing hdisk0 from my system (if I remove the parent bus)? hdisk0 is the only disk in rootvg at the moment.

Before I removed the devices yesterday, hdisk1 did not show up in lspv, but the SAN disks were numbered hdisk2 to hdisk9, so (I think) the ODM at least retained a reference to hdisk1. However, after I removed the devices and ran cfgmgr, I think the only reason hdisk1 has now been assigned to a SAN disk is that the ODM no longer had any reference to hdisk1. I wouldn't mind if the second internal disk shows up as hdisk9.

Are you confident that your procedure will not remove hdisk0 from the box?

Cheers
 
When I explained that with hdisk0, I didn't mean you should do it on hdisk0! Sorry for giving that impression; I just did it on my machine with hdisk0. You need to do it on hdisk1, unless hdisk0 and hdisk1 share the same SCSI parent! OK, I'll tell you something: I have two hdisks (0 and 1) on the same SCSI parent. I will try the above and let you know the result :)

Regards,
Khalid
 
I am sorry for the confusion. I understood what you were saying but didn't explain my reservation clearly enough. In my case, hdisk1 is now a SAN disk. I have gone through the steps below:

root@onaira06:/$ lsdev -C -F parent -l hdisk1
dar0
root@onaira06:/$ lsdev -C -F parent -l dar0

root@onaira06:/$ lsdev -C -F parent -l hdisk0
scsi1
root@onaira06:/$ lsdev -C -F parent -l scsi1
sisscsia0
root@onaira06:/$ lsdev -C -F parent -l sisscsia0
pci6
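The manual walk above (hdisk0 to scsi1 to sisscsia0 to pci6) can be scripted. This is a minimal sketch assuming a POSIX or ksh shell on AIX. `parent_chain` is a hypothetical helper name, and it relies only on the `lsdev -C -F parent -l <name>` query already used in this thread.

```shell
# parent_chain (hypothetical helper): print a device followed by each of its
# ancestors in the ODM device tree, using the same query as above:
#   lsdev -C -F parent -l <name>
# The walk stops when a device reports no parent (e.g. dar0 in this thread).
parent_chain() {
  dev=$1
  printf '%s' "$dev"
  while parent=$(lsdev -C -F parent -l "$dev" 2>/dev/null) && [ -n "$parent" ]; do
    printf ' -> %s' "$parent"
    dev=$parent
  done
  printf '\n'
}
```

On the system described here, `parent_chain hdisk0` would be expected to print `hdisk0 -> scsi1 -> sisscsia0 -> pci6`, followed by whatever lsdev reports above pci6. `parent_chain hdisk1` would stop at `dar0`.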

However, I am not sure that running rmdev -R -l pci6 won't also remove hdisk0, which would crash my system completely...
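One way to settle the rmdev -R question before running anything destructive is to list every device below pci6 in the device tree. `rmdev -R` also acts on the children of the device you name, so anything in that list is affected. This is a minimal, read-only sketch. It assumes `lsdev -C -p <parent> -F name` lists the direct children of a device, and `descendants` is a hypothetical helper name.

```shell
# descendants (hypothetical helper): recursively print every device below the
# given device in the ODM tree, i.e. the children that "rmdev -R -l <name>"
# would also act on. Read-only: it only runs lsdev.
descendants() {
  for child in $(lsdev -C -p "$1" -F name 2>/dev/null); do
    echo "$child"
    descendants "$child"   # recurse into this child's own children
  done
}
```

If `descendants pci6 | grep hdisk0` prints anything, then `rmdev -R -l pci6` would take hdisk0 (the only rootvg disk at this point) along with it. The parent chain above (hdisk0, scsi1, sisscsia0, pci6) suggests that it would.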

I look forward to seeing the result of your test. Thanks a lot
 
OK, I'm confused now :)

So you're having the problem with hdisk1, right? Which is a SAN disk.

So you don't have to do anything with hdisk0. Of course, if you did the above with hdisk0, the system would hang. Or, I don't think it would even let you do it (worth trying :))

OK, let's start over. How many internal disks do you have? How many SAN disks do you have? What's the output of lspv now?

Regards,
Khalid
 
OK, let me start again.

Before my problems started I had two disks (hdisk0 and hdisk1) as part of rootvg. At the time the lpar could not see the SAN so there were only these two disks.

However, working on the SAN problem threw a device into an inconsistent state, which affected all the other devices and removed hdisk1 from rootvg. As of yesterday morning, lspv was showing:

hdisk0 00c543efe8ada10a rootvg active
hdisk2 none None
hdisk3 none None
hdisk4 none None
hdisk5 none None
hdisk6 none None
hdisk7 none None
hdisk8 none None
hdisk9 none None

Note that hdisk1 is not listed.

After I removed all the devices and re-ran cfgmgr, I now get:
root@newlpar:/$ lsdev -Cc disk
hdisk0 Available 06-08-01-8,0 16 Bit LVD SCSI Disk Drive
hdisk1 Available 05-08-02 fcparray Disk Array Device
hdisk2 Available 05-08-02 fcparray Disk Array Device
hdisk3 Available 05-08-02 fcparray Disk Array Device
hdisk4 Available 05-08-02 fcparray Disk Array Device
hdisk5 Available 05-08-02 fcparray Disk Array Device
hdisk6 Available 05-08-02 fcparray Disk Array Device
hdisk7 Available 05-08-02 fcparray Disk Array Device
hdisk8 Available 05-08-02 fcparray Disk Array Device

Note that hdisk1 is now assigned to a SAN disk and only one internal disk is showing. It appears that AIX is not seeing the second internal disk and therefore cannot configure it and bring it online. What I am trying to do is get the OS to see that disk and configure it so that I can add it to rootvg...

I hope that's a little clearer now.

Cheers
 
OK, now it is clear.

Unfortunately I'm busy right now, but I will look into it later.

Sorry

Meanwhile, what does lsdev -C | grep disk show?

Regards,
Khalid
 
What's the output of this?

odmget -q name="hdisk1" CuDv

odmget -q name="hdisk0" CuDv

Regards,
Khalid
 
Hi

The output of lsdev -C | grep disk is exactly the same as the lsdev -Cc disk output above. odmget returns the following:

root@newlpar:/$ odmget -q name="hdisk1" CuDv

CuDv:
name = "hdisk1"
status = 1
chgstatus = 2
ddins = "fcparray"
location = "05-08-02"
parent = "dar0"
connwhere = "1"
PdDvLn = "disk/fdar/array"
root@newlpar:/$ odmget -q name="hdisk0" CuDv

CuDv:
name = "hdisk0"
status = 1
chgstatus = 2
ddins = "scdisk"
location = "06-08-01-8,0"
parent = "scsi1"
connwhere = "8,0"
PdDvLn = "disk/scsi/scsd"

Rgs
 
Hmmm, I ran the following on one of our machines, which has rootvg mirrored across hdisk0 and hdisk1, and got this:

Code:
# odmget -q name="hdisk0" CuDv

CuDv:
        name = "hdisk0"
        status = 1
        chgstatus = 2
        ddins = "scdisk"
        location = "03-08-00-4,0"
        parent = "scsi0"
        connwhere = "4,0"
        PdDvLn = "disk/scsi/scsd"

# odmget -q name="hdisk1" CuDv

CuDv:
        name = "hdisk1"
        status = 1
        chgstatus = 2
        ddins = "scdisk"
        location = "03-08-00-5,0"
        parent = "scsi0"
        connwhere = "5,0"
        PdDvLn = "disk/scsi/scsd"

As you can see, both of my disks have scsi0 as their parent, whereas your hdisk0's parent is scsi1.
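Comparing parents is easier when each CuDv stanza is reduced to one line. This is a minimal sketch that reads odmget CuDv stanzas like the ones above on standard input and prints name, parent, location, and driver (ddins) for each device. The helper name `cudv_summary` is hypothetical.

```shell
# cudv_summary (hypothetical helper): turn "odmget ... CuDv" stanzas on stdin
# into one line per device: name parent location ddins
cudv_summary() {
  awk '
    # Print the collected attributes of one stanza, then clear them.
    function flush() {
      if (a["name"] != "")
        print a["name"], a["parent"], a["location"], a["ddins"]
      split("", a)
    }
    /^[ \t]*CuDv:/ { flush(); next }   # a new stanza starts
    /=/ {                              # attribute line: key = value
      val = $0
      sub(/^[^=]*=[ \t]*/, "", val)    # keep only the value
      gsub(/"/, "", val)               # drop the surrounding quotes
      a[$1] = val
    }
    END { flush() }'
}
```

For example, `{ odmget -q name=hdisk0 CuDv; odmget -q name=hdisk1 CuDv; } | cudv_summary` puts both disks side by side. For the output posted earlier, that shows hdisk0 under scsi1 and hdisk1 under dar0.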

So try this:

cfgmgr -l scsi1

Does the disk show up as hdisk9 or something similar?

If not, is there anything in errpt -a?

Regards,
Khalid
 
If cfgmgr -l scsi1 doesn't work, then try cfgmgr -s.

Regards,
Khalid
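To see exactly what a cfgmgr run adds (for example, whether the second internal disk appears as hdisk9), take a snapshot of the disk list before and after. This is a minimal sketch. `cfgmgr_diff` is a hypothetical helper that passes its arguments straight to cfgmgr (e.g. `-l scsi1` or `-s`), and the temporary file names under /tmp are assumptions. It only reports new device names. A disk that already existed in the Defined state and became Available would not be listed.

```shell
# cfgmgr_diff (hypothetical helper): run cfgmgr with the given arguments and
# print the names of disks that exist afterwards but did not exist before.
cfgmgr_diff() {
  before=/tmp/cfgmgr_diff.$$.before
  after=/tmp/cfgmgr_diff.$$.after
  lsdev -Cc disk -F name | sort > "$before"   # disks known before the run
  cfgmgr "$@"
  lsdev -Cc disk -F name | sort > "$after"    # disks known after the run
  comm -13 "$before" "$after"                 # names only in the "after" list
  rm -f "$before" "$after"
}
```

`cfgmgr_diff -l scsi1` printing nothing would mean cfgmgr found no new disk behind scsi1.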
 
I tried both and neither worked. I then ran rmdev -dl on scsi0 and the other devices and rebooted, but nothing has changed. This is the errpt entry from yesterday:

-----------------------------------------------------------
LABEL: FCP_ARRAY_ERR10
IDENTIFIER: C86ACB7E
Date/Time: Mon Jul 31 16:49:33 CDT 2006
Sequence Number: 986
Machine Id: 00C543EF4C00
Node Id: newlpar
Class: H
Type: INFO
Resource Name: hdisk1
Resource Class: disk
Resource Type: array
Location: U7311.D20.6502EBB-CB1-C07-T1-W201400A0B8113424-L0

Description
ARRAY CONFIGURATION CHANGED

Probable Causes
ARRAY CONTROLLER
CABLES AND CONNECTIONS
Failure Causes
ARRAY CONTROLLER
CABLES AND CONNECTIONS

Recommended Actions
NO ACTION NECESSARY

hmmmm
 
OMG :(

This is an LPAR, right? Could you list the properties of that LPAR? I mean the I/O slots.

 
I'm not sure how to list all the I/O slots from the CLI, but the HMC GUI shows two storage controllers, two Ethernet PCI cards, and two FC cards assigned to the LPAR. All these resources are added as "required".

root@newlpar:/$ lsdev -Cs scsi

hdisk0 Available 06-08-01-8,0 16 Bit LVD SCSI Disk Drive
ses0 Available 06-08-01-15,0 SCSI Enclosure Services Device
ses1 Defined 09-08-01-15,0 SCSI Enclosure Services Device

root@newlpar:/$ lsslot -c pci

# Slot Description Device(s)
U7311.D20.6502EBB-CB1-C01 PCI-X capable, 64 bit, 133MHz slot ent2 ent3
U7311.D20.6502EBB-CB1-C02 PCI-X capable, 64 bit, 133MHz slot Unknown
U7311.D20.6502EBB-CB1-C06 PCI-X capable, 64 bit, 133MHz slot ent0 ent1
U7311.D20.6502EBB-CB1-C07 PCI-X capable, 64 bit, 133MHz slot Unknown
U7311.D20.6502EBB-CB1-C08 PCI-X capable, 64 bit, 133MHz slot sisscsia0

Or is there a neater command to gather all the info?

cheers
 
OK

What are the slots that show Unknown? Never mind.

OK. I have a U7311.D20 here at work too. But as you can see from my hdisk0 and hdisk1, these two disks are on the same controller, scsi0. Is that the case for you too?

Because if you have two storage controllers, that means hdisk1 is not connected to hdisk0's controller, right?

Cheers
 
The unknown slots were due to the devices that I removed. Having reconfigured, they are now listed.

root@onaira06:/$ lsslot -c pci
# Slot Description Device(s)
U7311.D20.6502EBB-CB1-C01 PCI-X capable, 64 bit, 133MHz slot ent2 ent3
U7311.D20.6502EBB-CB1-C02 PCI-X capable, 64 bit, 133MHz slot fcs1
U7311.D20.6502EBB-CB1-C06 PCI-X capable, 64 bit, 133MHz slot ent0 ent1
U7311.D20.6502EBB-CB1-C07 PCI-X capable, 64 bit, 133MHz slot fcs0
U7311.D20.6502EBB-CB1-C08 PCI-X capable, 64 bit, 133MHz slot sisscsia0

I went to the server room earlier to look at the D20 physically, and there are only two disks in it: one in slot 1 and the other in slot 7. Both disks have green lights on. Apart from checking the ODM (with odmget, as you did earlier), I don't know how to check whether they are on the same controller, except that the HMC GUI shows two different controllers in different slots. In my case, the other internal disk is not showing up in AIX at all, so I can't check which controller it is connected to....
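For disks that AIX does see, the location code shows which adapter they hang off, independent of the hdisk number. In the codes shown in this thread, the `-Cnn` part names the card slot and the `-Tn` part that follows it names the port. For example, hdisk0 is at `...-CB1-C08-T2-L8-L0`, which puts it on the sisscsia0 card in slot C08. This is a minimal sketch. It assumes input lines of the form `name location-code ...`, such as those printed by `lscfg -l hdisk0`, and `adapter_slot` is a hypothetical helper name.

```shell
# adapter_slot (hypothetical helper): for each "name location-code ..." line on
# stdin, print the name and the location code cut after the first -C<slot>-T<port>
# pair, i.e. the adapter slot and port the device is attached through.
adapter_slot() {
  awk '{
    loc = $2
    if (match(loc, /-C[0-9]+-T[0-9]+/))
      print $1, substr(loc, 1, RSTART + RLENGTH - 1)
  }'
}
```

Two disks reported with the same slot and port would share an adapter port. The disk that AIX does not configure will not appear in this list at all, so the sketch can only rule adapters in for the disks that are visible.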

Hmmmm. Thank you very much for persevering with this issue
 
It's OK, my friend :) One day I will be the one who is stuck, and you will help me :) right?

Anyway, from your description it seems that they are on different controllers. OK, why don't you try assigning that slot to a different LPAR and see what happens after running cfgmgr there!

That way, you will at least make sure that the disk is working.

Cheers
 
Good day Khalid

You are back! Buddy, I owe you big time.

I reassigned the disk controller to another LPAR and ran cfgmgr, but it didn't pick up the disk.

This morning there was a message that:

"root@onaira06:/$
Wed Aug 2 00:26:02 CDT 2006

Automatic Error Log Analysis for sysplanar0 has detected a problem. The Service Request Number is B123E500: Memory subsystem including external cache Predictive Error, general. Refer to the system service documentation for more information.."

Again, diag says "no problem was found."

This box was built as a spare but it wasn't being utilised, so there is nothing installed on it. I am now minded to reinstall the OS and see what happens.

This is rootvg:
root@onaira06:/$ lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk0 active 546 270 109..48..00..04..109
0516-304 : Unable to find device id 00c543eff7f77472 in the Device
Configuration Database.
00c543eff7f77472 missing 546 270 109..48..00..04..109

I wonder if it is possible to recreate the disk using the device id listed here....
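Before trying to recreate anything, it helps to list exactly which PVs the volume group reports as missing. This is a minimal sketch that filters `lsvg -p <vg>` output, read on standard input, for rows whose PV STATE column is `missing`. `missing_pvs` is a hypothetical helper name. As the output above shows, a PV that the ODM no longer knows is listed by its PVID instead of an hdisk name.

```shell
# missing_pvs (hypothetical helper): read "lsvg -p <vg>" output on stdin and
# print the first column (hdisk name or PVID) of every PV whose state is
# "missing". Error lines such as "0516-304 : Unable to find ..." are skipped
# because their second field is not "missing".
missing_pvs() {
  awk '$2 == "missing" { print $1 }'
}
```

On this system, `lsvg -p rootvg | missing_pvs` would be expected to print `00c543eff7f77472`.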

Anyway, the output of prtconf is (my apologies for the large dump):

System Model: IBM,9117-570
Machine Serial Number: 65543EF
Processor Type: PowerPC_POWER5
Number Of Processors: 1
Processor Clock Speed: 1656 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
LPAR Info: 6 newlpar_Standby_Failover_LPAR
Memory Size: 1536 MB
Good Memory Size: 1536 MB
Platform Firmware level: Not Available
Firmware Version: IBM,SF230_126
Console Login: enable
Auto Restart: true
Full Core: false
Network Information
Host Name: newlpar
0516-304 : Unable to find device id 00c543eff7f77472 in the Device Configuration Database.

Paging Space Information
Total Paging Space: 512MB
Percent Used: 1%
Volume Groups Information
==============================================================================
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk0 active 546 270 109..48..00..04..109
00c543eff7f77472 missing 546 270 109..48..00..04..109
==============================================================================

INSTALLED RESOURCE LIST
The following resources are installed on the machine.
+/- = Added or deleted from Resource List.
* = Diagnostic support not available.

Model Architecture: chrp
Model Implementation: Multiple Processor, PCI bus

System Object
+ sysplanar0
System Planar
* vio0
Virtual I/O Bus
* vsa0 U9117.570.65543EF-V6-C0
LPAR Virtual Serial Adapter
* vty0 U9117.570.65543EF-V6-C0-L0
Asynchronous Terminal
* pci2 U7311.D20.6502EBB-CB1
PCI Bus
* pci7 U7311.D20.6502EBB-CB1
PCI Bus
+ ent2 U7311.D20.6502EBB-CB1-C01-T1
2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
+ ent3 U7311.D20.6502EBB-CB1-C01-T2
2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
* pci8 U7311.D20.6502EBB-CB1
PCI Bus
+ fcs1 U7311.D20.6502EBB-CB1-C02-T1
FC Adapter
* fcnet1 U7311.D20.6502EBB-CB1-C02-T1
Fibre Channel Network Protocol Device
* fscsi1 U7311.D20.6502EBB-CB1-C02-T1
FC SCSI I/O Controller Protocol Device
* pci1 U7311.D20.6502EBB-CB1
PCI Bus
* pci4 U7311.D20.6502EBB-CB1
PCI Bus
+ ent0 U7311.D20.6502EBB-CB1-C06-T1
2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
+ ent1 U7311.D20.6502EBB-CB1-C06-T2
2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
* pci5 U7311.D20.6502EBB-CB1
PCI Bus
+ fcs0 U7311.D20.6502EBB-CB1-C07-T1
FC Adapter
* fcnet0 U7311.D20.6502EBB-CB1-C07-T1
Fibre Channel Network Protocol Device
* fscsi0 U7311.D20.6502EBB-CB1-C07-T1
FC SCSI I/O Controller Protocol Device
* dac0 U7311.D20.6502EBB-CB1-C07-T1-W201400A0B8113424
fcparray Disk Array Controller
* pci6 U7311.D20.6502EBB-CB1
PCI Bus
PCI-X Dual Channel Ultra320 SCSI Adapter
+ scsi0 U7311.D20.6502EBB-CB1-C08-T1
PCI-X Dual Channel Ultra320 SCSI Adapter bus
+ scsi1 U7311.D20.6502EBB-CB1-C08-T2
PCI-X Dual Channel Ultra320 SCSI Adapter bus
+ hdisk0 U7311.D20.6502EBB-CB1-C08-T2-L8-L0
16 Bit LVD SCSI Disk Drive (73400 MB)
* ses0 U7311.D20.6502EBB-CB1-C08-T2-L15-L0
SCSI Enclosure Services Device
* pci0 U7879.001.DQD3C3N-P1
PCI Bus
* pci3 U7879.001.DQD3C3N-P1
PCI Bus
* ide0 U7879.001.DQD3C3N-P1-T15
ATA/IDE Controller Device
+ cd0 U7879.001.DQD3C3N-P4-D1
IDE DVD-ROM Drive
+ L2cache0
L2 Cache
+ mem0
Memory
+ proc0
+ hdisk1 U7311.D20.6502EBB-CB1-C07-T1-W201400A0B8113424-L0
fcparray Disk Array Device
+ hdisk2 U7311.D20.6502EBB-CB1-C07-T1-W201400A0B8113424-L1000000000000
fcparray Disk Array Device
+ hdisk3 U7311.D20.6502EBB-CB1-C07-T1-W201400A0B8113424-L2000000000000
fcparray Disk Array Device
+ hdisk4 U7311.D20.6502EBB-CB1-C07-T1-W201400A0B8113424-L3000000000000
fcparray Disk Array Device
+ hdisk5 U7311.D20.6502EBB-CB1-C07-T1-W201400A0B8113424-L4000000000000
fcparray Disk Array Device
+ hdisk6 U7311.D20.6502EBB-CB1-C07-T1-W201400A0B8113424-L5000000000000
fcparray Disk Array Device
+ hdisk7 U7311.D20.6502EBB-CB1-C07-T1-W201400A0B8113424-L6000000000000
fcparray Disk Array Device
+ hdisk8 U7311.D20.6502EBB-CB1-C07-T1-W201400A0B8113424-L7000000000000
fcparray Disk Array Device
 
Hmm, I think I know the problem!

I just need to read the problem-determination manuals, because I guess this case (a missing PV) is well explained there.

Meanwhile, can you show me the output of:

lqueryvg -p hdisk0 -At
lqueryvg -p hdisk1 -At

I think the disk is still shown in the VGDA but not in the ODM!
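Khalid's theory (the disk is still recorded in the VGDA but not in the ODM) can be checked by comparing two lists of PVIDs. The first is the `Physical:` section of `lqueryvg -At` output. The second is the PVIDs the ODM knows about, which is the second column of `lspv`. This is a minimal sketch of the first half. It assumes the `Physical:` layout shown in this thread, with the PVID first on the line. It also assumes, as an untested point, that additional PVs would appear on continuation lines that start with a 16-hex-digit PVID. `vgda_pvids` is a hypothetical helper name.

```shell
# vgda_pvids (hypothetical helper): read "lqueryvg -p <disk> -At" output on
# stdin and print the PVIDs listed in its "Physical:" section.
# Assumption: additional PVs appear on following lines that start with a
# 16-hex-digit PVID; the section ends at the first line that does not.
vgda_pvids() {
  awk '
    /^Physical:/ { inphys = 1; print $2; next }
    inphys && length($1) == 16 && $1 ~ /^[0-9a-f]+$/ { print $1; next }
    { inphys = 0 }'
}
```

Any PVID printed by `lqueryvg -p hdisk0 -At | vgda_pvids` that does not appear in column 2 of `lspv` is recorded in the VGDA but unknown to the ODM. That is the situation suspected here.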

I will come back to you soon on this

Cheers
 
Cool
I removed device id 00c543eff7f77472, so it is not showing up anymore. However, as you identified, lqueryvg -p hdisk0 -At reports two VGDAs on hdisk0:

lqueryvg -p hdisk0 -At
Max LVs: 256
PP Size: 27
Free PPs: 238
LV count: 17
PV count: 1
Total VGDAs: 2
Conc Allowed: 0
MAX PPs per PV 1016
MAX PVs: 32
Conc Autovaryo 0
Varied on Conc 0
Logical: 00c543ef00004c000000010aad9767e5.1 hd5 1
00c543ef00004c000000010aad9767e5.2 hd6 1
00c543ef00004c000000010aad9767e5.3 hd8 1
00c543ef00004c000000010aad9767e5.4 hd4 1
00c543ef00004c000000010aad9767e5.5 hd2 1
00c543ef00004c000000010aad9767e5.6 hd9var 1
00c543ef00004c000000010aad9767e5.7 hd3 1
00c543ef00004c000000010aad9767e5.8 hd1 1
00c543ef00004c000000010aad9767e5.9 hd10opt 1
00c543ef00004c000000010aad9767e5.10 fslv00 1
00c543ef00004c000000010aad9767e5.11 loglv00 1
00c543ef00004c000000010aad9767e5.12 tftpbootlv 1
00c543ef00004c000000010aad9767e5.13 exportlv 1
00c543ef00004c000000010aad9767e5.14 spotlv 1
00c543ef00004c000000010aad9767e5.15 lpp_sourcelv 1
00c543ef00004c000000010aad9767e5.16 mksysblv 1
00c543ef00004c000000010aad9767e5.17 paging00 1
Physical: 00c543efe8ada10a 2 0
Total PPs: 546
LTG size: 128
HOT SPARE: 0
AUTO SYNC: 0
VG PERMISSION: 0
SNAPSHOT VG: 0
IS_PRIMARY VG: 0
PSNFSTPP: 4352
VARYON MODE: 0
VG Type: 0
Max PPs: 32512

root@newlpar:/$ lqueryvg -p hdisk1 -At
0516-304 lqueryvg: Unable to find device id hdisk1 in the Device Configuration Database.

0516-062 lqueryvg: Unable to read or write logical volume manager record. PV may be permanently corrupted. Run diagnostics

Thanks a lot
 