
CPU error ???? 3

Status
Not open for further replies.

thepallace

Technical User
Jun 8, 2004
48
US
I got this AFT error in my /var/adm/messages. Is CPU3 bad? It detected the error and corrected it. Do I need to replace it ASAP? I ran psrinfo and all CPUs are online. Please advise.



Jul 27 23:08:27 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 934965 kern.info] NOTICE: [AFT0] First Error UCC Event detected by CPU3 in Privileged mode at TL=0, errID 0x0024cb30.2d663a80
Jul 27 23:08:27 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 807159 kern.info] [AFT0] errID 0x0024cb30.2d663a80 Check Bit 7 was in error and corrected
Jul 27 23:08:27 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 156953 kern.info] [AFT2] errID 0x0024cb30.2d663a80 PA=0x00000000.3df7bc80
Jul 27 23:08:27 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x05000040.c40f20ac 0x031e0f29.8530a005 ECC 0x032
Jul 27 23:08:27 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x8408a001.80a0a000 0x3248000b.c4072158 ECC 0x1e3
Jul 27 23:08:27 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0xd05f2028.d25f2168 0x840061b8.65f89dc6 ECC 0x09a
Jul 27 23:08:27 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0xd400a000.c40f20ac 0x8410a020.c42f20ac ECC 0x0df
Jul 27 23:08:27 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 929717 kern.info] [AFT2] D$ data not available
Jul 27 23:08:27 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 335345 kern.info] [AFT2] I$ data not available
Jul 27 23:08:38 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 112859 kern.info] NOTICE: [AFT0] UCC Event detected by CPU3 in Privileged mode at TL=0, errID 0x0024cb30.2d663a80
Jul 27 23:08:38 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 807159 kern.info] [AFT0] errID 0x0024cb30.2d663a80 Check Bit 7 was in error and corrected
Jul 27 23:08:38 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 156953 kern.info] [AFT2] errID 0x0024cb30.2d663a80 PA=0x00000000.3df7bc80
Jul 27 23:08:38 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x05000040.c40f20ac 0x031e0f29.8530a005 ECC 0x032
Jul 27 23:08:38 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x8408a001.80a0a000 0x3248000b.c4072158 ECC 0x1e3
Jul 27 23:08:38 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0xd05f2028.d25f2168 0x840061b8.65f89dc6 ECC 0x09a
Jul 27 23:08:38 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0xd400a000.c40f20ac 0x8410a020.c42f20ac ECC 0x0df
Jul 27 23:08:38 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 929717 kern.info] [AFT2] D$ data not available
Jul 27 23:08:38 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 335345 kern.info] [AFT2] I$ data not available

Thanks,

Jun

 
We had a similar-looking error recently. It turned out not to be a CPU error, but a memory error. My interpretation of your posting:
NOTICE: [AFT0] First Error UCC Event detected by CPU3 in Privileged mode
is that CPU3 only detected the event, and that
Check Bit 7 was in error and corrected
describes the event itself and says that it was corrected.

We ran some SUNWvts tests on memory and e-mailed the /var/adm/messages to Sun (as we have a Silver Spectrum h/w contract). They identified which memory board/DIMM was the problem and replaced it a few days later.

I hope that helps you.

Mike
 
Just to pile on: I've seen errors like this on my servers in the past, and they are generally memory errors. VTS should show which stick is the problem, but you may have to make it run multiple passes.

In the past I've generally seen the error show the memory bank that is at fault, but I don't see anything in this one that indicates that. It may be that I'm just unfamiliar with the model of the machine, though. Dig up your hardware docs and see how the memory banks are numbered.

Brian
 
It is possible for this to be a random event, as they do occur, which is why ECC memory is used in Suns and other big servers. In this case it was repaired with no harm done. If it happens again, you have a bad memory location that should be traced and replaced. Otherwise you could leave things be.
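
To watch for a repeat, something like the following rough Python sketch could group the AFT lines by physical address, so that a new errID at the same PA stands out. The regular expressions are based only on the lines posted in this thread, so other AFT message formats may not match, and the function name summarize_aft is made up for illustration.

```python
import re
from collections import defaultdict

# "[AFT0] (First Error) UCC Event detected by CPU<n> ... errID <id>"
# Pattern is inferred from the lines posted above; it is an assumption.
EVENT_RE = re.compile(
    r"\[AFT0\] (?:First Error )?(?P<kind>\w+) Event detected by CPU(?P<cpu>\d+)"
    r".*?errID (?P<errid>0x[0-9a-f]+\.[0-9a-f]+)"
)
# "[AFT2] errID <id> PA=<physical address>"
PA_RE = re.compile(
    r"\[AFT2\] errID (?P<errid>0x[0-9a-f]+\.[0-9a-f]+) "
    r"PA=(?P<pa>0x[0-9a-f]+\.[0-9a-f]+)"
)

def summarize_aft(lines):
    """Return ({PA: set of errIDs}, {errID: detecting CPU number})."""
    cpu_by_errid = {}
    errids_by_pa = defaultdict(set)
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            cpu_by_errid[m["errid"]] = int(m["cpu"])
            continue
        m = PA_RE.search(line)
        if m:
            errids_by_pa[m["pa"]].add(m["errid"])
    return dict(errids_by_pa), cpu_by_errid

# Lines taken from the original post (both the 23:08:27 and 23:08:38 entries).
sample = [
    "Jul 27 23:08:27 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 934965 kern.info] NOTICE: [AFT0] First Error UCC Event detected by CPU3 in Privileged mode at TL=0, errID 0x0024cb30.2d663a80",
    "Jul 27 23:08:27 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 807159 kern.info] [AFT0] errID 0x0024cb30.2d663a80 Check Bit 7 was in error and corrected",
    "Jul 27 23:08:27 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 156953 kern.info] [AFT2] errID 0x0024cb30.2d663a80 PA=0x00000000.3df7bc80",
    "Jul 27 23:08:38 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 112859 kern.info] NOTICE: [AFT0] UCC Event detected by CPU3 in Privileged mode at TL=0, errID 0x0024cb30.2d663a80",
    "Jul 27 23:08:38 neptune-w1-db11 SUNW,UltraSPARC-III+: [ID 156953 kern.info] [AFT2] errID 0x0024cb30.2d663a80 PA=0x00000000.3df7bc80",
]

by_pa, cpus = summarize_aft(sample)
for pa, errids in by_pa.items():
    print(pa, "distinct errIDs:", len(errids))
```

On the lines from the original post, both entries carry the same errID and the same PA, so they count as one distinct event at one address. Feeding the function the lines of the real /var/adm/messages file instead of the sample list would show whether later events land on the same address.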

Matthew

The Universe: God's novelty screensaver?
 