HS20 Blades = BSE Voltage Under

Status
Not open for further replies.

CoreyWilson

IS-IT--Management
Feb 3, 2004
185
CA
Hello gang,

I have two HS20's in one chassis that are currently sending alert messages stating a monitored voltage is running outside tolerance.

When viewing the chassis through the web interface, two blades are listed as errored with "BSE Voltage Under" error messages. What does this mean, and what do I do to correct it? I could not find any documentation that thoroughly explains why this may be happening. This is a full chassis, and only these two blades have any problems. These servers are co-located, so I cannot physically see the boxes.

Also, I received an IBM Director alert about another server in the same chassis stating: Management Processor Assistant agent failed to load. on system "servername". What does this mean and how do I fix it?

I am new to IBM's blade hardware.

Thanks!
 
A1: Most likely what you are seeing is the result of a BIOS issue that was corrected in the current version. If you are not running v1.06 (Build BSE119AUS), start there. If you are, then provide the machine type and model of the "failing" blades.

A2: The Management Processor Assistant tool from IBM Director is designed to manage Advanced Systems Management processors on some servers (like the BladeCenter Management Module/RSA/RSA II). Blades are not one of them. The Blades have an Integrated Systems Management Processor (ISMP) installed, which has no device drivers. MPA should not be installed on systems without the aforementioned ASM processors. If you saw this failure on a system with the ASM processors referenced, it most likely means the device drivers were not installed before the IBM Director code was installed. If this is the case, uninstall Director, reboot, install the correct device drivers, then reinstall Director.

BTW, everyone is relatively new to Blades :eek:)
 
Hi,

Thanks for the quick reply catorze.

Could you explain what the issue in the previous BIOS was, or what causes these errors to appear at random? I just looked and it appears I lied. The two blades with this problem are actually HS40s, not HS20s. It looks like we may have some 40s AND 20s in that enclosure?! Anyway, checking the chassis webpage and then the firmware VPD shows the BIOS version for the server in question as 3.2 and the diagnostics as 1.00. However, on IBM's website I do NOT see a version 3.2 listed as a BIOS update. I was going to compare that BIOS to older ones and see what fixes had been incorporated over the history of the BIOS revisions. Is there another way to access the blade server directly and see the exact revision, like there is with HP SIM and its agents for instance? I know the IP and hostname, but I am not 100% sure how to gain remote access to the blade itself aside from using the chassis interface.

If you could explain the BIOS issue and/or how to connect to the blade itself, that would be great.

Also, as for the Director alert "Management Processor Assistant agent failed to load", how can I confirm whether the correct drivers are installed for this service? And why would I receive this alert if they weren't?

Full alert message as follows:

Management Processor Assistant agent failed to load. on system "Servername_changed"

Event Text Management Processor Assistant agent failed to load.
Date 9/26/2004
Severity Critical
Event Type Director.Director Agent.MPA
System Name SERVERNAME
Sender Name SERVERNAME

Please note I have taken over management of these systems. I did not put them in place, so I am not sure how they were initially configured.

Thanks again for the assistance.
 
I re-read your original post and realized you are viewing from within the Management Module (MM) web access. A couple of things to consider, given that these are HS40s, not HS20s:
1. If the MM is not at the current version of firmware, get there first. The messages may be bogus because support for the HS40 was not included in the early releases of the MM firmware. This may also explain the incorrect BIOS version indicated.

2. The HS20 does not have a device driver for the ASM hardware. Consequently, MPA should not be selected for installation with IBM Director Agent for these Blades. If it was (sounds like it), I recommend uninstalling, then reinstalling without it. You will continue to get these messages until you do, or choose to ignore the message.

The HS40 has a different ASM chipset (IPMI versus ISMP), and there are device drivers for the IPMI controller. However, support for IPMI was not included in IBM Director until v4.20. If the drivers were not installed during IBM Director Agent's installation, Director must be reinstalled. If you are not running v4.20, then you must upgrade to this version to get support for the IPMI controller on the HS40. If you head in that direction, I recommend updating the BIOS and ASM firmware, then installing the device driver before the upgrade/reinstall.
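For anyone auditing several systems against the v4.20 requirement above, a minimal Python sketch of the version comparison follows. It assumes you have already collected the installed Director version as a dotted string by some other means; the function names are made up for illustration and are not part of any IBM tool.

```python
def parse_version(text):
    """Turn a dotted version string such as 'v4.20' into a tuple of ints.

    Assumes every component is numeric; anything else raises ValueError.
    """
    return tuple(int(part) for part in text.strip().lstrip("vV").split("."))


def meets_minimum(installed, minimum="4.20"):
    """Return True if the installed version is at or above the minimum."""
    return parse_version(installed) >= parse_version(minimum)


# Hypothetical inventory: system name -> reported IBM Director version.
inventory = {"blade-a": "4.12", "blade-b": "4.20"}
needs_upgrade = [name for name, ver in inventory.items() if not meets_minimum(ver)]
print(needs_upgrade)
```

Comparing tuples of integers rather than raw strings avoids mistakes when a component gains a digit (for example "4.9" versus "4.20").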
 
Hi catorze,

I apologize for beating a dead horse, so to speak, but I appreciate your knowledge on this subject while I'm trying to learn the ins and outs of these systems and their related hardware functionality. You mention that these messages could be bogus, given the lack of proper (if any) support for the HS40s in the management module firmware version we likely have installed. I guess my question then is: is there a reason we have not received these particular alerts before? Or if we have, it's been so long ago that I cannot remember seeing them.

Let's just say our software and firmware versions were all in tune. What would cause this alert to be sent, what does it usually mean, what usually causes it, and how can it be resolved? I was asked yesterday afternoon how to rectify the problem and clear the error (exclamation triangle) from both these servers in the web management console of the chassis. Unfortunately I cannot find much on IBM's site about this problem or what to do to solve it, especially considering it just popped up Sunday evening - and as mentioned, other blades in the same chassis are not reporting any problems, both HS40s AND HS20s.

Thanks again for your assistance on this matter.
 
Hard to say why the events started without knowing the ins and outs of what you've done. Could be timing-related, could be an OS update, could be anything. Frankly, why it started won't be my focus until we rule out everything known that could be a contributor to the issue. IBM's hardware, especially ASM hardware, is being developed and released so quickly that getting to the latest BIOS, firmware, and device driver should be a part of routine maintenance. As in most development shops, the code tree they work from is the latest release, so any issues uncovered in an old release may have already been corrected, but not documented. The net is that before you beat your head over "bad" hardware, realize that the hardware doesn't do anything without the firmware and software. If the instructions are bad, the output will also be bad. Since the hardware is new, it's possible that you are the first client in the world to discover this issue, based on some unique circumstance of your environment.

I DO recommend that you contact IBM Support for assistance, but you should know that if you are not already at the current version of BIOS, firmware, and driver, they will very likely recommend you start there. If they don't, I would be reluctant to let them replace anything before they could provide solid evidence of a hardware failure that will be resolved by a replacement part.

The system error log from the management module is the best starting place to look for hardware faults. If more than one fault is indicated, you should work to resolve the first fault, and ignore any additional faults until the first fault is resolved. The first fault may generate additional faults, depending on what failed first. For example, if an I2C cable fault occurs, all environmental sensors will return an unknown status. It's not likely that all fans, voltages, and temperatures failed at the same time (unless the system is struck by lightning!), and you should only deal with the initial failure.
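The "resolve the first fault first" approach can be sketched in a few lines of Python. This is only an illustration: it assumes the error log has already been copied out into a list of records, and the field names, severity labels, and sample entries below are invented, not the management module's actual log format.

```python
from datetime import datetime

# Hypothetical log entries; the real log layout will differ.
log = [
    {"time": datetime(2004, 9, 26, 18, 5), "severity": "error", "text": "Fan 2 status unknown"},
    {"time": datetime(2004, 9, 26, 18, 4), "severity": "error", "text": "I2C bus fault"},
    {"time": datetime(2004, 9, 26, 18, 3), "severity": "info", "text": "User login"},
    {"time": datetime(2004, 9, 26, 18, 5), "severity": "error", "text": "Voltage status unknown"},
]


def first_fault(entries, fault_levels=("error", "critical")):
    """Return the earliest entry whose severity counts as a fault, or None."""
    faults = [e for e in entries if e["severity"] in fault_levels]
    if not faults:
        return None
    return min(faults, key=lambda e: e["time"])


earliest = first_fault(log)
print(earliest["text"])
```

Later faults that follow closely after the earliest one are candidates for being side effects, so they are set aside until the earliest fault is dealt with.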
 