
BSODs

Status
Not open for further replies.

Frank4d

Technical User
Nov 5, 2004
724
US
I have seven computers at work that are identical in both hardware and software: dual Xeon @ 2 GHz server boards with 2 GB RAM and 10 PCI-X/PCI cards in a passive backplane, running XP Pro + SP1a.

One of the machines has crashed eight times in the last month with BSOD stop errors 0x000000D1, 0x0000000A, 0x1000008E, and 0x00000050. The memory dump files point to possible problems with hal.dll and ntoskrnl.exe but offer no other information. There is nothing in the Event log files.

I have re-cloned the hard drive twice. Memtest86 can run for hours with no errors. Prime95 can run for hours with no errors. PSU is good according to my DMM. CPU temp is 49C and HDD temp is 45C. SMART reports no HD problems. All of the PCI-X and PCI cards are working (except when the computer crashes).

As I said, seven computers with identical hardware and identical hard drive images... and only one of them crashes. Any ideas?

 
0x000000D1: DRIVER_IRQL_NOT_LESS_OR_EQUAL
The system attempted to access pageable memory using a kernel process IRQL that was too high. The most typical cause is a bad device driver (one that uses improper addresses). It can also be caused by faulty or mismatched RAM, or a damaged pagefile.

0x0000000A: IRQL_NOT_LESS_OR_EQUAL
Technically, this error condition means that a kernel-mode process or driver tried to access a memory location to which it did not have permission, or at a kernel Interrupt ReQuest Level (IRQL) that was too high. (A kernel-mode process can access only other processes that have an IRQL lower than, or equal to, its own.)

0x0000008E: KERNEL_MODE_EXCEPTION_NOT_HANDLED
A kernel mode program generated an exception which the error handler didn’t catch. These are nearly always hardware compatibility issues (which sometimes means a driver issue or a need for a BIOS upgrade).

0x00000050: PAGE_FAULT_IN_NONPAGED_AREA
Requested data was not in memory. An invalid system memory address was referenced. Defective memory (including main memory, L2 RAM cache, video RAM) or incompatible software (including remote control and antivirus software) might cause this Stop message, as may other hardware problems (e.g., incorrect SCSI termination or a flawed PCI card).
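
One thing worth noting: the original post reports 0x1000008E, while the list above describes 0x0000008E. Microsoft documents 0x1000008E as KERNEL_MODE_EXCEPTION_NOT_HANDLED_M, with the same meaning as 0x8E. Below is a minimal Python sketch of a lookup for the four codes in this thread. The function name and the approach of masking off the 0x10000000 bit are my own assumptions for illustration, not something taken from the dump files or from any tool.

```python
# Stop codes quoted in this thread, keyed by their numeric value.
STOP_CODES = {
    0x000000D1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x0000008E: "KERNEL_MODE_EXCEPTION_NOT_HANDLED",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
}

# Assumption for this sketch: the "_M" variant of a code differs from
# the base code only by this bit (true for 0x1000008E vs 0x0000008E).
M_FLAG = 0x10000000


def describe_stop_code(text):
    """Hypothetical helper: turn a hex string into 'code: NAME'."""
    code = int(text, 16)      # int() accepts the "0x" prefix in base 16
    base = code & ~M_FLAG     # strip the variant bit before the lookup
    name = STOP_CODES.get(base, "UNKNOWN")
    if code & M_FLAG and name != "UNKNOWN":
        name += "_M"
    return "0x%08X: %s" % (code, name)


# The four codes from the original post.
for reported in ["0x000000D1", "0x0000000A", "0x1000008E", "0x00000050"]:
    print(describe_stop_code(reported))
```

Any code outside the table comes back as UNKNOWN rather than a guess, so the table would need extending before using it on other dumps.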


The way it looks, it could be a bad PCI card (or another piece of hardware, including the north and southbridges), or one that may not be totally compatible with the drivers that are on the CLONED IMAGE...

Alternatively, it could just as well be a bad mobo (caps), as someone suggested...

Ben

"If it works don't fix it! If it doesn't use a sledgehammer..."
 
I doubt it is a problem with the Ghost image, or a virus or malware. These machines are on a heavily firewalled corporate network with P2P, instant messaging, email, and Internet access disabled. I have six other exactly identical computers that don't crash.

There is one potential problem that I am aware of so far. Our test equipment computers are supposed to be blocked from all pushes, but due to a DNS problem this one has been getting pushed Altiris and Microsoft updates (the others do not). One hour after I last restored a Ghost image, IT started pushing files to it.

My guess is a bad CPU, I/O controller, hard drive, or PCI card, or some software pushed by IT. It should be fun to troubleshoot, since these are mission-critical systems used full-time Monday thru Friday, and I only have one set of parts available for swapping out (only part-time, of course) until I find the problem.
 