Urgent - DBCC Checkdb reports errors

mlawrenson

Hello, we have SQL Server 2000 SP3 running on Windows 2000 Advanced Server in a failover cluster.

We have been getting consistency errors reported by dbcc checkdb. Normally we get them once or twice a year, but in the last few days we have been getting them regularly.

We run a dbcc checkdb as a scheduled job daily. On Friday we got the errors on one database. I repaired the database, and the next day we got the errors again. And it is happening on more than one database!! The errors are different on different occasions, but they are serious enough for the users to be unable to use the systems!

We have no hardware problems reported in Event Viewer, and we have nothing unusual in the SQL logs. We do occasionally get a stack dump in the log, generated by SQL Server upon detection of a fatal unexpected error. Not sure if this is related or not.

We have thought about upgrading to SP4, but we have various other SQL Servers that don't have this problem, so I don't think that will solve it.

Any help would be greatly appreciated.
 
Do a scan of your hard drives. Bad sectors don't always show up in the Event Log and neither do bad disk controllers. It might also be a good idea, just on principle, to have your hardware people check out the server case for CPU temp issues and dust problems.

Bad RAM could also account for some of your problem, and again not be listed in the Event Log.

Check your UPS (Uninterruptible Power Supply). If it is having issues, it could be causing corruption in the database during power surges and the like. Or, if your server is plugged directly into the wall (Bad Thing), it could be that your electrical system has been having fits and surges, which cause such issues during the I/O process.

Don't rely on the Event Log to tell you about all hardware problems because it is mainly there to tell you about network issues and software issues, not hardware problems. Just because something isn't listed there doesn't mean you can check it off your list of "That's not what is wrong here". Good troubleshooting always starts with the basics, regardless of what the server may say.



Catadmin - MCDBA, MCSA
"No, no. Yes. No, I tried that. Yes, both ways. No, I don't know. No again. Are there any more questions?"
-- Xena, "Been There, Done That"
 
Check the SAN for issues with the storage controllers and media.

Does this happen no matter which node is in control of the database?

What does it say in the mini-dumps is the cause of the dumps?

Denny
MCSA (2003) / MCDBA (SQL 2000) / MCTS (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
 
Thanks for your replies.

The stack dump starts with the following:

This file is generated by Microsoft SQL Server 8.00.760
upon detection of fatal unexpected error.

Then it goes on to say:

* BEGIN STACK DUMP:
* 03/27/06 07:03:45 spid 85
*
* Exception Address = 008F519E (CheckTables::TextProcessText + 00000113 Line 0+00000000)
* Exception Code = c0000005 EXCEPTION_ACCESS_VIOLATION
* Access Violation occurred reading address 04851004
* Input Buffer 58 bytes -
* dbcc checkdb WITH NO_INFOMSGS
*
*
* MODULE    BASE      END       SIZE
* sqlservr  00400000  00B2CFFF  0072d000
* ntdll     77F80000  77FF9FFF  0007a000
* KERNEL32  77E80000  77F30FFF  000b1000
* ADVAPI32  77DB0000  77E0AFFF  0005b000
* RPCRT4    77D30000  77D9CFFF  0006d000

and it goes on and on with similar info.
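
For anyone who has to wade through a lot of these dumps, here is a minimal Python sketch that pulls out the few header fields worth quoting in a post (timestamp, spid, faulting function, exception code, input buffer). It only assumes the header layout shown above; the function name `summarize_dump` and the dictionary keys are made up for illustration, and other SQL Server builds may format the header differently.

```python
import re

# Header excerpt copied from the dump above (the wrapped "Line 0+..." is joined).
DUMP = """\
* BEGIN STACK DUMP:
* 03/27/06 07:03:45 spid 85
*
* Exception Address = 008F519E (CheckTables::TextProcessText + 00000113 Line 0+00000000)
* Exception Code = c0000005 EXCEPTION_ACCESS_VIOLATION
* Access Violation occurred reading address 04851004
* Input Buffer 58 bytes -
* dbcc checkdb WITH NO_INFOMSGS
"""


def summarize_dump(text):
    """Extract a few key fields from a stack dump header (hypothetical helper)."""
    summary = {}
    # Drop the leading "*" and surrounding whitespace from every line.
    lines = [ln.lstrip("*").strip() for ln in text.splitlines()]
    for i, ln in enumerate(lines):
        m = re.match(r"(\d\d/\d\d/\d\d \d\d:\d\d:\d\d) spid (\d+)", ln)
        if m:
            summary["timestamp"] = m.group(1)
            summary["spid"] = int(m.group(2))
        m = re.match(r"Exception Address\s*=\s*([0-9A-Fa-f]+)\s*\((\S+)", ln)
        if m:
            summary["address"] = m.group(1)
            summary["function"] = m.group(2)
        m = re.match(r"Exception Code\s*=\s*([0-9A-Fa-f]+)\s+(\S+)", ln)
        if m:
            summary["exception_code"] = m.group(1)
            summary["exception_name"] = m.group(2)
        # The statement that was running is on the line after "Input Buffer".
        if ln.startswith("Input Buffer") and i + 1 < len(lines):
            summary["input_buffer"] = lines[i + 1]
    return summary


if __name__ == "__main__":
    for key, value in summarize_dump(DUMP).items():
        print(f"{key}: {value}")
```

On the excerpt above this shows the access violation happened inside `CheckTables::TextProcessText` while spid 85 was running the `dbcc checkdb` job itself.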

I found we also got the following error in the sql logs:

Could not find the index entry for RID '36100101304953533030303335393937' in index page (1:204967), index ID 4, database 'ARSystem'..

Error: 644, Severity: 21, State: 5
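
If these 644 messages keep turning up, it may help to break them into their parts so you can see whether the same page or index is hit each time. Below is a small Python sketch that does this for the message quoted above. The names `parse_644` and `index_kind` are hypothetical, and the regular expression is only an assumption based on this one message. The index ID ranges follow the SQL Server 2000 `sysindexes` convention (0 = heap, 1 = clustered, 255 = text/image, anything else nonclustered).

```python
import re

# The error 644 text as logged above.
MSG = ("Could not find the index entry for RID "
       "'36100101304953533030303335393937' in index page (1:204967), "
       "index ID 4, database 'ARSystem'.")

PATTERN = re.compile(
    r"RID '(?P<rid>[0-9A-Fa-f]+)' in index page "
    r"\((?P<file>\d+):(?P<page>\d+)\), "
    r"index ID (?P<indid>\d+), database '(?P<db>[^']+)'"
)


def index_kind(indid):
    """Classify an index ID using the SQL Server 2000 sysindexes convention."""
    if indid == 0:
        return "heap"
    if indid == 1:
        return "clustered"
    if indid == 255:
        return "text/image"
    return "nonclustered"


def parse_644(message):
    """Return the fields of an error 644 message, or None if it does not match."""
    m = PATTERN.search(message)
    if m is None:
        return None
    indid = int(m.group("indid"))
    return {
        "rid": m.group("rid"),
        "file_id": int(m.group("file")),
        "page_id": int(m.group("page")),
        "index_id": indid,
        "index_kind": index_kind(indid),
        "database": m.group("db"),
    }


if __name__ == "__main__":
    print(parse_644(MSG))
```

Here index ID 4 comes out as a nonclustered index, which means the damaged structure is a copy of the data and not the table rows themselves.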

The MS support web site suggests we should upgrade to SP4, so I think that is what we will do for now. But I'm not convinced that will solve the problems. My view is that this is a hardware problem.

Just to make matters worse, I can't test the other node because the cluster hasn't been installed properly!!
 
I wouldn't upgrade the server until the issue is resolved. A service pack upgrade isn't going to fix that problem.

Further down the file there will be info telling you which command caused the failure.

If you post the file somewhere (don't just paste it into the forum) and put the link in this thread, I can take a look at it.

The line in the SQL logs should tell you the table as well as the database. Look in the sysindexes table, find index 4 and get the name of it. Then drop the index and recreate it. That should do the trick.

Denny
 
Thanks mrdenny, where shall I put the file?
 
If you have a personal web site, or some sort of company site that is open to the public, posting it there would work.

Denny
 
Thought I'd update you on this. I ran the SQLIOStress utility over last weekend and it reported various stale read errors. I checked with our hardware vendor, and they have released a firmware upgrade for their disk controller which fixes the problem of SQL Server stale reads and other SQL-related caching errors. FYI: We are using HP MSA 1000 controllers.

Hope this fix is useful for others.

Cheers for your replies
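
For readers who haven't met the term: a stale read is a read that returns an older copy of a block than the one most recently written. The toy Python model below illustrates the idea only. It is not how SQLIOStress works internally and it does not touch real hardware; `StaleCache` and `find_stale_reads` are invented names, and the "buggy" cache behaviour is an assumption chosen to make the effect visible.

```python
class StaleCache:
    """Toy controller cache. When buggy, it never refreshes a cached block."""

    def __init__(self, buggy):
        self.buggy = buggy
        self.disk = {}   # what actually reached the platters
        self.cache = {}  # what the controller serves reads from

    def write(self, block, data):
        self.disk[block] = data
        # A healthy cache is updated on every write; the buggy one only
        # caches the first version it sees and keeps serving it.
        if not self.buggy or block not in self.cache:
            self.cache[block] = data

    def read(self, block):
        return self.cache.get(block, self.disk.get(block))


def find_stale_reads(storage, blocks, rounds):
    """Stamp every write with a version and flag reads that return an older one."""
    stale = []
    for version in range(1, rounds + 1):
        for block in blocks:
            expected = (block, version)
            storage.write(block, expected)
            got = storage.read(block)
            if got != expected:
                stale.append((block, version, got))
    return stale


if __name__ == "__main__":
    print("healthy:", find_stale_reads(StaleCache(buggy=False), [0, 1], 3))
    print("buggy:  ", find_stale_reads(StaleCache(buggy=True), [0, 1], 3))
```

The healthy run reports nothing, while the buggy run flags every rewrite of a block after the first, because the read keeps returning version 1. A database engine that gets an old page back like this can then see index entries that no longer match the data.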
 