Optimization help needed PLEASE


MsChelle

Programmer
Jul 11, 2002
First, let me say, I am not a DBA, we don't have a DBA, and we are learning this as we go, out of necessity.

We have 5 SQL databases (identical in structure) that store large amounts of data. We have been experiencing massive performance issues lately, and I was hoping someone could help. These are for cell site statistics. Each cell site has data for itself, for the cell sites it talks to (neighbors), and for each of its up to 20 carriers (frequencies). So we have a hierarchical nightmare.

Database structure:
Table 1 - Date, time, siteID, approx. 30 columns of numerical data.
Indexed on date, time, siteid. There are no unique identifiers. Approximately 4000 individual siteIDs, 48 halfhours per day.

Table 2 - Date, time, siteID, carrier, approx. 10 columns of numerical data
Indexed on date, time, siteID, carrier

Table 3 - Date, time, siteID, neighborID, approx. 20 columns of numerical data
Indexed on date, time, siteID, neighborID

Table 4 - SiteID, SiteName, SiteNumber, other identifying information
Indexed on SiteID, SiteName, SiteNumber

Table 5 - 24 hour rollups of table 1, grouped by date and siteID

Table 6 - 24 hour rollups of table 2, grouped by date, siteID, and carrierID

Table 7 - 24 hour rollups of table 3, grouped by date, siteID, neighborID

What needs to be done regularly:
-Insert text files for halfhourly updates to Tables 1, 2, 3
-Daily removal of halfhour data from Tables 1, 2, 3 for data older than 10 days (approx 200,000 rows on table 1, 3.8 million rows on tables 2 and 3)
-24 hour rollups of tables 1, 2, 3 to create tables 5, 6, 7

In the meantime, users are *trying* to access the data via ColdFusion queries by accessing web pages. Some of these pages query multiple siteIDs, some query single siteIDs. Most require joins of tables 1, 2, 3 and 4, or some combination of them.

All of these things need to happen on each of the 5 databases, which are all stored on the same server because it's all we have.

The problem -
Queries are stalling, causing ColdFusion to (eventually) time out the pages. Scheduled jobs are stalling and queuing up, resulting in missing halfhour data. Scheduled jobs for deleting the old data (run in batches of 500-1000 rows) are failing or taking several hours to execute, which then causes the halfhour updates to wait and miss data.

I have a lot of angry people asking me daily when the pages will be back and working properly again, and I don't know what to tell them anymore.

Can anyone help?
 
First of all, breathe and relax.

Performance tuning is a huge subject, and many, many things could be wrong.

First, are you using cursors to do any of those data-deletion jobs? If so, they are the very first thing you need to get rid of. Never, ever use a cursor to perform mass inserts, updates, or deletes.
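
For what it's worth, a set-based batched delete looks roughly like this. This is only a sketch using placeholder names (ecell for a half-hour table, date_val for its date column, assumed here to be a datetime); adjust it to your real schema:

-- Purge rows older than 10 days in batches of 1000, no cursor needed.
-- SET ROWCOUNT caps how many rows each DELETE touches (SQL Server 2000 style).
SET ROWCOUNT 1000

WHILE 1 = 1
BEGIN
    DELETE FROM ecell
    WHERE date_val < DATEADD(day, -10, GETDATE())

    IF @@ROWCOUNT = 0 BREAK   -- nothing left to delete
END

SET ROWCOUNT 0   -- back to the default (no limit)

Small batches keep each transaction short, so the deletes don't block the half-hourly imports for hours at a time.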

Second, have you updated statistics recently or re-indexed the database? With that amount of data change, this could be causing the problem.

Third, you need to go buy yourself a big book on performance tuning, as there are many, many things that can slow a system down. This FAQ has some book suggestions:
Useful Reference Books for SQL Server Professionals faq183-3324





 
If this is a truly mission-critical app, I would rent a DBA; this is a mountain to climb. A simple start: what is your SQL Server box like (quad processor? how much memory? what kind of RAID?), what exactly is the access problem (the rollups?), and are there any maintenance windows?
 
Here is what I would do (IMHO):

a) make half-hour imports as efficient as possible.
b) narrow the timeframe for keeping live data from 10 days to 7, 5, or fewer (if possible)
c) analyze the CF queries, optimize the worst ones, and probably redesign the database so that common queries run fast. This is possible and likely acceptable because the system is not real-time (mass updates happen every 30 minutes, right?).
d) put everything on a fast RAID

The amount of data you mentioned is not that big... there's about a 95% chance the system suffers from serious hotspots and locking contention. Multiply that by 5 (databases)... [snail]

a) and especially c) are very large topics... more info, please.


------
"There's a man... He's bald and wears a short-sleeved shirt, and somehow he's very important to me. I think his name is Homer."
(Jack O'Neill, Stargate)
[banghead]
 
Also start looking at the SQL you use in Query Analyzer and check the execution plans; make sure you are avoiding table scans and using your indexes. Look at sql-performance.com for in-depth explanations of this.
 
Thank you all for the responses. <<Breathing in...breathing out...>>

To answer SQLSister - no, no cursors. I found so much about them on this site and other places screaming BAD BAD BAD that I didn't go that route.
Also, I don't know how to "update statistics", but I added a reindex to the daily jobs. Those are part of what hangs up the rest of the job. I'm wondering if it's because there is no unique key? I can't think of a way there could be a useful one, though.

To timscronin - I took your advice and announced on the team call today that we need to get a contractor DBA ASAP. The rest of the team agreed, and we are all wondering why we didn't think of this sooner. LOL. So tomorrow we will be writing up a proposal to sr. management to get this approved. Given the response from all departments when this is down, I don't foresee any issues getting approval. We want someone to tell us what hardware we would ideally need, whatever additional software we might need, and any configuration changes we should make... as well as fixing us for now and setting us up so we can maintain it in the future and avoid this kind of thing. (Does anyone know any DBA contractors in the Columbus, OH area?)


To tomvrunt - a) halfhour imports pull 3-5 large text files together for each of tables 1, 2, and 3. I currently use bulk import. (largest database does 5 text files, bulk imported every halfhour, into each table). Is there a better way?
b) I've considered this, since pulling 10 days times out anyway. Maybe I'll change that tomorrow.
c) I've played with the queries for years, always making them better, but boy are they ugly. Joining multiple tables, either within the query or using a dbtype=query to join via ColdFusion (which is slower but works better with the hierarchical problems we have). I will continue to optimize these queries, as well as the scheduled jobs. All of the queries involve numbers, which are used in sometimes rather long calculations. Is it better to calculate in the query or afterwards? The problem I've encountered with calculating later is that I don't know how to allow for web page re-sorts using formula results.
d) We are currently on RAID, but I fear it's RAID 5 (which I've read in numerous places is not the best choice for SQL Server). I don't know for sure because I'm not the one who set up the server, and the guy who did doesn't know (ugh). Another good reason for someone to come in and tell us what to do.

The only optimization plan we have in place is backing up the transaction logs (which we learned about the hard way! Like I said, we really don't know what we're doing here). We also have a tape backup of all the databases once a week. How do I check for table scans? I've read about an index wizard, but don't know where to find it.

Thanks again!


 
> I currently use bulk import. (largest database does 5 text files, bulk imported every halfhour, into each table). Is there a better way?

You mentioned (for example) Table 1 is indexed on date, time, siteid. This means all columns together (composite key), right? Is that index clustered or nonclustered?


> The problem I've encountered with calculating later is I don't know how to allow for web page re-sorts using formula results.

I see the problem... BTW, with web apps it is of great importance to reduce the result set as much as possible, and sorts (by a calculated column or whatever) usually collide with that concept: if you want to show the top 20 values out of 10,000 available, you must sort all 10,000 to get the top 20. Is that true in this case?
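
For illustration, a query shaped like the one below has to compute and sort the value for every qualifying row even though only 20 come back (the column names call_attempts and dropped_calls are made up purely for the example):

-- TOP 20 by a calculated value: the ORDER BY forces the whole set to be
-- aggregated and sorted before the top rows can be returned
SELECT TOP 20
       cellid,
       SUM(dropped_calls) * 100.0 / NULLIF(SUM(call_attempts), 0) AS drop_rate
FROM   ecell WITH (NOLOCK)
GROUP  BY cellid
ORDER  BY drop_rate DESC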


> How do I check for table scans?

Manually, by checking the execution plan(s), or with Profiler, by monitoring the Scans event class. But again, it looks like the system is full of hotspots... maybe it's better to copy the production database somewhere else (a dev server?) and do query tuning there.


> I've read about an index wizard, but don't know where to find it.

From EM: Tools -> Wizards -> Management
Or by typing "wiztrace" in Run box.

Personally I don't rely on ITW much - too smart, too far from perfect.

------
Theory: everybody knows everything, nothing works
Practice: everything works, nobody knows why

[banghead]
 
>This means all columns together (composite key), right? Is that index clustered or nonclustered?

The tables' indexes are not clustered. Should they be? Would that help? Right now it's basically
CREATE INDEX ecellindx on ecell (date_val, time_val, cellid)
Would clustered help?

>And sorts (by calculated column or whatever) usually collide with that concept - wanna show top 20 values from 10,000 available you must sort all 10,000 to get top 20. Is that true in this case?

Unfortunately, yes. The query narrows down to eliminate anything that doesn't meet any of the criteria for the sort columns, but it's still a huge sort.


Also just found out one of our RAID drives is bad and needs to be replaced. That, I'm sure, is causing some of these problems, but I think we have just too many. ARRGGH!
 
> Tables indexes are not clustered - should they be? Would that help? Right now it's basically
> CREATE INDEX ecellindx on ecell (date_val, time_val, cellid)
> Would clustered help?

A clustered index dictates the physical sort of the data in the table. This speeds up range-retrieval queries (WHERE indexed value BETWEEN X AND Y) because the rows the query has to analyze are next to each other, the optimizer can do read-aheads, and the disks fetch data faster. Date/time values are relatively distinct and already come in sorted (every half hour), so INSERT costs won't be high... I'd say this is worth trying... on a dev server first, of course [smile].

Or even a combination of two separate indexes: clustered for date/time, nonclustered for cellid.
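
In CREATE INDEX terms, that combination would look something like this (the new index names are just examples; building a clustered index on a table this size needs free disk space and a quiet window):

-- Drop the existing nonclustered composite index
DROP INDEX ecell.ecellindx

-- Clustered index: rows stored in date/time order, good for range scans
CREATE CLUSTERED INDEX ix_ecell_date_time
    ON ecell (date_val, time_val)

-- Nonclustered index: still allows seeks on a single site
CREATE NONCLUSTERED INDEX ix_ecell_cellid
    ON ecell (cellid)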

> Unfortunately, yes. The query narrows down to eliminate anything that doesn't meet any of the criteria for the sort columns, but it's still a huge sort.

Is it possible to pre-aggregate common stats into separate tables? That would make the half-hour imports a little slower, but reports will run faster, and blocking on the main tables will be drastically reduced.
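
As a rough sketch (the table and column names below are made up; the real tables have many more numeric columns, and this assumes date_val holds only the date with no time-of-day part), the refresh of such a pre-aggregated table could be a simple delete-and-reinsert of the current day after each import:

DECLARE @today datetime
SET @today = CONVERT(char(10), GETDATE(), 101)   -- today's date, time stripped

-- Rebuild today's rows in a hypothetical ecell_daily summary table
DELETE FROM ecell_daily WHERE date_val = @today

INSERT INTO ecell_daily (date_val, cellid, total_attempts, total_drops)
SELECT date_val, cellid, SUM(call_attempts), SUM(dropped_calls)
FROM   ecell WITH (NOLOCK)
WHERE  date_val = @today
GROUP  BY date_val, cellid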

------
Theory: everybody knows everything, nothing works
Practice: everything works, nobody knows why

[banghead]
 
When you reindex a table, the data will not be available during that time; do this on weekends at 1 AM if possible.
Better is defragmenting the tables (if needed) during the week.
Look up sp_updatestats and UPDATE STATISTICS in BOL for how to update statistics.
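
For reference, those commands look roughly like this (ecell and ecellindx are the table and index names from earlier in the thread; check BOL for the full syntax):

-- Full rebuild of all indexes on a table; the table is effectively
-- unavailable while it runs on SQL Server 2000, so schedule it off-hours
DBCC DBREINDEX ('ecell')

-- Lighter-weight defragmentation of a single index (0 = current database)
DBCC INDEXDEFRAG (0, 'ecell', 'ecellindx')

-- Refresh optimizer statistics for one table, or for the whole database
UPDATE STATISTICS ecell
EXEC sp_updatestats
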
Do you have transactions where, for example, proc 1 does
update table A
update table B
and proc 2 does
update table B
update table A
??
Try writing all your procs so that the tables are referenced in the same order (this will prevent deadlocking).
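
In other words, both procedures should touch the tables in the same order, something like this (placeholder names, just to show the shape):

-- Both procs update tableA first, then tableB, so neither can sit on a lock
-- on B while waiting for A: they cannot deadlock against each other
CREATE PROCEDURE usp_job1 AS
BEGIN
    BEGIN TRAN
        UPDATE tableA SET colX = colX + 1
        UPDATE tableB SET colX = colX + 1
    COMMIT TRAN
END
GO

CREATE PROCEDURE usp_job2 AS
BEGIN
    BEGIN TRAN
        UPDATE tableA SET colY = 0    -- same order as usp_job1: A, then B
        UPDATE tableB SET colY = 0
    COMMIT TRAN
END
GO
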
If you can live with dirty data, phantom reads, and non-repeatable reads, try using with noclock. (I don't recommend this, but we do use it sometimes for data that is a plain vanilla select and not used for an update; in the worst case clients will see data that is a minute old.)

Denis The SQL Menace
 
> In the meantime, users are *trying* to access the data via ColdFusion queries by accessing web pages. Some of these pages query multiple siteIDs, some query single siteIDs. Most require joins of tables 1, 2, 3 and 4, or some combination of them.

How critical is it for this data to be 100% up to date?
Maybe you should use WITH (NOLOCK) for these queries:

SELECT * FROM table1 a WITH (NOLOCK) JOIN table2 b WITH (NOLOCK) ON a.id = b.id ... etc., etc.

Denis The SQL Menace
 
> How critical is it for this data to be 100% up to date?

Or maybe doing nolock and ignoring data inserted in the last 5 or so minutes. Assuming the bulk insert won't last longer than that, dirty reads will never happen. The only drawback is the 5-minute "delay"... there are many options.
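
A rough sketch of that idea, pretending for a moment the half-hour tables had a single datetime column (call it sample_time; with the real split date/time columns you would build the same cutoff from both):

-- Dirty read, but anything inserted in the last 5 minutes is ignored,
-- so rows from a bulk insert that is still running are never shown
SELECT *
FROM   ecell WITH (NOLOCK)
WHERE  sample_time <= DATEADD(minute, -5, GETDATE())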

------
Theory: everybody knows everything, nothing works
Practice: everything works, nobody knows why

[banghead]
 
Retired :)

I have some of his other quotes ready though...

------
Theory: everybody knows everything, nothing works
Practice: everything works, nobody knows why

[banghead]
 

I will try clustering the indexes (indices??) on one of the DBs and see how that works. We don't have a dev server (::ducking:: don't kill me! We are working with what we can get our hands on).

>>Is it possible to pre-aggregate common stats into separated tables?
I'm afraid it would take the full halfhour to reconcile all that, and then it would have to start over right away. I've pre-compiled one of the most critical reports into a separate table (boy is it fast when it works!!), but it's also hanging up and not getting updated often enough, and possibly causing something else to hang?? Table 1 to tables 2 and 3 are one-to-many relationships, so the joins are really ugly.

>>try writing all your proc so that the tables referenced are in the same order
If you mean the scheduled jobs, then they are already in the same order. Each table (1, 2, 3) only gets updated once every halfhour. Tables 5, 6, 7 get updated once a day, and used to run that way just fine. I think deleting old data from 1, 2, and 3 is hanging up the updates of those tables, though. I'll check the order on those, but since deleting takes so much longer, I'm not sure there's a way around it.
If you mean the queries themselves, they are run from the coldfusion pages and I have no control over what order someone decides to look at a report.

>>How critical is it for this data to be 100% up to date?
Very. It's also critical that we have 24 hours' worth of data (thus no maintenance window). We use this data for performance monitoring of the cell sites, and even the halfhour delay we already have to have is a little irritating (though unavoidable). The goal is to have as near real-time stats as possible. That said, I have tried the WITH (NOLOCK) on a couple of scripts, with the thought that instead of data at 5 after every halfhour, it gets updated for the user at 10 after, but I don't see a noticeable difference. Maybe I should do them all and see.

You guys are awesome, thanks again!


 
What I mean by using noclock is that, for example, let's say you are updating the data at this moment, 12:00:01. If you query with noclock, you will get the values from just before that; if you query regularly, you will time out because the job has locked the table (or vice versa).

So basically, if I query at 12:01 and the job is running, is it acceptable to see values for 11:59, or whatever the time was just before the job began? I guess it's better than a timeout, isn't it?

Denis The SQL Menace
 
noclock? [bigears]

And are you talking about snapshot isolation or vanilla dirty reads?

------
Theory: everybody knows everything, nothing works
Practice: everything works, nobody knows why

[banghead]
 
nolock or noclock? I'm using nolock. Not familiar with noclock.

 
Let me clarify more: the data that I have doesn't depend on other data, so it can't be 'dirty' per se; it's either the older value or the new value that will be displayed.
If it's rolled back, the data that was shown is still correct even if it was a dirty read.
We should also call it uncommitted data, not dirty.

Denis The SQL Menace
 