
Optimization help needed PLEASE


MsChelle

Programmer
Jul 11, 2002
54
US
First, let me say, I am not a DBA, we don't have a DBA, and we are learning this as we go, out of necessity.

We have 5 structurally identical SQL databases that store large amounts of data. We have been experiencing massive performance issues lately, and I was hoping someone could help. These databases hold cell site statistics. Each cell site has data for itself, for the cell sites it talks to (neighbors), and for up to 20 carriers (frequencies) each. So, we have a hierarchical nightmare.

Database structure:
Table 1 - Date, time, siteID, approx. 30 columns of numerical data.
Indexed on date, time, siteid. There are no unique identifiers. Approximately 4000 individual siteIDs, 48 halfhours per day.

Table 2 - Date, time, siteID, carrier, approx. 10 columns of numerical data
Indexed on date, time, siteID, carrier

Table 3 - Date, time, siteID, neighborID, approx. 20 columns of numerical data
Indexed on date, time, siteID, neighborID

Table 4 - SiteID, SiteName, SiteNumber, other identifying information
Indexed on SiteID, SiteName, SiteNumber

Table 5 - 24 hour rollups of table 1, grouped by date and siteID

Table 6 - 24 hour rollups of table 2, grouped by date, siteID, and carrierID

Table 7 - 24 hour rollups of table 3, grouped by date, siteID, neighborID
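For concreteness, the rollup tables above would be populated with something along these lines (a minimal sketch only; the stat column names and the @rollup_date parameter are placeholders, not the actual schema):

```sql
-- Roll one completed day of half-hour rows from Table1 into Table5.
-- date_val, siteID, stat1, stat2 are assumed column names.
INSERT INTO Table5 (date_val, siteID, stat1, stat2)
SELECT date_val, siteID, SUM(stat1), SUM(stat2)
FROM Table1
WHERE date_val = @rollup_date   -- roll up one day at a time
GROUP BY date_val, siteID;
```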

What needs to be done regularly:
-Insert text files for halfhourly updates to Tables 1, 2, 3
-Daily removal of halfhour data from Tables 1, 2, 3 for data older than 10 days (approx 200,000 rows on table 1, 3.8 million rows on tables 2 and 3)
-24 hour rollups of tables 1, 2, 3 to create tables 5, 6, 7
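The daily purge in the list above is often the worst offender: deleting millions of rows in one statement holds locks for a long time and bloats the transaction log. A hedged sketch of the batched delete (batch size and column names are assumptions):

```sql
-- Delete old half-hour rows in small batches so each transaction
-- stays short and other work can run between batches.
DECLARE @cutoff datetime;
SET @cutoff = DATEADD(day, -10, GETDATE());

WHILE 1 = 1
BEGIN
    DELETE TOP (1000) FROM Table2
    WHERE date_val < @cutoff;

    IF @@ROWCOUNT = 0 BREAK;   -- nothing left to delete
END
```

Note that DELETE TOP (n) requires SQL Server 2005 or later; on SQL Server 2000, issuing SET ROWCOUNT 1000 before the DELETE inside the loop does the same job.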

In the meantime, users are *trying* to access the data via ColdFusion queries by accessing web pages. Some of these pages query multiple siteIDs, some query single siteIDs. Most require joins of tables 1, 2, 3 and 4, or some combination of them.

All of these things need to happen on each of the 5 databases, which are all stored on the same server because it's all we have.

The problem -
Queries are stalling, causing coldfusion to time out the pages (Eventually). Scheduled jobs are stalling and queuing up, resulting in missing halfhour data. Scheduled jobs for deleting the old data (run in batches of 500-1000) are failing or taking several hours to execute, then causing the halfhour updates to wait and miss data.

I have a lot of angry people asking me daily when the pages will be back and working properly again, and I don't know what to tell them anymore.

Can anyone help?
 
AFAIK dirty reads will read uncommitted data inserted by another process/transaction. In the above case, the query won't wait for anything - but it may return half-inserted data.
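For illustration, a dirty read via the NOLOCK table hint looks like this (table and column names are placeholders; the hint applies per table in the FROM clause):

```sql
-- Read without taking shared locks; the query never blocks on
-- writers, but it may see rows from an uncommitted bulk insert.
SELECT date_val, time_val, siteID, stat1
FROM Table1 WITH (NOLOCK)
WHERE siteID = @siteID
  AND date_val = @date;
```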

------
Theory: everybody knows everything, nothing works
Practice: everything works, nobody knows why

[banghead]
 
Yes, and I don't have such processes running on my system (lucky me); that's why I asked whether it is acceptable.
If you have a process that updates an amount and then calls another proc that adds 10 cents to that amount, and you use NOLOCK, you could get data that was never true (you will get the value of the table after the first update but before the 10 cents is added), and this is totally unacceptable, of course.

Denis The SQL Menace
 
I guess NOLOCK would not be enough... typical story with heavy stats reports. That's why I rambled about a 5-minute timeframe - snapshot isolation for lazy DBAs [smile].
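Snapshot isolation (available from SQL Server 2005 onward) is enabled per database; readers then see the last committed version of a row instead of either blocking or dirty-reading. A sketch, with StatsDB standing in for the real database name:

```sql
-- Allow sessions to opt in with SET TRANSACTION ISOLATION LEVEL SNAPSHOT:
ALTER DATABASE StatsDB SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Or make plain READ COMMITTED reads use row versioning automatically,
-- so no query changes are needed:
ALTER DATABASE StatsDB SET READ_COMMITTED_SNAPSHOT ON;
```

Switching READ_COMMITTED_SNAPSHOT on requires that no other connections are using the database at that moment, and row versioning adds load on tempdb, so it is worth testing first.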

 
I think that, with the way we use this data, the worst that could happen is that users would see data for 1:00 while 1:30 is updating.
 
OK, is some delay (likely less than 30 minutes) acceptable?

 
If we have to. Like sqldenis said, better than timing out. When we get stable, I can remove them (unless no one notices, then yippee!)
 
The NOLOCK hint can help a lot. The only problem is half-inserted data. Say a query fetches the most recent data from Table1 and Table2. During execution, a bulk insert adds rows into Table1, but Table2 is not finished yet. A query with NOLOCK returns wrong results. The longer the bulk inserts take, the greater the chance of that happening.

But if there is a relatively easy way to ignore records inserted in the last X minutes (any DateTimeInserted column or BulkInsertNo or something?), you can avoid such scenarios - basically by introducing an "artificial" delay longer than the bulk insert is supposed to last. I'm not sure this is always practical, because lots of queries must be changed (WHERE clause), but the general idea looks OK IMHO.
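Assuming a DateTimeInserted column exists (it is only hypothesized above, and would have to be added to the bulk-insert files or defaulted at insert time), the "artificial delay" amounts to one extra predicate:

```sql
-- NOLOCK avoids blocking; the DateTimeInserted filter hides the
-- last 30 minutes so an in-flight bulk insert is never half-visible.
SELECT date_val, time_val, siteID, stat1
FROM Table1 WITH (NOLOCK)
WHERE siteID = @siteID
  AND DateTimeInserted < DATEADD(minute, -30, GETDATE());
```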

 
Data is listed out by date, time, and cellid (for the most common report, which is halfhour data for a particular cell). It's an inner join query, matching date, time, and cellid, so I don't *think* having one table updated would be a problem. The whole time period should be ignored, right?

for example

select table1.date_val, table1.time_val, table1.cellid,
       table1.stat1, table1.stat2, [...table1.stat30,]
       sum(table2.stat3)
from table1
inner join table2
  on (table1.cellid = table2.cellid
  and table1.date_val = table2.date_val
  and table1.time_val = table2.time_val)
group by table1.date_val, table1.time_val, table1.cellid,
         table1.stat1, table1.stat2 [, ...table1.stat30]

(can't post actual code here for legal reasons)

 
In that clean case - yes: if one table has no rows inserted yet for a time period, the inner join won't display rows for that period.

 