Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Rhinorhino on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Your Observations On Decreasing Query Execution Time 1

Status
Not open for further replies.

4946

Technical User
Joined
Apr 25, 2004
Messages
80
Location
MX
I am working with this code in SQL Query Analyzer.

This query runs against a database table with over 1.8 million records. The time frame of the query, however covers only a short period and should look at only 2000 to 5000 records (at least that is my intent). The "enddate" field is a 10-digit integer (epoch time in seconds since 1970) and is indexed (non-clustered).

This query is used three times as subquery code in a larger query that requires about 2min 45sec to run unless I use the code listed below that "runs instantly". In that case even the complex query executes within 1 second.

Here are the attempts I have made so far and the resulting execution times of the single select query. The attempts to give a starting value to "enddate" with an expression do not seem to work, only when a hard-coded value is entered does it execute quickly. I have also tried
Code:
Cast(datediff(ss, '01/01/1970', getutcdate())-3800, int)
for @StartPoint.
Your observations and expertise will certainly be appreciated.

WinN

Code:
************* This query takes 36 seconds to execute ********/

SELECT	InteractionTrack.workgroupid, Sum(InteractionTrack.answer) AS HandledCalls, 
	sum(queueduration)/Sum(InteractionTrack.answer) as AvgQueTime
FROM         dbo.interactiontrack
WHERE 
enddate BETWEEN 
	(CASE 
	WHEN datediff(ss, '01/01/1970', getutcdate()) % 3600 < 1801 THEN 
		datediff(ss, '01/01/1970', getutcdate()) - (datediff(ss, '01/01/1970', getutcdate()) % 3600) - 1800 
	WHEN datediff(ss, '01/01/1970', getutcdate()) % 3600 > 1800 THEN 
		datediff(ss, '01/01/1970', getutcdate()) - (datediff(ss, '01/01/1970', getutcdate()) % 3600) END )
	AND
	CASE 
	WHEN datediff(ss, '01/01/1970', getutcdate()) % 3600 < 1801 THEN 
		datediff(ss, '01/01/1970', getutcdate()) - (datediff(ss, '01/01/1970', getutcdate()) % 3600)
	 WHEN datediff(ss, '01/01/1970', getutcdate()) % 3600 > 1800 THEN 
		datediff(ss, '01/01/1970', getutcdate()) - (datediff(ss, '01/01/1970', getutcdate()) % 3600) + 1800 END 
and workgroupid >0
group by workgroupid


/****************************** This Query takes 30 seconds to execute *******************/

SELECT	InteractionTrack.workgroupid, Sum(InteractionTrack.answer) AS HandledCalls, 
	sum(queueduration)/Sum(InteractionTrack.answer) as AvgQueTime
FROM         dbo.interactiontrack
WHERE 
enddate >  datediff(ss, '01/01/1970', getutcdate())-3800
and	
enddate	
	BETWEEN 
	(CASE 
	WHEN datediff(ss, '01/01/1970', getutcdate()) % 3600 < 1801 THEN 
		datediff(ss, '01/01/1970', getutcdate()) - (datediff(ss, '01/01/1970', getutcdate()) % 3600) - 1800 
	WHEN datediff(ss, '01/01/1970', getutcdate()) % 3600 > 1800 THEN 
		datediff(ss, '01/01/1970', getutcdate()) - (datediff(ss, '01/01/1970', getutcdate()) % 3600) END )
	AND
	CASE 
	WHEN datediff(ss, '01/01/1970', getutcdate()) % 3600 < 1801 THEN 
		datediff(ss, '01/01/1970', getutcdate()) - (datediff(ss, '01/01/1970', getutcdate()) % 3600)
	 WHEN datediff(ss, '01/01/1970', getutcdate()) % 3600 > 1800 THEN 
		datediff(ss, '01/01/1970', getutcdate()) - (datediff(ss, '01/01/1970', getutcdate()) % 3600) + 1800 END 
and workgroupid >0
group by workgroupid

/**********************************  This query takes 32 seconds to execute ***************************/
Declare @StartPoint int
set @StartPoint = datediff(ss, '01/01/1970', getutcdate())-3800

SELECT	InteractionTrack.workgroupid, Sum(InteractionTrack.answer) AS HandledCalls, 
	sum(queueduration)/Sum(InteractionTrack.answer) as AvgQueTime
FROM         dbo.interactiontrack
WHERE 
enddate >  @StartPoint
and	
enddate	
	BETWEEN 
	(CASE 
	WHEN datediff(ss, '01/01/1970', getutcdate()) % 3600 < 1801 THEN 
		datediff(ss, '01/01/1970', getutcdate()) - (datediff(ss, '01/01/1970', getutcdate()) % 3600) - 1800 
	WHEN datediff(ss, '01/01/1970', getutcdate()) % 3600 > 1800 THEN 
		datediff(ss, '01/01/1970', getutcdate()) - (datediff(ss, '01/01/1970', getutcdate()) % 3600) END )
	AND
	CASE 
	WHEN datediff(ss, '01/01/1970', getutcdate()) % 3600 < 1801 THEN 
		datediff(ss, '01/01/1970', getutcdate()) - (datediff(ss, '01/01/1970', getutcdate()) % 3600)
	 WHEN datediff(ss, '01/01/1970', getutcdate()) % 3600 > 1800 THEN 
		datediff(ss, '01/01/1970', getutcdate()) - (datediff(ss, '01/01/1970', getutcdate()) % 3600) + 1800 END 
and workgroupid >0
group by workgroupid


/************************************ This query executes instantly **************************/
Declare @StartPoint int
set @StartPoint = datediff(ss, '01/01/1970', getutcdate())-3800

SELECT	InteractionTrack.workgroupid, Sum(InteractionTrack.answer) AS HandledCalls, 
	sum(queueduration)/Sum(InteractionTrack.answer) as AvgQueTime
FROM         dbo.interactiontrack
WHERE 
enddate >  1122392945
and	
enddate	
	BETWEEN 
	(CASE 
	WHEN datediff(ss, '01/01/1970', getutcdate()) % 3600 < 1801 THEN 
		datediff(ss, '01/01/1970', getutcdate()) - (datediff(ss, '01/01/1970', getutcdate()) % 3600) - 1800 
	WHEN datediff(ss, '01/01/1970', getutcdate()) % 3600 > 1800 THEN 
		datediff(ss, '01/01/1970', getutcdate()) - (datediff(ss, '01/01/1970', getutcdate()) % 3600) END )
	AND
	CASE 
	WHEN datediff(ss, '01/01/1970', getutcdate()) % 3600 < 1801 THEN 
		datediff(ss, '01/01/1970', getutcdate()) - (datediff(ss, '01/01/1970', getutcdate()) % 3600)
	 WHEN datediff(ss, '01/01/1970', getutcdate()) % 3600 > 1800 THEN 
		datediff(ss, '01/01/1970', getutcdate()) - (datediff(ss, '01/01/1970', getutcdate()) % 3600) + 1800 END 
and workgroupid >0
group by workgroupid
 
My guess it that becuase the last one is deterministic and seems to have a value in where date > that sql is using the index you have on the enddate field..

Strange though.

It seems that sql is saying that because this is comming from a variable it is going to return a wider range of values (15% or so) and that it is therfore better to do a table scan.. (but YUK.. it is fundamentally ignoring the "and" and doing everything like an "or" or "between" type of query.)

also.. have you looked at the query plan and I am guessing that your cluseted index isnt enddate..

YOu havent got you table clusered on the primary key have you?

I find that datefields (especially if that is how you query the data often) is/are one of the better clusetering keys)

HTH

Rob
 
Rob,

Thanks for the quick reply. No, the clustered index is on two other columns I don't use in the query AND yes, the primary key is also clustered. Unfortunately this table is only one of many in a moderately complex database that I only read, not modify. (DBA is a a lot larger than I am!!)

The date field ("enddate") is not truly stored as a date, but as an integer field reflecting the number of seconds elapsed since 01/01/1970.

Thanks,

WinN
 
Sorry web went away for a while [smile]

WinN..

Just a couple of thoughts ..

It sounds like your table has a multi column clustering key.

in SQL2k I personally believe this to be a very dangerous thing. It PUFFS up every other index by adding all of the cols it has in it to the new non clustered ones.

What I would sugest if you want (and need very fast access to the data you need) is to look into building a good covering index..

(remove all Clustered index keys from the folloing create statement.)
Code:
Create NonClustered index Nc_Inter_EndDat_WgId_Ans_Qued
on interactiontrack(enddate,workgroupid,answer,queueduration)

I think you will find the query runs in Milliseconds :)
 
NoCoolHandle,

Thank you for the reply. Here is the index information from SQL Query Analyzer on the existing index structure:

Code:
Index                   Clustered    Columns
interactiontrack_pk     Yes          companyid,interactionid
ix_calltype             No           calltype
ix_enddate              No           enddate
ix_projectid            No           subprojectid
ix_userid               No           userid
ix_workgroupid          No           workgroupid

Do you feel that the problem lies with the two clustered primary keys? Unfortunately the structure of the database cannot be changed.

WinN
 
Win,

2 things

1 for the query's and 1 regarding your indexs..

FIrst the query (add the red bit and try running it)

Try
Code:
SELECT    InteractionTrack.workgroupid, Sum(InteractionTrack.answer) AS HandledCalls, 
    sum(queueduration)/Sum(InteractionTrack.answer) as AvgQueTime
FROM         dbo.interactiontrack [red](index=ix_enddate)[/red]
WHERE 
enddate >  datediff(ss, '01/01/1970', getutcdate())-3800
and    
enddate    
    BETWEEN 
    (CASE 
    WHEN datediff(ss, '01/01/1970', getutcdate()) % 3600 < 1801 THEN 
        datediff(ss, '01/01/1970', getutcdate()) - (datediff(ss, '01/01/1970', getutcdate()) % 3600) - 1800 
    WHEN datediff(ss, '01/01/1970', getutcdate()) % 3600 > 1800 THEN 
        datediff(ss, '01/01/1970', getutcdate()) - (datediff(ss, '01/01/1970', getutcdate()) % 3600) END )
    AND
    CASE 
    WHEN datediff(ss, '01/01/1970', getutcdate()) % 3600 < 1801 THEN 
        datediff(ss, '01/01/1970', getutcdate()) - (datediff(ss, '01/01/1970', getutcdate()) % 3600)
     WHEN datediff(ss, '01/01/1970', getutcdate()) % 3600 > 1800 THEN 
        datediff(ss, '01/01/1970', getutcdate()) - (datediff(ss, '01/01/1970', getutcdate()) % 3600) + 1800 END 
and workgroupid >0
group by workgroupid


1 word of caution.. Forcing index use isn't always a good option.
For instance
Code:
select * from 
northwind.dbo.[Order Details]
where orderid between  1 and 100000
Incurs 10 units of i/o

but
Code:
select * from 
northwind.dbo.[Order Details]
(index=orderid)
where orderid between  1 and 100000
Incurs 4315 Units


The Clustered index on the primary probably isn't doing you any good at all! THis is one of SQL Servers most stupid defaults!!!

More likley than not, when you query a primary key value here, you want an exact match. This means that a NonClustered Key would work just as well. (2 more reads on average over each record - which in the case of a singleton query is often no more than 3 or 5 units of i/o

By sorting the table on a common Range Query(s)- groupby fields or order by fields you can see amazing performance gains as SQL can then do "partial table scans - or partial index scans" and get all the data it needs faster.

The actual fields in your clusted index don't look too bad, as it will only be adding (i am guessing they are both int fields) 8 bytes to each nonclustered indexes.

Having clusterd indexes on a 900 char filed would be MUCH MUCH WORSE. So that isn't really a problme. But I am guessing it doesn't help very many querys.

I tend to get very specific on what field gets to be my Clustering filed and almost NEVER use the primary key. Foreign keys are often much better choices.

But 1 thing is for sure you really need to pick your battles wisely. Pick the wrong one and you can kill more querys than you help.

I have seen a query that only returns 10% of the data incur a table scan every time the data was queryed and if an index was forced the query came back at 3 to 5 X the expense of the table scan.
 
I notice this "datediff(ss, '01/01/1970', getutcdate()) " appearing frequently within your query. You may find increased performance by declaring a variable, setting it to this value, and then using the variable in the query.

Code:
Declare @SecondsSince1970 Integer
Set @SecondsSince1970 = datediff(ss, '01/01/1970', getutcdate()) 

SELECT    InteractionTrack.workgroupid, Sum(InteractionTrack.answer) AS HandledCalls, 
    sum(queueduration)/Sum(InteractionTrack.answer) as AvgQueTime
FROM         dbo.interactiontrack (index=ix_enddate)
WHERE 
enddate >  @SecondsSince1970-3800
and    
enddate    
    BETWEEN 
    (CASE 
    WHEN @SecondsSince1970 % 3600 < 1801 THEN 
        @SecondsSince1970 - (@SecondsSince1970 % 3600) - 1800 
    WHEN @SecondsSince1970 % 3600 > 1800 THEN 
        @SecondsSince1970 - (@SecondsSince1970 % 3600) END )
    AND
    CASE 
    WHEN @SecondsSince1970 % 3600 < 1801 THEN 
        @SecondsSince1970 - (@SecondsSince1970 % 3600)
     WHEN @SecondsSince1970 % 3600 > 1800 THEN 
        @SecondsSince1970 - (@SecondsSince1970 % 3600) + 1800 END 
and workgroupid >0
group by workgroupid

I don't know if this will help, but it is worth a try. Good luck.


-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
NoCoolHandle,

YESSS!!! The Golden Morsel was adding (index=ix_enddate).
Code:
FROM   dbo.interactiontrack [COLOR=red](index=ix_enddate)[/color red]
Even the larger query that incorporates this one three different times in sub-queries runs in less than a second.

A sincere "Thank-you" to you and all the other people who make this forum so great.

I also very much appreciate your discussion of the indexes and their implications on query execution speed. I will study it and incorporate your suggestions and observations in future queries.

WinN
 
Win..

1 big word of caution.
These "optimizer hints" can build performance bottle necks.

Data distribution can change over time. Rule of thumb (and this changes as your indexes get deeper, but 1 row of data = 3 units of IO..

Bringing back a reasonable number of rows can KILL performance.

Rob

PS if this has been helpfull i would love a star :)

 
George,

Thanks for your observations and suggesed code changes. I tried your version without the "(index=ix_enddate)" in the FROM statement that NoCoolHandle suggested and it still took the same time to run as the old code.

I think that (theoretically anyway) in this case it only needs to read the statements one time to establish the boundaries for the BETWEEN statement. One thing I do note though is how much cleaner your version of the code looks and it will simplify maintenance and documentation of these modules.

Thanks again for your help and being a contributing member of this forum.

WinN
 
I mentioned this because I did the same thing (but with a different formula) in one of my queries a couple months ago and it made a tremendous improvement in execution time.

I'm glad the index hint solved your problem.

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top