Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations wOOdy-Soft on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Which query format is preferable? 1

Status
Not open for further replies.

jbradley

Programmer
Sep 7, 2001
248
US
As I mentioned in a previous post I am no database programmer, I just got thrown in at the deep end. As I was working on this project a coworker, who is an AS/400 guru, suggested a query structure that eliminates the need for joins and is supposed to result in quicker, less resource-intensive queries. Below are before and after examples. Both complete in 26 seconds, returning just under 9,000 rows from two 5 million + row tables.

BTW, the MAX(h.DateChanged) is intended to get me just the most recent record for each AccountID. Is how I did that correct for what I want?

With Join
Code:
SELECT h.AccountID, 
	   MAX(h.DateChanged) AS DateChanged, 
	   m.[status] AS CURRENTSTATUS, 
	   h.OldStatus, 
	   h.NewStatus, 
	   GETDATE() AS RECORDDATE, 
FROM dbo.StatusHistory h
	INNER JOIN dbo.[master] m
		ON h.AccountID = m.[number] 
WHERE (h.OldStatus <> h.NewStatus) 
	AND (h.OldStatus <> m.[status]) 
	AND (h.OldStatus IN ('CBD', 'CYC', 'DCC', 'FRD') 
		OR h.NewStatus = 'NSF') 
GROUP BY h.AccountID, 
		 h.OldStatus, 
		 h.NewStatus, 
		 h.DateChanged, 
		 m.[status] 
ORDER BY h.DateChanged DESC
Without Join
Code:
SELECT h.AccountID, 
	   MAX(h.DateChanged) AS DateChanged, 
	   m.[status] AS CURRENTSTATUS, 
	   h.OldStatus, 
	   h.NewStatus, 
	   GETDATE() AS RECORDDATE, 
FROM dbo.StatusHistory h, dbo.[master] m 
WHERE h.AccountID = m.[number] 
	AND (h.OldStatus <> h.NewStatus) 
	AND (h.OldStatus <> m.[status]) 
	AND (h.OldStatus IN ('CBD', 'CYC', 'DCC', 'FRD') 
		OR h.NewStatus = 'NSF') 
GROUP BY h.AccountID, 
		 h.OldStatus, 
		 h.NewStatus, 
		 h.DateChanged, 
		 m.[status] 
ORDER BY h.DateChanged DESC
 
change this --
Code:
  FROM dbo.StatusHistory h    
INNER 
  JOIN dbo.[master] m   
    ON h.AccountID = m.[number] 
 WHERE (h.OldStatus <> h.NewStatus) 
   AND (h.OldStatus <> m.[status])  
   AND (h.OldStatus IN ('CBD', 'CYC', 'DCC', 'FRD') 
        OR h.NewStatus = 'NSF')
to this --
Code:
  FROM dbo.StatusHistory AS h    
INNER 
  JOIN dbo.[master] AS m   
    ON m.[number] = h.AccountID
   AND m.[status]) <> h.OldStatus  
 WHERE ( h.OldStatus IN ('CBD', 'CYC', 'DCC', 'FRD') 
      OR h.NewStatus = 'NSF') 
  AND h.OldStatus <> h.NewStatus)

r937.com | rudy.ca
Buy my new book Simply SQL from Amazon
 
Makros, thank you for the reply.

Rudy, thank you too.

I was able to determine that my attempt to return just the newest status change for each account using MAX(DateChanged) doesn't work. Does anyoone have any suggestions?
 
Both queries represent real joins.

a query structure that eliminates the need for joins and is supposed to result in quicker, less resource-intensive queries

I don't know how things are in the AS/400 environment, but in SQL Server, both queries represent a join. Specifically an inner join. When you run a query against a database, there are a lot of things that happen behind the scenes. Before SQL Server can run the query, it must first decide on the BEST way to run it. SQL Server will parse the query and create an execution plan for it. It then runs the execution plan to return your results. Execution plans can be simple or extremely complicated based on the query you are executing.

In this particular example, both queries will generate an identical execution plan and both will require the same resources, and will perform in the same amount of time (+ or - server loads, blocking, etc....).

I would encourage you to "see for yourself". If you load both queries in to a query window, press CTRL-M, and then run the query, you will see a new tab in SQL Server Management Studio. The "Execution Plan" tab will show you a lot of information. Since there are 2 queries, it will show you the amount of time required to run each of them (expressed as a percentage). In this case, since there are 2 queries, each will say 50%. The execution plan tab will also show you the actual execution plan that is executed. Both queries (in this case) will produce identical execution plans.

One query is no better than the other in terms of resources or performance.

Most programmers (that are any good) will tell you that the first one is better, and I happen to agree.

The first query is better not because it performs any different, but because it is similar to queries you would write for Left/Right/Full and Cross joins. In this case, writing the query as shown in the first code snippet will be more consistent with other code that is written.

BTW, the MAX(h.DateChanged) is intended to get me just the most recent record for each AccountID. Is how I did that correct for what I want?

No. Max(h.DateChanged) will certainly return the max value, but it won't necessarily correspond to the "most recent record". Specifically... it's possible that (based on your group by), there could be multiple rows that are returned.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
George,

Thank you for that concise explanation, both about the reasons why one is better than the other and about my use of MAX.

Do you have any suggestions as to how to get the results I'm looking for?
 
What version of SQL Server are you using?
Does DateChanged include Time?


-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
George,

It's SQL Server 10.50.1600 Standard Edition (64-bit).
DateChanged includes the time in hours and minutes but no seconds.

Brad
 
It's difficult writing a query when you cannot test it. The query I show below *may* not be right because it makes some assumptions about your data. However, if I'm right, the query will return the correct results and will probably execute faster too.

Please let me know.

Code:
SELECT h.AccountID, 
       NewestHistory.DateChanged, 
       m.[status] AS CURRENTSTATUS, 
       h.OldStatus, 
       h.NewStatus, 
       GETDATE() AS RECORDDATE
FROM   dbo.StatusHistory h
       INNER JOIN dbo.[master] m
         ON h.AccountID = m.[number] 
         And h.OldStatus <> m.[status]
       INNER JOIN (
         SELECT h.AccountID, 
                h.OldStatus,
                MAX(h.DateChanged) AS DateChanged
         FROM   dbo.StatusHistory h
         Where  h.OldStatus <> h.NewStatus
                AND (h.OldStatus IN ('CBD', 'CYC', 'DCC', 'FRD') 
                    OR h.NewStatus = 'NSF') 
         Group By h.AccountId, h.OldStatus
         ) As NewestHistory
         On h.AccountId = NewestHistory.AccountId
         And h.DateChanged = NewestHistory.DateChanged
         And h.OldStatus = NewestHistory.OldStatus
ORDER BY h.DateChanged DESC

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
George,

Wow! It took me a while to start getting a handle on what your query does and I still haven't grasped it completely. I think I get that the sub-query creates a (recordset?), where I would have used a temp table (if I had the knowledge to do it at all, which I don't). The stacked (or would 'nested' be a better term?) joins have me scratching my head but I hope to figure out what is going on eventually. I won't presume on your good nature by asking you to give up any more of your time to explain it step by step - I've ordered a SQL book from Amazon already. [smile]
Thanks again, and another star to you.

Brad
 
I don't mind explaining it, but before I do, can you answer a few questions...

1. Does it return the correct results?
2. How does the performance of this query compare to the original?

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
George,

I'm in the process of vaildating that the records returned are truly the latest for each account but I can say that there are no multiple returns for any one account, which was a problem with my query. The performance is identical to the original query if the sort is on the DateChanged field, which it turns out is not indexed. If I sort on AccountID the elapsed time drops from 26 seconds down to 15.
 
Brad,

Have you got the authority/permissions to change or add new indexes to the tables?

If so, can you run the following queries and post the results:

Code:
sp_helpindex [master]
sp_helpindex StatusHistory

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Markros,

Thanks for the links. I will check them out.

George,

I can add or change indexes but it will have to wait until tomorrow at this point.

Thank you both for your help. This is my idea of a good time.

Brad
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top