Which query format is preferable? 1

jbradley · May 2, 2011

As I mentioned in a previous post I am no database programmer, I just got thrown in at the deep end. As I was working on this project a coworker, who is an AS/400 guru, suggested a query structure that eliminates the need for joins and is supposed to result in quicker, less resource-intensive queries. Below are before and after examples. Both complete in 26 seconds, returning just under 9,000 rows from two 5 million + row tables.

BTW, the MAX(h.DateChanged) is intended to get me just the most recent record for each AccountID. Is how I did that correct for what I want?

With Join

Code:

SELECT h.AccountID, MAX(h.DateChanged) AS DateChanged, m.[status] AS CURRENTSTATUS, h.OldStatus, h.NewStatus, GETDATE() AS RECORDDATE, FROM dbo.StatusHistory h INNER JOIN dbo.[master] m ON h.AccountID = m.[number] WHERE (h.OldStatus <> h.NewStatus) AND (h.OldStatus <> m.[status]) AND (h.OldStatus IN ('CBD', 'CYC', 'DCC', 'FRD') OR h.NewStatus = 'NSF') GROUP BY h.AccountID, h.OldStatus, h.NewStatus, h.DateChanged, m.[status] ORDER BY h.DateChanged DESC

Without Join

Code:

SELECT h.AccountID, MAX(h.DateChanged) AS DateChanged, m.[status] AS CURRENTSTATUS, h.OldStatus, h.NewStatus, GETDATE() AS RECORDDATE, FROM dbo.StatusHistory h, dbo.[master] m WHERE h.AccountID = m.[number] AND (h.OldStatus <> h.NewStatus) AND (h.OldStatus <> m.[status]) AND (h.OldStatus IN ('CBD', 'CYC', 'DCC', 'FRD') OR h.NewStatus = 'NSF') GROUP BY h.AccountID, h.OldStatus, h.NewStatus, h.DateChanged, m.[status] ORDER BY h.DateChanged DESC

markros · May 2, 2011

The first query is better (as it uses real JOIN). The second is exactly the same as the first, just uses old style JOIN disguised as where clause.

PluralSight Learning Library

r937 · May 2, 2011

change this --

Code:

  FROM dbo.StatusHistory h    
INNER 
  JOIN dbo.[master] m   
    ON h.AccountID = m.[number] 
 WHERE (h.OldStatus <> h.NewStatus) 
   AND (h.OldStatus <> m.[status])  
   AND (h.OldStatus IN ('CBD', 'CYC', 'DCC', 'FRD') 
        OR h.NewStatus = 'NSF')

to this --

Code:

  FROM dbo.StatusHistory AS h    
INNER 
  JOIN dbo.[master] AS m   
    ON m.[number] = h.AccountID
   AND m.[status]) <> h.OldStatus  
 WHERE ( h.OldStatus IN ('CBD', 'CYC', 'DCC', 'FRD') 
      OR h.NewStatus = 'NSF') 
  AND h.OldStatus <> h.NewStatus)

r937.com | rudy.ca
Buy my new book Simply SQL from Amazon

jbradley · May 2, 2011

Makros, thank you for the reply.

Rudy, thank you too.

I was able to determine that my attempt to return just the newest status change for each account using MAX(DateChanged) doesn't work. Does anyoone have any suggestions?

gmmastros · May 2, 2011

Both queries represent real joins.

a query structure that eliminates the need for joins and is supposed to result in quicker, less resource-intensive queries

I don't know how things are in the AS/400 environment, but in SQL Server, both queries represent a join. Specifically an inner join. When you run a query against a database, there are a lot of things that happen behind the scenes. Before SQL Server can run the query, it must first decide on the BEST way to run it. SQL Server will parse the query and create an execution plan for it. It then runs the execution plan to return your results. Execution plans can be simple or extremely complicated based on the query you are executing.

In this particular example, both queries will generate an identical execution plan and both will require the same resources, and will perform in the same amount of time (+ or - server loads, blocking, etc....).

I would encourage you to "see for yourself". If you load both queries in to a query window, press CTRL-M, and then run the query, you will see a new tab in SQL Server Management Studio. The "Execution Plan" tab will show you a lot of information. Since there are 2 queries, it will show you the amount of time required to run each of them (expressed as a percentage). In this case, since there are 2 queries, each will say 50%. The execution plan tab will also show you the actual execution plan that is executed. Both queries (in this case) will produce identical execution plans.

One query is no better than the other in terms of resources or performance.

Most programmers (that are any good) will tell you that the first one is better, and I happen to agree.

The first query is better not because it performs any different, but because it is similar to queries you would write for Left/Right/Full and Cross joins. In this case, writing the query as shown in the first code snippet will be more consistent with other code that is written.

BTW, the MAX(h.DateChanged) is intended to get me just the most recent record for each AccountID. Is how I did that correct for what I want?

No. Max(h.DateChanged) will certainly return the max value, but it won't necessarily correspond to the "most recent record". Specifically... it's possible that (based on your group by), there could be multiple rows that are returned.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

jbradley · May 2, 2011

George,

Thank you for that concise explanation, both about the reasons why one is better than the other and about my use of MAX.

Do you have any suggestions as to how to get the results I'm looking for?

gmmastros · May 2, 2011

What version of SQL Server are you using?
Does DateChanged include Time?

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

jbradley · May 2, 2011

George,

It's SQL Server 10.50.1600 Standard Edition (64-bit).
DateChanged includes the time in hours and minutes but no seconds.

Brad

gmmastros · May 2, 2011

It's difficult writing a query when you cannot test it. The query I show below *may* not be right because it makes some assumptions about your data. However, if I'm right, the query will return the correct results and will probably execute faster too.

Please let me know.

Code:

SELECT h.AccountID, 
       NewestHistory.DateChanged, 
       m.[status] AS CURRENTSTATUS, 
       h.OldStatus, 
       h.NewStatus, 
       GETDATE() AS RECORDDATE
FROM   dbo.StatusHistory h
       INNER JOIN dbo.[master] m
         ON h.AccountID = m.[number] 
         And h.OldStatus <> m.[status]
       INNER JOIN (
         SELECT h.AccountID, 
                h.OldStatus,
                MAX(h.DateChanged) AS DateChanged
         FROM   dbo.StatusHistory h
         Where  h.OldStatus <> h.NewStatus
                AND (h.OldStatus IN ('CBD', 'CYC', 'DCC', 'FRD') 
                    OR h.NewStatus = 'NSF') 
         Group By h.AccountId, h.OldStatus
         ) As NewestHistory
         On h.AccountId = NewestHistory.AccountId
         And h.DateChanged = NewestHistory.DateChanged
         And h.OldStatus = NewestHistory.OldStatus
ORDER BY h.DateChanged DESC

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

jbradley · May 2, 2011

George,

Wow! It took me a while to start getting a handle on what your query does and I still haven't grasped it completely. I think I get that the sub-query creates a (recordset?), where I would have used a temp table (if I had the knowledge to do it at all, which I don't). The stacked (or would 'nested' be a better term?) joins have me scratching my head but I hope to figure out what is going on eventually. I won't presume on your good nature by asking you to give up any more of your time to explain it step by step - I've ordered a SQL book from Amazon already. [smile]

Thanks again, and another star to you.

Brad

gmmastros · May 2, 2011

I don't mind explaining it, but before I do, can you answer a few questions...

1. Does it return the correct results?
2. How does the performance of this query compare to the original?

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

jbradley · May 2, 2011

George,

I'm in the process of vaildating that the records returned are truly the latest for each account but I can say that there are no multiple returns for any one account, which was a problem with my query. The performance is identical to the original query if the sort is on the DateChanged field, which it turns out is not indexed. If I sort on AccountID the elapsed time drops from 26 seconds down to 15.

markros · May 2, 2011

Hi,

Just in case George's answer produced what you were looking for, I know of a two good blogs that already explain this and several other solutions for this common problem. Take a moment to read

Including an Aggregated Column's Related Values - this link has explanations
Including an Aggregated Column's Related Values - Part 2 - this just has test cases for the first blog

PluralSight Learning Library

gmmastros · May 2, 2011

Brad,

Have you got the authority/permissions to change or add new indexes to the tables?

If so, can you run the following queries and post the results:

Code:

sp_helpindex [master]
sp_helpindex StatusHistory

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom

jbradley · May 2, 2011

Markros,

Thanks for the links. I will check them out.

George,

I can add or change indexes but it will have to wait until tomorrow at this point.

Thank you both for your help. This is my idea of a good time.

Brad

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Which query format is preferable? 1

jbradley

Programmer

markros

Programmer

r937

Technical User

jbradley

Programmer

gmmastros

Programmer

jbradley

Programmer

gmmastros

Programmer

jbradley

Programmer

gmmastros

Programmer

jbradley

Programmer

gmmastros

Programmer

jbradley

Programmer

markros

Programmer

gmmastros

Programmer

jbradley

Programmer

Similar threads

Part and Inventory Search

Sponsor