First, let's take a looks at the following code. It probably represents the 'normal' situation.
Code:
Declare @Main Table(ID Int, StatusId Int)
Insert Into @Main Values(1,1)
Insert Into @Main Values(2,1)
Insert Into @Main Values(3,1)
Insert Into @Main Values(4,2)
Declare @Status Table(Id Int, Status VarChar(20))
Insert Into @Status Values(1,'Red')
Insert Into @Status Values(2,'Yellow')
Insert Into @Status Values(3,'Green')
Select ID,
(Select Status From @Status Where Id = StatusId)
From @Main
Select Main.Id, Status.Status
From @Main As Main
Inner Join @Status As Status
On Main.StatusId = Status.Id
With the data presented in the table, both queries return the same output.
Now, let's change some data. You already mentioned about the NULLs and using Left join instead. But let's see what happens.
If we leave the queries alone and just change the data:
Code:
Declare @Main Table(ID Int, StatusId Int)
Insert Into @Main Values(1,1)
Insert Into @Main Values(2,1)
Insert Into @Main Values(3,[!]NULL[/!])
Insert Into @Main Values(4,2)
Declare @Status Table(Id Int, Status VarChar(20))
Insert Into @Status Values(1,'Red')
Insert Into @Status Values(2,'Yellow')
Insert Into @Status Values(3,'Green')
Select ID,
(Select Status From @Status Where Id = StatusId)
From @Main
Select Main.Id, Status.Status
From @Main As Main
Inner Join @Status As Status
On Main.StatusId = Status.Id
The first query returns 4 rows but the second query only returns 3 rows. Changing the second query to a left join instead...
Code:
Declare @Main Table(ID Int, StatusId Int)
Insert Into @Main Values(1,1)
Insert Into @Main Values(2,1)
Insert Into @Main Values(3,[!]NULL[/!])
Insert Into @Main Values(4,2)
Declare @Status Table(Id Int, Status VarChar(20))
Insert Into @Status Values(1,'Red')
Insert Into @Status Values(2,'Yellow')
Insert Into @Status Values(3,'Green')
Select ID,
(Select Status From @Status Where Id = StatusId)
From @Main
Select Main.Id, Status.Status
From @Main As Main
[!]Left[/!] Join @Status As Status
On Main.StatusId = Status.Id
Now we are back to returning the same data.
Now let's change the data again...
Code:
Declare @Main Table(ID Int, StatusId Int)
Insert Into @Main Values(1,1)
Insert Into @Main Values(2,1)
Insert Into @Main Values(3,NULL)
Insert Into @Main Values(4,2)
Declare @Status Table(Id Int, Status VarChar(20))
Insert Into @Status Values(1,'Red')
[!]Insert Into @Status Values(2,'Yellow')
Insert Into @Status Values(2,'Purple')[/!]
Insert Into @Status Values(3,'Green')
Select ID,
(Select Status From @Status Where Id = StatusId)
From @Main
Select Main.Id, Status.Status
From @Main As Main
Left Join @Status As Status
On Main.StatusId = Status.Id
Notice that we have 2 rows in the status table with an id = 2. In this case, the subquery method errors with "Subquery returned more than 1 value." The second query returns 5 rows because there are multiple matches in the status table.
We can fix the subquery method by adding Top 1 to it, like this:
Code:
Declare @Main Table(ID Int, StatusId Int)
Insert Into @Main Values(1,1)
Insert Into @Main Values(2,1)
Insert Into @Main Values(3,NULL)
Insert Into @Main Values(4,2)
Declare @Status Table(Id Int, Status VarChar(20))
Insert Into @Status Values(1,'Red')
Insert Into @Status Values(2,'Yellow')
Insert Into @Status Values(2,'Purple')
Insert Into @Status Values(3,'Green')
Select ID,
(Select [!]Top 1[/!] Status From @Status Where Id = StatusId)
From @Main
Select Main.Id, Status.Status
From @Main As Main
Left Join @Status As Status
On Main.StatusId = Status.Id
The Top 1 part prevents the error, but there is still only 4 rows in the first query and 5 rows in the second. And this is exactly the point I was trying to make.
Joins will return ALL matches, but subqueries can ONLY return 1 value. Sure... it can be null if there are no matches, but it cannot return more than 1 match.
Also realize that this is a simple example because the subquery only involves 2 tables (the main table and a lookup table). Often times you will see more complicated queries as subqueries. The added complication makes it even more difficult to notice data discrepancies.
Of course, under the correct circumstances (based on data) the results can be the same, and you may even be able to rely upon it. To rely upon the results, the column in the lookup table that you are using should be unique, either enforced through a primary key or a unique constraint.
Make sense?
-George
Microsoft SQL Server MVP
My Blogs
SQLCop
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom