Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations TouchToneTommy on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Inner Joins 1

Status
Not open for further replies.

kav123

Programmer
Jan 12, 2005
210
GB
Can anyone tell how this works. Also if i change the order of these joins would that impact the resuls
select stuff
from A
inner join B on A.id = B.Aid
inner join C on A.id = C.Bid
inner join D on A.id = D.Cid
inner join E on D.id=E.id
inner join F on D.id=F.id
 
An Inner join returns all records where both tables match. So, if you have a table with 10,000 records, and a table with 5 records, and out of those, 4 records match, then you'll only return 4 records.

I've been told/taught that as far as the order goes, it's best to start with the smallest table if at all possible, if you're looking for the best performance.

Otherwise, if that isn't what you're looking for, please provide more detail as to what you are looking for.

--

"If to err is human, then I must be some kind of human!" -Me
 
Also if i change the order of these joins would that impact the results

Well... that depends on what you mean.

If you mean the order of the ON clause, then no.

Ex:

[tt][blue]
select stuff
from A
inner join B on A.id = B.Aid
[/blue][/tt]

is the same as
[tt][blue]
select stuff
from A
inner join B on [!]B.Aid = A.id[/!]
[/blue][/tt]

** Note, first query has A table on left in ON clause, second query has it on the right. This will produce the same results.

If you mean the order in which the tables are joined.... then the answer is, 'Maybe'. You cannot have a join where the ON clause references a table that isn't yet joined. That sounds confusing, so let me explain.

This should work:
[tt][blue]
1. select stuff
2. from A
3. inner join B on A.id = B.Aid
4. inner join C on A.id = C.Bid
5. inner join D on A.id = D.Cid
6. inner join E on D.id=E.id
7. inner join F on D.id=F.id
[/blue][/tt]

You can reorder certain joins without affecting the output. The join on row 7 depends on the 'D' table. So this join MUST appear after the join to the D table. However, lines 6 and 7 are not dependent on each other (they only depend on table D (line 5). So, you could swap the order of line 6 with line 7 without affecting the query whatsoever.

Understand that the ordering of the joins does not affect the results (as long as the dependencies are taken care of). This is true because you are using inner joins. If you had LEFT joins, then the answer would be completely different.

kjv1611 said:
I've been told/taught that as far as the order goes, it's best to start with the smallest table if at all possible, if you're looking for the best performance.

I don't believe this to be true. The SQL engine is smart enough to create an optimal execution plan (under most circumstances). In fact, this is what statistics is all about. SQL Server will examine the statistics for each part of the query and use/link to the tables in a manner that should produce the best results. This is why it's important to keep statistics up to date. Usually, people update statistics on a regular basis. Whenever you load a lot of data, you should update statistics and also on a regular basis. The interval for updating statistics is dependent on the usage pattern for your database. If you have a lot of activity, you may want to update statistics once a day (during your slowest period). For other DB's, once a week is enough. If you have a 'primarily' read-only DB, you may get away with monthly updates (or even not at all).

The ONLY true way to get an accurate picture of what's going on behind the scenes is to view the execution plan for the query.

Make sense?

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
On a separate note, as long as we are giving lessons on joins:

The more stuff you can put in the JOIN section as opposed to the WHERE, the faster your query will run. WATCH OUT for attempting to do this on a LEFT or RIGHT JOIN. (Not fun).

If [blue]you have problems[/blue], I want [green]source code[/green] AND [green]error messages[/green], none of this [red]"there was an error crap"[/red]
 
whether or not you have additional conditions in the ON clause of an OUTER join is not a question of performance but a question of correctness

list all students and their grade in math --

query 1

select name
, grades.grade as math_grade
from students
left outer
join grades
on students.id
= grades.student_id
[red]where[/red] grades.subject = 'Math'

query 2

select name
, grades.grade as math_grade
from students
left outer
join grades
on students.id
= grades.student_id
[red]and[/red] grades.subject = 'Math'

query 1, despite being written as a LEFT OUTER JOIN, will return only math students, not all students as required

above examples are from LEFT OUTER JOIN with ON condition or WHERE condition?


r937.com | rudy.ca
 
==>The SQL engine is smart enough... In fact, this is what statistics is all about.. This is why it's important to keep statistics up to date. Usually, people update statistics on a regular basis.

So, just in case those statistics are not up to date, and you don't have any way of verifying they are, or getting them up to date, then best practice would be to assume they are not, just as an extra precaution. Or would that be a bad argument?

Just thinking about the possible scenerios here. Reason for me that I think this way, based on what you stated is that we have a very large complex process that runs at the end of every business day - well, it now starts just after midnight, I think... but not 100% sure anymore. Anyhow, the database where I pull most data from is a read-only database for everyone pretty much outside of the daily updates.

Is there a way to tell if statistics have been updated? I mean is there some sort of system table that would tell you that?

And how would querying views work into this equation? Since a view is sort of like a Query in Access (in a way, at least), then wouldn't it be refreshed each time it's accessed? Then, in that case, does the SQL engine automatically rerun statistics somehow?

That one point is pretty interesting to me -- Interesting to at least ponder and learn about.

--

"If to err is human, then I must be some kind of human!" -Me
 
Is there a way to tell if statistics have been updated?

For [!]SQL 2005[/!]
Code:
Declare @Temp Table(RowId Int Identity(1,1), Table_Name VarChar(200), Statistic_Name VarChar(200))
Insert Into @Temp(Table_Name, Statistic_Name)
Select Object_Name(Object_Id) As Table_Name, Name 
From   sys.indexes As Ind 
       Inner Join Information_Schema.Tables Tab
         On Object_Name(Ind.Object_Id) = Tab.Table_Name

Create 
Table   #Output(Name VarCHar(200), [Updated] DateTime, Rows Int, [Rows Sampled] Int, Steps Int, Density Float, [Average Key Length] Float, [String Index] VarChar(20))

Declare @Table VarChar(200)
Declare @Stat VarChar(200)
Declare @Row Int
Declare @Max Int

Select @Row = 1, @Max = Max(RowId) From @Temp

While @Row <= @Max
	Begin

		Select @Table = Table_Name, @Stat = Statistic_Name
		From   @Temp
		Where  RowId = @Row

		If @Table Is Not NULL And @Stat Is Not NULL
		  Insert Into #Output Exec ('DBCC show_statistics (''' + @Table+ ''',''' + @Stat + ''') WITH STAT_HEADER')
	
		Set @Row = @Row + 1
	End

Select * From #Output

Drop Table #Output

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Thanks for the info, gmmastros. I'll take a look at it all soon, hopefully. Otherwise, I'll forget about yet another item of interest! [blush]

--

"If to err is human, then I must be some kind of human!" -Me
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top