Compare two queries please 3

cjburkha · Aug 25, 2004

I have two queries that return the same dataset.

Here is one:

Code:

SELECT DISTINCT pg_content.pg_index, pg_content.pg_id, pg_content.elem_id, pg_content.stat_weight, pg_content.text_string
FROM pg_content, pg_master
WHERE (((pg_content.pg_id)="busLon") AND ((pg_content.end_date)>=#9/25/2004#) AND ((pg_content.start_date)<=#9/25/2004#)) OR (((pg_content.pg_id)=[pg_master].[sect_id]) AND ((pg_content.end_date)>=#9/25/2004#) AND ((pg_content.start_date)<=#9/25/2004#) AND ((pg_master.pg_id)="buslon"));

Here is the other:

Code:

SELECT pg_content.pg_index, pg_content.pg_id, pg_content.elem_id, pg_content.stat_weight, pg_content.text_string
FROM pg_master RIGHT JOIN pg_content ON pg_master.sect_id = pg_content.pg_id
WHERE (((pg_content.pg_id)="busLon") AND ((pg_content.end_date)>=#9/25/2004#) AND ((pg_content.start_date)<=#9/25/2004#)) OR (((pg_content.end_date)>=#9/25/2004#) AND ((pg_content.start_date)<=#9/25/2004#) AND ((pg_master.pg_id)="busLon"));

I put these queries onto and ASP.NET page, and the SELECT DISTINCT seems to run a bit faster.

Could anyone tell me which of these two methods is better, and why? Or, since this has probably been discussed before, point out a link explaining the difference between the two?

Thanks for your help,

CJB

lespaul · Aug 25, 2004

I don't know if you just got lucky that the results match or if it's because of the distinct in the first query, but here's what I can tell you.

The first query is a "cross join" query. If you have two tables:

Code:

tableA
a1     a2
10     u
20     v
30     w
40     x
50     y
60     z

tableB
b1     b2
10     p
20     q

if your query is:

SELECT * FROM tableA, tableB

your results will be:

Code:

a1     a2     b1     b2
10     u      10      p
20     v      10      p
30     w      10      p
40     x      10      p
50     y      10      p
60     z      10      p
10     u      20      q
20     v      20      q
30     w      20      q
40     x      20      q
50     y      20      q
60     z      20      q

the columns from both tables are combined to produce a result set containing ALL POSSIBLE combinations.

With a RIGHT join (like in your second example), what you are saying in your query is:

return all the rows from the right side of the join (pg_content) and only the rows from the left side (pg_master) that match.

Here's a link you may find helpful 'Understanding SQL Joins'.

HTH

leslie

cjburkha · Aug 25, 2004

Thanks for you response. I think it is because of the DISTINCT key word in the query. If I don't include that word, I do in fact get all possible combinations, and much more data than I want. But if I use distinct, I get what I am looking for, all the time.

The crux of my question, I guess, is what is the difference between a right join, and a DISTINCT statment?

Thanks,

CJB

PHV · Aug 25, 2004

Have you tried something like this ?
SELECT pg_content.pg_index, pg_content.pg_id, pg_content.elem_id, pg_content.stat_weight, pg_content.text_string
FROM pg_content
WHERE pg_content.pg_id="busLon" AND pg_content.end_date>=#9/25/2004# AND pg_content.start_date<=#9/25/2004#
UNION
SELECT pg_content.pg_index, pg_content.pg_id, pg_content.elem_id, pg_content.stat_weight, pg_content.text_string
FROM pg_content INNER JOIN pg_master ON pg_content.pg_id=pg_master.sect_id
WHERE pg_content.end_date>=#9/25/2004# AND pg_content.start_date<=#9/25/2004# AND pg_master.pg_id="busLon"
;

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ222-2244

JonFer · Aug 25, 2004

After removing the extraneous parentheses, here are your From and Where clauses:

Distinct:

Code:

FROM pg_content, pg_master
WHERE 
(pg_content.pg_id="busLon" AND pg_content.end_date>=#9/25/2004# AND pg_content.start_date<=#9/25/2004#) 
OR 
(pg_content.pg_id=[pg_master].[sect_id] AND pg_content.end_date>=#9/25/2004# AND pg_content.start_date<=#9/25/2004# AND pg_master.pg_id="buslon");

Right Join:

Code:

FROM pg_master RIGHT JOIN pg_content 
    ON pg_master.sect_id = pg_content.pg_id

WHERE (pg_content.pg_id="busLon" AND pg_content.end_date>=#9/25/2004# AND pg_content.start_date<=#9/25/2004#) 
OR 
(pg_content.end_date>=#9/25/2004# AND pg_content.start_date<=#9/25/2004# AND pg_master.pg_id="busLon");

The Distinct query has a bad WHERE clause which is causing duplicates. It only needs to be the following, which should appear on one criteria line in the query design window:

WHERE pg_content.pg_id=[pg_master].[sect_id] AND pg_content.end_date>=#9/25/2004# AND pg_content.start_date<=#9/25/2004# AND pg_master.pg_id="buslon"

Use a RIGHT JOIN (or LEFT JOIN) if you want to see records in one table that don't have corresponding records in another table based on your join conditions. You should read more about this. Note: Adding conditions to the "subordinate" table in a LEFT/RIGHT JOIN (as you do above) turns the query into a regular INNER JOIN (only shows records with matching join fields in both tables) so do an INNER JOIN as it is more efficient. If you want to do the LEFT/RIGHT JOIN against a subset of the subordinate table, you have to save a query against it first and then bring the query into the LEFT/RIGHT JOIN to get the correct results (unless you have Access XP which has an additional method).

Use DISTINCT if you are looking at a set of columns in a table and want to see the unique combinations of the values in the fields. DO NOT use it to remove duplicates generated from incorrect join conditions - fix your join conditions instead.

JonFer · Aug 25, 2004

cj - Sorry, I didn't look close enough. The criteria you put on the pg_master field doesn't make the RIGHT JOIN query act like an INNER JOIN. You have the pg_master criterion following an OR and the first set of criteria still gives you the pg_content records without corresponding pg_master records.

My "correction" for the WHERE in the DISTINCT query won't return all of the records you want so never mind that.

Bottom line, you should use the RIGHT JOIN for your query but I don't think you need the second set of criteria. Try it with just this WHERE and see what you get:

WHERE pg_content.pg_id="busLon" AND pg_content.end_date>=#9/25/2004# AND pg_content.start_date<=#9/25/2004#

cjburkha · Aug 26, 2004

Thanks to everyone for your helpful posts.

Here is where I am at. I decided to use lespauls sample data to make everything a little clearer. Now I have, thanks to PHV, 3 queries.

Distinct

Code:

SELECT  DISTINCT TableA.a1, TableA.a2
FROM TableA, TableB
WHERE (TableA.a1>30) OR (TableB.b1 < 30 AND TableA.a1=TableB.b1);

Right Join

Code:

SELECT TableA.a1, TableA.a2
FROM TableB RIGHT JOIN TableA ON TableB.B1 = TableA.a1
WHERE (TableA.a1>30) OR (TableB.B1<30);

Union

Code:

SELECT TableA.a1, TableA.a2
FROM TableA
WHERE TableA.a1 > 30
UNION SELECT TableA.a1, TableA.a2
FROM TableA INNER JOIN TableB ON TableA.a1=TableB.b1
WHERE Tableb.b1 < 30;

As you can see, these three queries will return the same data set:

Code:

a1	a2
10	u
20	v
40	x
50	y
60	z

My question is, are any of these queries better than any of the others? Are any technically better, or simply stylistic? Again, I put these 3 queries into an ASP page and ran some tests, but I am starting to have doubts about my testing method. By running the queries interspersed with other queries, I get different results.

Also, this is kind of an academic question, comparing 3 different queries, but I really do need to do this in real life, I need to select all records from a table according to a certain criteria, unless a record is in another master table. My real life application works, I'm just trying to tweak it.

Thanks again for everyones help

CJB

PHV · Aug 26, 2004

For optimisation create an index on TableA.a1 and another one on TableB.b1

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ222-2244

JonFer · Aug 26, 2004

While DISTINCT works, it is not very efficient. The criteria of just "TableA.a1<30" when you have two tables generates, as Leslie said, a cartesian product with every row in TableA that has a1<30 repeated for every row in TableB. The repeated rows are removed by using DISTINCT. You also get the rows where a1=b1 and b1<30 without duplication (assuming they are unique in the tables).

The UNION has to read one of the tables twice but doesn't product any duplication.

The RIGHT JOIN should be the most efficient because it reads through the tables once and doesn't generate duplicates.

cjburkha · Aug 26, 2004

Thank you all very much, esp JonFer, that is exactly what I was looking for

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Compare two queries please 3

cjburkha

Programmer

lespaul

Programmer

cjburkha

Programmer

PHV

MIS

JonFer

Programmer

JonFer

Programmer

cjburkha

Programmer

PHV

MIS

JonFer

Programmer

cjburkha

Programmer

Similar threads

Part and Inventory Search

Sponsor