Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chriss Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Why the optimizer chooses hash join over index join 1

Status
Not open for further replies.

ddiamond

Programmer
Apr 22, 2005
918
US
I'm running oracle 8.0. I often have to use the hint /*+ rule */ to force the optimizer to use index joins instead of hash joins. I know that hash joins are sometimes faster than index joins, but 9 times out of 10, when the stat based optimizer chooses a hash join, it is a bad choice. I know this sounds like my stats are not up to date, but we analyze statistics once a week. So does anyone know why the rule based optimizer often does a better job than the stat based one for my queries?
 
How do you gather statistics? Do you analyze indexes? Besides RBO depends not only on statistics but also on optimizer_index_cost_adj and optimizer_index_caching (maybe some other less important) values.

Regards, Dima
 
I'm using the analyze_schema command:

execute dbms_utility.analyze_schema <schema>, 'COMPUTE';
 
It depends what sort of query you are doing. By default, Oracle will try to optimize for the best throughput over the whole query. Hash joins will normally give the best performance overall but there is an extra initial overhead in that Oracle has to build the hash tables. Therefore, the performance of getting the first few rows is likely to be quite bad.

If you are timing you queries by looking at how long it takes to bring back the first screenful of data in TOAD, then hash joins probably will give worse performance. But if you compare your queries using the total time to run the query to fetch every row, you may find hash joins give better performance.

You can control this behaviour using the ALL_ROWS and FIRST_ROWS hints. Using these might be a better option than using a RULE hint.

 
I'll try the all_rows hint. I am measuring the time it takes to retrieve all of the rows. In many cases the time difference between a hash join and an index join is very significant. For example, one of the queries I am tuning takes 5 minutes with the hash join and 0.3 seconds with the index join. Eventhough my source tables are quite large, the total number of records my query returns is only 58, so it doesn't sound like a hash join makes a lot of sense, but that is what the optimizer chooses unless I use the RULE hint. I'll see if the ALL_ROWS hint accomplishes the same thing.
 
Ok, I tried both ALL_ROWS and FIRST_ROWS but they didn't make a different. The optimizer still chose to do 4 hash joins. Ironically, the RULE hint still seems to give me the best results eventhough it is not recommended.
 
You'd better test with FIRST_ROWS :)
Actually ALL_ROWS prefers hash joins over nested loops. I don't remember, but does DBMS_STATS exist in 8.0? Try it instead of dbms_stats. And how about that parameters? Funny thing, but their default values are not suitable for most OLTP systems.

Regards, Dima
 
Thanks sem. I tried to execute dbms_stats.gather_schema_stats and got the following error message:

ERROR at line 1:
ORA-06550: line 1, column 7:
PLS-00201: identifier 'DBMS_STATS.GATHER_SCHEMA_STATS' must be declared
ORA-06550: line 1, column 7:
PL/SQL: Statement ignored

Does this mean that oracle 8.0 does not support this?
 
I tried it with my sys admin account, but got the same error message. So I assume access is not the problem.
 
Sorry sem, I didn't know what you meant by optimizer_index_cost_adj. I just googled it and ran:

alter session set optimizer_index_cost_adj = 20;

That did the trick. Now the optimizer uses index joins.
 
I just looked up optimizer_index_caching.

The default value for the optimizer_index_caching parameter is 0, which gives the highest preference to hash joins and sort merge joins. Resetting this parameter can be very dangerous if you are not using stored outlines because it could change the execution plans for thousands of SQL statements. Also, because the cost-based optimizer will generally only invoke sort merge joins when there are no indexes on the joined tables, this parameter has the most effect on the invocation of hash joins.

I'm not sure I fully understand what this means. If I use and alter session to set it, won't it only effect my session, or will it affect everyone's session?
 
Yes, when you change it by alter session then it obviously affects only current session. To make changes global and permanent you should set it in pfile and restart your instance.

Regards, Dima
 
I'm starting to gain more confidence in optimizer_index_cost_adj = 20. I was afraid that this might cause the optimizer to choose an index in cases where a hash join really is better, but so far the optimizer still correctly chooses the hash join when the hash is the better option. So the default optimizer_index_cost_adj = 100 really does seem like a bad default value.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top