Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations bkrike on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Oracle 9i Performance on "Order By" clause

Status
Not open for further replies.

ragolia

Programmer
Sep 11, 2003
13
US
I have what I feel is a really simple question about Oracle 9i and the performance of the "Order By" clause. I pose this question because my company DBA's are being rather dogmatic about their response. The situation that I have is this - I have a simple query with an order_by clause on it.

select * from table_a
where fld_a > x
and fld_a <= y
order by fld_a;

table_a has &quot;N&quot; number of columns
fld_a is the primary_key - it is a single column primary key.

My predicament is this. The dba's in my organization are dogmatic about not using the 'order by' clause. They state that an external sort of the resultset would be more efficient than using the 'order by' clause. My question is this - why? I am becoming suspect of their answer - mostly because they won't tell me why.

I could understand their answer if the 'order by' clause was not on the primary key field. I am questioning about this specific case.

It shouldn't make any difference but this query will be running on Oracle 9i/HP-UX 11i...

Rick Agolia
 
Because your query is doing a SELECT *, it will be doing two full reads - a scattered read to read the indexed column range used in your WHERE predicate to lookup the rowid, then it would do a sequential read of a full table access by rowid to return all of your data. So it does two complete reads.

if your query was select fld_a from table_a
where fld_a > x
and fld_a <= y
order by fld_a; the order could be eliminated, providing the index returns the values in fld_a in ascending order. This would do a single read of the index. Of course, this is not pratical, since you probably need to access more columns than fld_a.

Depending on the magnitude of data, the DBA's probably want you to do your sort order outside of the database engine, (since it is probably affecting some metric that they report and are evaluated on). (if you really want to freak them out, add a distinct to it!) Have you used explain plans?
 
Ok so let me try to 'draw a picture' here. The first read is against the index returning the row-ids of the rows required to satisfy the query. The second pass takes the results of the index check against the table itself returning the necessary rows from the table. Sorry if I'm over-simplifying but is that a summary of the activity involved.

Two things I did not mention before in response to your comments...
1. The table could have millions of rows.
2. The resultset could be thousands of rows - not hundreds of thousands but thousands would not be totally unreasonable.

What you appear to be saying is that there just is no assurance of the order of the data returned from the query. Would the 'explain plan' show me this?

Is there any default behavior for Oracle in regards to the data returned from a query? For example, if I simplify the query to &quot;Select * from table_a&quot; is the data to be returned in a specific order?

Like I said, I'm not a dba, but I'm also not afraid to ask 'dumb' questions. The way I see it - I won't learn if I don't ask - or at least ask where to find out the info.

 
Rick,

Any RULE in a business (whether the rule is an IT rule or some other business rule) had better be reconcilable back to the business's BOTTOM LINE. That means, by following the RULE, we either INCREASE REVENUE or we DECREASE EXPENSES.

By some measure, your DBAs believe that the COST (i.e., EXPENSE) of doing an Oracle &quot;Order By&quot; is somehow unacceptable when compared to the COST of an &quot;external sort&quot;.

I would be willing to abide by their dogma if they could show me the TOTAL COSTS of each of the &quot;order-by versus external-sort&quot; alternatives, and some alternative is less expensive.

BUT, as part of their total-costs justification, I'd better also see the salary costs of your time (or someone else's time) to do all the extra &quot;fiddling&quot; that would need to occur with a solution other than &quot;order by&quot;. I would want to see their costs for CPU, disk I/O (including the writing/passing of the data from Oracle to the external sort operation, the external sort's I/O processing, et cetera), and perhaps most important, the HUMAN-INTERVENTION TIME to effect a non-Oracle solution. Remember that salary costs of any data-processing operation usually surpass the costs of hardware and software usage.

Until someone can justify (from a financial perspective) what appears to be a silly RULE, in the absence of financial justification, I'd fight it.

I, personally, would be extremely interested in your findings. Please post your results if it would not compromise any proprietary information.

Dave

 
Review the white paper by mark gurry at orapub.com (you have to create a free user account.)


Ask the last time that the table was analyzed. Statistics must be recent (i.e. in comparison to the amount of change that takes place in the table.)

Unless the table was built as an index organized table (IOT), there is no guarentee of sort order without an order by or the use of an index. using DISTINCT or UNIQUE will cause the records to be sorted, which implies order. If you are after a subset of data from the table and you are accessing (and the numbers vary here) approximately 25% or less of the records, then create an index which will match the columns in your select statement and the columns used in your where predicate.
(Do you really need to do a 'select *' ? If not, narrow that down to the exact columns you want. Create an index with those columns and any additional columns that you use in your where predicate. - Then oracle may use just the index without accessing the table at all. Then the DBA's will blow kisses to you! )

Example tableX is created with columns c1,c2,c3,c4,c5,c6. It has one index on c1 and c2.

if you do a select * from tableX, then it does a full table scan. If you do a select * from tableX where c1 > 'ME' and c1 < 'YOU', then it will do a range scan of the index and a full table access with the rowids that it finds.

If you really want to only 'select c1,c2,c3,c4,c6 from tableX where c1 > 'ME' and c1 < 'YOU', then have them create an index on c1,c2,c3,c4,c6. Then the query will do an index lookup and never touch the table. There are limitations to this. If you are talking large amounts of data, it may not be cost effective to index that large a number of columns. (Posting the column defintions and indexes, if possible, with your sql statement would probably be helpful.)

Ask the DBA to do a sql_trace of your session and do a tkprof on the trace file output for you. That will show you what is being done by oracle.
 
A little application description here...
There are about 160 tables that I have to apply this basic query to. In all cases, all of the columns are required to be part of the resultset. Some of these tables have as few as 10 columns. Others have as many as 200 columns. The database supports a high-volume OLTP environment (millions of updates/day across all tables) along with relatively high-volume batch jobs.

The query will run multiple times during the day as part of a batch job. At any given time,the resultset should be something less than 1% of the table returned. The primary_key is based upon an Oracle sequence which is incremented by one (via seqnbr.nextval) when the row is inserted into the table.

The 'business' logic forcing the dogmatic approach by the dba's is relatively simple. Because this is a high-volume OLTP database, the effort is being made to pull as much of the overhead effort away from Oracle as possible and not risk missing requested timings on the on-line transactions. I can logically understand this desire.

I don't know if it changes the discussion dramatically but a few of these queries are slightly different...

select * from table_b
where col_b > min_value
and col_b <= max_value
order by col_a;

- col_b is not the primary_key but it is non-unique index. If I assume a uniform distribution of the data, there will be approximately 1500 discrete values. No single query should request more than 3 or 4 of the discrete values at any given time.
- col_a is the primary key.

 
&quot;If I simplify the query to &quot;Select * from table_a&quot; is the data to be returned in a specific order?&quot;

No. That's the reason for having an ORDER BY clause - you can't reliably predict the order in which rows will be selected.

Reardless of whether you &quot;SELECT *&quot; or &quot;SELECT a,b,c&quot;, an ORDER BY clause will add a performance hit - as the resultset retrieved from the database has to be sorted into order. The real questions are how significant is this performance hit?, and how sensible is it to push out the sorting task to calling programs?

It would be pretty easy to measure how much work is involved in the database in serving sorted and unsorted versions of the same query - my guess is that there's not a huge difference. I'm pretty sure that the statement &quot;that an external sort of the resultset would be more efficient than using the 'order by' clause&quot; is nonsense - sorting data is meat and drink to the Oracle database, it would be hard to better its performance. There's also a resource issue in writing and maintaining all those sort routines.

It sounds to me like a culture problem - the DBAs are so focussed on shaving milliseconds off the database performance (which, of course, is their job) that they lose track of the big picture - all the work (and cost) involved in sparing the database from doing the job.

-- Chris Hunt
 
If fld_a is indexed (I suppose it is), extra sorting may not be performed at all: scannig table by index without parallelizing guarantees the order of records. You should only force using index and with order by clause or without it - the plan (and the cost!) will be the same.

Regards, Dima
 
Dima

Now I'm really confused. Prior to your jumping in, and I appreciate that you did, the consensus seemed to be that the 'order by' caused at least some amount of overhead. If I read you correctly, with fld_a being the primary key and indexed, the use of the 'order by' will incur no additional cost to the query.

Would an explain plan show this? If I take this to the dba's, I would like to have the evidence to show that there would be no additional hit using the 'order by'.

Rick Agolia

 
Why do you ask it, just do it! I'm sure that your plans will be the same.

Regards, Dima
 
Here's some interesting observations from Tom Kyte with regards to primary key, contraints, and indexes.


The bottomline question is 'what can I do to get the best performance?' The answer... try all available combinations at your disposal. SQLTrace/tkprof and understanding explain plans will give you an idea of what the optimizer may do.
 
Thank you all very much. Now I have work to do and all this has been very helpful.
 
&quot;Now I'm really confused...&quot;

What Dima said (I think) is that if fld_A is a primary key,and you select only fld_A from the table, there's no overhead involved in the order by. Otherwise there is.

I think you said somewhere that you need to select all the columns, so this won't really help you.

-- Chris Hunt
 
Chris, in fact it doesn't matter whether FFS is performed or ordinary RANGE SCAN + ROWID, because the order of rows is provided by using index. So even for select * the result should be the same. It also doesn't matter whether fld_a is PK or just an ordinary field leading some B-Tree index.

Regards, Dima
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top