Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations wOOdy-Soft on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Update Query Tuning Advice 2

Status
Not open for further replies.

djbjr

Programmer
Dec 7, 2001
106
US
I have a table with about 3 million rows.

I have a procedure that updates one column with current values. This takes about 18 minutes.

My problem is that I need to also set the values to 0 for rows that are not in the source.

I started by updated the table and setting all values to 0 and then running my update proc. This is taking an additional 20 minutes now:

Step 1 - update table set column = 0

Step 2 open cursor SELECT primary_key, column FROM source_table.

Step 3 LOOP update table set column = cursor_value where primary key = cursor primary key.

any suggestions on how I can speed this up? I have bitmap indexes on most of the columns on this table. (74 Columns)
(25 indexes)

Perhaps I should drop these indexes and rebuild them once the proc has completed?
 
Dropping the indexes may speed things up a bit...since the index needs to inserted to as well if the affected column is in one or more of the indexes...

Are you sure you need to use a loop / cursor to do the update? Straight SQL would be faster as well...


Bastien

Cat, the other other white meat
 
DJ,

I am open minded, Bastien, but I believe a PL/SQL block using cursors that update using ROWID would be faster than straight SQL.

You certainly do not want to use indexes. (That is why Bastien recommended dropping the indexes.) But with the following PL/SQL block, you process all rows of both tables, you do not waste the time/resources of dropping and recreating indexes (you still do not use indexes), you update only rows that truly needing updates and you update rows only once (you do not update all rows to '0' first). And, best of all, you use the fastest of all possible methods to update: by ROWID. (The following code presumes that SOURCE_TABLE rows without a matching MASTER row is an error; if it is not an error, then you/I can adjust the code accordingly.)

Section 1 -- Contents of test MASTER and test SOURCE_TABLE:
Code:
SQL> select * from master;

        ID COLUMN_TO_UPDATE
---------- ----------------
         1               13
         2               10
         3                0
         4               22
         5               15
        20                0
       100               24
SQL> select * from source_table;

        ID COLUMN_VALUE
---------- ------------
         1           22
         4           33
        12           12
       999          999

Section 2 -- Update code plus audit of results:
Code:
set serveroutput on
declare
	s	source_table%rowtype;
	cursor src is select * from source_table order by id;
	procedure get_source is
		begin
			fetch src into s;
			if src%notfound then
				close src;
			end if;
		end;
	procedure prt (x in varchar2) is
		begin
			dbms_output.put_line(x);
		end;
begin
	open src;
	dbms_output.enable (1000000);
	get_source;
	for m in (select rowid,master.* from master order by id) loop
		if m.id < s.id then
			if m.column_to_update <> 0 then
				update master set column_to_update = 0
                       where rowid = m.rowid;
				prt ('Master ID: ['||m.id||'] -- row updated
                       from ['||m.column_to_update||'] to [0].');
			else
				null; -- nothing happens if column_to_update already = 0.
				prt ('Master ID: ['||m.id||
                        '] -- row already [0]; no update necessary.');
			end if;
		elsif m.id = s.id then
			update master set column_to_update = s.column_value
                        where rowid = m.rowid;
			prt ('Master ID: ['||m.id||
                        '] -- row updated from ['||m.column_to_update||
				'] to column value: ['||s.column_value||'].');
			get_source;
		elsif m.id > s.id then
			prt ('*** Error: Source ID: ['||s.id||
                        '] has no matching Master ID. Ignoring. ');
			get_source;
		else
			prt ('*** Error: Master ID: ['||m.id||
                        '] and Source ID: ['||s.id||']. Should not ever happen.');
			get_source;
		end if;
	end loop;
	commit;
end;
/

Once you are confident of the results, you can remove the "prt" displays to significantly improve performance.

If you use this method, please advise us of the time improvement.

[santa]Mufasa
(aka Dave of Sandy, Utah, USA @ 01:27 (12May04) UTC (aka "GMT" and "Zulu"), 18:27 (11May04) Mountain Time)
 
Sorry, I forgot to paste the audit of results. Here there are:
Code:
Master ID: [1] -- row updated from [13] to column value: [22].
Master ID: [2] -- row updated from [10] to [0].
Master ID: [3] -- row already [0]; no update necessary.
Master ID: [4] -- row updated from [22] to column value: [33].
Master ID: [5] -- row updated from [15] to [0].
*** Error: Source ID: [12] has no matching Master ID. Ignoring.
Master ID: [100] -- row updated from [24] to [0].

Sorry,

[santa]Mufasa
(aka Dave of Sandy, Utah, USA @ 01:29 (12May04) UTC (aka "GMT" and "Zulu"), 18:29 (11May04) Mountain Time)
 
Dave, it's quite abnormally but I must disagree with you regarding pl/sql vs straight sql. I'm sure that the fastest way is to use pure sql, in case of using rowids the difference is minor though it still exists (for switching context). Can you prove the opposite?

Djbjr, can you provide your table structures? The reason I ask is that if both source and target table do have pimary key on column reffered as primary key you may complete your task in one shot:

update (select s.column source, t.column target from source s, target t where s.pk(+)=t.pk) set target = nvl(source, 0)

You should drop only indexes on target table containing column being updated, not all.

Regards, Dima
 
Dima,

As I mentioned, I am open-minded about the efficiencies. One of my first rules of advice is, "One test is worth 10 expert opinions." So, I recommend a test of both scenarios by DJBJR. I am interested most particularly in "DJBJR's" findings relating to actual run times/CPU consumption...whether differences are trivial or significant. DJ, could you please post your findings here? Remember to include the time it takes to drop and re-create related indexes.

Dima, I particulary like you code since it is wonderfully "tight".

[santa]Mufasa
(aka Dave of Sandy, Utah, USA @ 19:05 (12May04) UTC (aka "GMT" and "Zulu"), 12:05 (12May04) Mountain Time)
 
I know you guys have been eagerly awaiting the results.....

The pl/sql block worked faster by 8 minutes!

Dropping the bitmapped indexes of course saved me a bundle.

Thanks and happy coding!

dj
 
DJBJr,

Interesting...when you say, "The pl/sql block worked faster by 8 minutes!", is that 8 minutes faster than the original 38 minutes, or 8 minutes faster than the SQL-only code; and if the latter, how many minutes did the SQL-only take and how many minutes did the PL/SQL block take?

Cheers,

[santa]Mufasa
(aka Dave of Sandy, Utah, USA @ 23:27 (14May04) UTC (aka "GMT" and "Zulu"), 16:27 (14May04) Mountain Time)
 
PL/SQL WITHOUT DROPPING INDEXES - 38 minutes

NO PL/SQL DROPPING INDEXES - 18 minutes

PL/SQL DROPPING INDEXES - 10 minutes
 
Ok, in this case I should publish my thoughts.
First of all the speed of my update statement obviously depends on the efficiency of select. Is your statistics up to date? Can you post execution plan of that outer join query? Of course, Dave's joining technique is very efficient

Then, the environment should be the same, i.e. the execution of the first statement may move data from disk to cache thus all subsequent statements read from memory, not from disk.
One more reason of the speed increase may be in already extended RBS.

And after all, how did you measure the time? Are you sure that pl/sql procedure completed without errors? I mean that when max(master.id)>max(source_table.id) then an error may be raised due to the attempt to fetch from already closed cursor.

I also suspect that null; statement in case nothing should be updated. If most of values are already correct, this additional condition may eliminate the number of rows to be updated. Thus I may suggest to add it to my code:
Code:
update (select s.col source, t.col target
 from source s, target t 
 where s.pk(+)=t.pk and t.col <> nvl(s.col, 0))
set target = nvl(source, 0)

Regards, Dima
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top