Can this update query be improved?

Status
Not open for further replies.

micang

Technical User
Aug 9, 2006
626
US
Hi all,

I am running the following code:
Code:
update dcd SET dcd.gvw = dcd1.gvw
FROM dcd
INNER JOIN dcd1 ON dcd.[regmark] = dcd1.[regmark]

Two tables, each with 35 million rows. The [regmark] column on both tables is indexed and unique.

The query has been running for almost 3 hours. Does this seem normal?

Also, what are the implications if I need to stop this query?

Thanks in advance.

micanguk
 
Code:
update dcd 
SET    dcd.gvw = dcd1.gvw
FROM   dcd
       INNER JOIN dcd1 
          ON dcd.[regmark] = dcd1.[regmark]
Where  dcd.gvw <> dcd1.gvw

Since you are setting the fields to be the same, you can check which rows are different and update only those.

This addition to the query does have a potential problem: if gvw is NULL in either table, the <> comparison evaluates to UNKNOWN, so that row will not be updated.

If you can live with the NULL issue (or the column can't be null), then the where clause will probably speed this up.
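A quick way to see the NULL behaviour described above (a minimal sketch; the literal 5 is just a stand-in value):
Code:
-- NULL <> 5 evaluates to UNKNOWN, not TRUE, so such a row fails the WHERE test
SELECT CASE WHEN NULL <> 5 THEN 'row updated' ELSE 'row skipped' END
-- returns 'row skipped'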

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
Thanks gmmastros.

The tables are identical (i.e. [regmark] on both holds exactly the same data). gvw on dcd is a blank field that has to be populated with gvw from dcd1.

If this is the case, would the where clause make any difference?

I tried it on a 21,000-row test table and it updated in under a second, but on the 35 million row table it's been running forever.
 
Digging back into my memory from when I was dealing with a database with such a big table: it is often the size of tempdb, the log file expanding, or lots of indexes on the table that slow things down.

Are there any indexes using the column that you are updating on dcd? If so, I'd drop the index and recreate it once you have done the update.
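For example (a sketch only; ix_dcd_gvw is a hypothetical index name, so check sp_helpindex for the real ones):
Code:
-- list the existing indexes on the table
EXEC sp_helpindex 'dcd'

-- drop a (hypothetical) index on the column being updated
DROP INDEX dcd.ix_dcd_gvw

-- ... run the big update here ...

-- recreate the index afterwards
CREATE INDEX ix_dcd_gvw ON dcd (gvw)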

Have a look at your log file and see if it has expanded a lot.

This might be a case where you are better off cursoring, but NOT row by row. I don't say this lightly; I really don't like cursors.

Aim to split your updates, say 1 million rows at a time, and use the primary key and some maths to do your update. Don't use BETWEEN; use > and <, as that will be quicker.

As for stopping your query: how long is a piece of string? The rollback could take quite a long time.
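Splitting by key might look something like this (a sketch only, assuming dcd has a numeric primary key column [id] running up to about 35 million; that column is not from this thread, so substitute your own key):
Code:
declare @lo int, @hi int
set @lo = 0

while @lo < 35000000
begin
    set @hi = @lo + 1000000

    update dcd
    SET    dcd.gvw = dcd1.gvw
    FROM   dcd
           INNER JOIN dcd1
              ON dcd.[regmark] = dcd1.[regmark]
    WHERE  dcd.[id] > @lo AND dcd.[id] <= @hi   -- > and <=, not BETWEEN

    set @lo = @hi
end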
 
you can run it in batches, it will be faster than 1 big huge update

Code:
declare @rowcount int

set @rowcount =1
set rowcount 35000

while @rowcount > 0
begin

update dcd 
SET    dcd.gvw = dcd1.gvw
FROM   dcd
       INNER JOIN dcd1 
          ON dcd.[regmark] = dcd1.[regmark]
Where  dcd.gvw <> dcd1.gvw
set @rowcount = @@ROWCOUNT

end

set rowcount 0

Denis The SQL Menace
 
faq183-3141

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
Thanks arrowhouse.

I am new to SQL 2000 so I need to figure out where the log file is.

It's been running now for over 3 hours and I don't know if I should just kill it and look at doing the update as you suggest, 1 million rows at a time. What I don't understand, though, is: won't it take just as long, because it still has to join (look through) 35 million rows to find each batch of 1 million rows?

Both tables do have indexes, but none on the gvw columns. I am not familiar with cursoring, so that is something I need to look into.

Can someone please explain what <> or >< is?
 
Thanks all. Thanks SQLDenis for clearing up the <> etc. for me. I understand that now.

SQLDenis, the code that you provided "counts" the rows, correct? It does not look at an id field or anything? Might how the table is sorted become a problem?
 
No. First it updates the table and sets dcd.gvw = dcd1.gvw.

When it goes to the next pass of the while loop,
Where dcd.gvw <> dcd1.gvw is no longer true for the rows the previous statement just updated, so it will get the next 35,000.
It will continue until @@ROWCOUNT returns 0 (meaning no such rows exist anymore) and then it will exit the while loop.

BTW, tables are not sorted; that is a misconception.

Denis The SQL Menace
 
So basically that will keep doing 35,000 at a time until it has finished all 35 million of them? I thought I had to run that query manually every time (i.e. do 35,000, then the next 35,000, etc.).
 
Sorry, I don't understand. Will this code only do the first 35,000 rows? Or will it continue until all the rows have been updated?

From what I gather this is T-SQL? All I have been doing in Query Analyzer is pasting in SQL and running it.

Apologies, real newbie to SQL 2000.

 
Thanks SQLDenis, will give it a go and post back.
 
Hi SQLDenis,

I tried it on two tables, 65,000 rows each.

If I run it with my initial code from the 1st post of this thread, it runs fine, very quickly (under a second).

Then I tried it with the 35,000-rowcount code, and it ran for minutes; I stopped it after 3 1/2 minutes and it had updated about 60% of the rows.

I am not sure now if I should run this against 35 million rows?

I am so confused. Thanks to all for your patience.
 
The process mentioned is the one that normally takes the least time.

I would try the following.
Code:
declare @rowcount int

set @rowcount =1
set rowcount 35000

while @rowcount > 0
begin

begin transaction   -- new 

update dcd
SET    dcd.gvw = dcd1.gvw
FROM   dcd
       INNER JOIN dcd1
          ON dcd.[regmark] = dcd1.[regmark]
Where  dcd.gvw <> dcd1.gvw
set @rowcount = @@ROWCOUNT

commit  -- new
end

set rowcount 0


Also worth a try, if you can have exclusive use of the table and indexes, is to do an insert into a temporary table,
e.g.
insert into temp_table (field list)
select dcd.fields (all of them except gvw), dcd1.gvw
FROM dcd
INNER JOIN dcd1 ON dcd.[regmark] = dcd1.[regmark]

Then remove the indexes from the original table, rename the old table to temp_table2,
rename temp_table to the old table's name, and recreate the indexes.

May or not be faster, but probably worth a try.

Regards

Frederico Fonseca
SysSoft Integrated Ltd
 
I do like SQLDenis' suggestion of using the rowcount rather than cursoring. And fredericofonseca has made a sensible suggestion of adding in the transactional control, which I think should help. I would prefer to use a higher rowcount, but you will probably need to experiment a bit to find the best value to use.

I also think it might be worth re-writing the update slightly - to remove the <>. This makes use of the fact that you know the column you want updated is currently NULL. Don't do it if the value in dcd1.gvw will be changing between runs of this code.
Code:
declare @rowcount int

set @rowcount =1
set rowcount 35000

while @rowcount > 0
begin

begin transaction   -- new 

update dcd
SET    dcd.gvw = dcd1.gvw
FROM   dcd1
WHERE  dcd.[regmark] = dcd1.[regmark]
AND  dcd.gvw IS NULL --update only if the column doesn't have a value yet
set @rowcount = @@ROWCOUNT

commit  -- new
end

set rowcount 0
To me the most likely culprit is the log file. Look at the properties of the database and its files.
(I'm using 2005 now, but I think you can right-click with the db highlighted and get this from Enterprise Manager in 2000.)
It may be difficult for you to tell since you don't know what the original size was, but can you see how big the log file is and how big the data file is? Is the log file growing over time? If you are doing the update all in one go, it is possible that SQL is trying to write all that info to the transaction log and is therefore growing the file all the time. It's also worth taking a look at tempdb.

Also, can you check whether there are any update triggers on the table you are updating?

I have assumed, by the way, that nothing is blocking your process. It may be worth looking at the processes running on the server just to check this out. See Books Online if you do not know how to do this.
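All three checks can be run from Query Analyzer; these are standard SQL Server 2000 commands:
Code:
-- log file size and percentage used for every database
DBCC SQLPERF(LOGSPACE)

-- any triggers (including update triggers) defined on the table?
EXEC sp_helptrigger 'dcd'

-- running processes; a non-blank BlkBy column means that spid is blocked
EXEC sp_who2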
 
As I said in my previous post, one alternative is to use a temporary table.

The following is what I have tried on my machine (SQL 2000, 2GB RAM, P4 3.0 with HT, both threads available to SQL, 36GB U320 HD working at 40MB transfer mode, database file size 14GB).

Table
17 char(10) fields named COLUMN_1 through COLUMN_17, with the same sample data in every record: the first seven columns hold A_36993, A_36993_2, A_36993_3, A_36993_4, A_36993_5, A_36993_6 and A_36993_7, and the remaining ten columns hold x.

table_1 and table_2 with an index on column_1 (primary key).
Table_3 with no indexes. All with the same columns.

Table_1 is the one with the correct field value; table_2 is the one you are trying to update.
Tables with 12,058,440 records each.


Using the method suggested here (update in batches of 35k rows), after 3:30 hours it had only updated 10 million rows.

With the following code
Code:
begin transaction   -- new

insert into table_3
select
F1.COLUMN_1,f2.column_2,F1.COLUMN_3,F1.COLUMN_4,F1.COLUMN_5,
F1.COLUMN_6,F1.COLUMN_7,F1.COLUMN_8,F1.COLUMN_9,F1.COLUMN_10,
F1.COLUMN_11,F1.COLUMN_12,F1.COLUMN_13,F1.COLUMN_14,F1.COLUMN_15,
F1.COLUMN_16,F1.COLUMN_17
from table_2 f1
inner join table_1 f2
on f1.column_1 = f2.column_1

commit  -- new

truncate table [dbo].[TABLE_2]
drop table [dbo].[TABLE_2]
EXEC sp_rename 'TABLE_3', 'TABLE_2'
ALTER TABLE [dbo].[TABLE_2] WITH NOCHECK ADD 
	CONSTRAINT [PK_TABLE_2] PRIMARY KEY  CLUSTERED 
	(
		[COLUMN_1]
	)  ON [PRIMARY] 
GO

The above code took 40 min to insert all the rows into table_3, and then another 40 min to recreate the index.

I did a further test of inserting 41 million rows into the temp table, and this took 3:20 to do.

I do not know how this table of mine compares to yours, but if yours also has only one index, and if the record size is smaller than mine (mine is 170 bytes long), then the timing would be shorter than on my machine.

This code was run on the server, with a local database. If you are doing this over a network, the timings will suffer.

If you have a machine comparable to mine, then your 35 million records should take around 4-5 hours in total if you do it this way.

This is, I think, the best you can do.

Regards

Frederico Fonseca
SysSoft Integrated Ltd
 