
Updating tables with millions of rows...

Guest_imported
Hi,

I have to update some columns in several tables in SQL Server 7.

Some tables have up to 17 million rows.

I think it would be a good idea to write UPDATE statements that each touch only about 100,000 rows, and to repeat this step about 170 times.

Writing a Perl script to automate this seems like a good idea to me (so I can monitor whether the update is going well).
I am using an ODBC driver.

Can someone tell me if this is a good solution? Or does someone have better ideas?

Please tell me ...

Thank you
 
Please clarify where the data you are using to update the table comes from. If possible, a stored procedure is fastest; it can always fail with a useful error message if need be. If you need user feedback (e.g. a progress bar), then a bit of front-end code, as you suggest, is better.
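For instance, the skeleton might look something like this (a rough sketch only; the procedure, table, and column names here are made up, since we don't know your schema yet):

CREATE PROCEDURE usp_batch_update AS
BEGIN TRANSACTION
-- placeholder logic: substitute your real update here
UPDATE your_table
SET your_column = your_column * 2
COMMIT TRANSACTION
GO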
 
What specifically are you updating in these rows? Is it something transaction-related, so that you could use a trigger at the field level?

Obviously a large update statement on WinNT is going to have performance limitations, so describe the logic you are trying to accomplish and let's see if any of the bright folks in this forum can send you in a better direction.
 
Hi,

I think I should give a concrete example.

Here is my table:
create table my_table (
    ref char(7) NOT NULL,
    field1 decimal(9,2),
    constraint pk_my_table primary key (ref)
)

with 17,000,000 rows (and about 10 fields, not just the two shown).

There is no index on field1.

I want to execute the following statement:
UPDATE my_table SET field1 = field1 / 6

My question is: can I be sure this statement will work?

Is it not better to write:
UPDATE my_table SET field1 = field1 / 6
WHERE ref BETWEEN '0000000' AND '0099999'
UPDATE my_table SET field1 = field1 / 6
WHERE ref BETWEEN '0100000' AND '0199999'
....
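Rather than typing all 170 statements out by hand, I suppose a loop could generate the ranges. Here is a rough sketch (it assumes ref always holds a zero-padded number; note that a char(7) key only reaches '9999999', so the width and upper bound would need adjusting for my real data):

DECLARE @start int
SET @start = 0
WHILE @start < 10000000
BEGIN
    -- build zero-padded 7-character bounds for this batch of 100,000 keys
    UPDATE my_table
    SET field1 = field1 / 6
    WHERE ref BETWEEN RIGHT('0000000' + CONVERT(varchar(8), @start), 7)
                  AND RIGHT('0000000' + CONVERT(varchar(8), @start + 99999), 7)
    SET @start = @start + 100000
END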

Thank you for your help :)
Ollivier2
 
I'll be a bit controversial here (I think). In almost any other DBMS, the best approach would be to do this update while the system was offline: unload the tables to be updated to a file, write an application to do the update (very, very much faster than going through the DBMS), then reload the target tables without logging, and then back up the database. I'm pretty new to SQL Server (though I have 15 years of experience as a DBA), so I'll be interested to see what the experts here suggest. However, it seems to me that the absence of a fast load utility that operates outside the DBMS is a big hole in SQL Server's toolkit.

But maybe I'm wrong - I often am.
 
Ollivier2 - right
martynh - wrong because:
as the update is on a non-indexed field, it should be pretty fast. By splitting the job up as Ollivier2 suggests, you avoid the performance hit SQL takes in building one enormous transaction that it must be able to roll back if things go wrong. If there's no need for others to read or update the table while this runs, you could also specify WITH (TABLOCK) or WITH (TABLOCKX) respectively; this stops SQL locking each page by locking the whole table instead, and speeds things up a bit. I also suggest putting BEGIN TRANSACTION and COMMIT TRANSACTION around each update statement. Making your primary key clustered would also improve performance, but whether that's worthwhile depends on how many inserts you do day-to-day.
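For example, one batch might look like this (a sketch; the lock hint and key range are illustrative, not tested against your data):

BEGIN TRANSACTION
UPDATE my_table WITH (TABLOCKX)
SET field1 = field1 / 6
WHERE ref BETWEEN '0000000' AND '0099999'
COMMIT TRANSACTION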
martynh - there are three fast-load utilities in SQL 7: bcp (the bulk copy program), DTS (with the fast load option checked), and the BULK INSERT statement.
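For the unload/reload route martynh describes, the shape would be something like this (the server name, login, and file path are made-up placeholders):

-- 1. Unload from the command line with bcp in character mode:
--      bcp mydb..my_table out c:\my_table.dat -c -S myserver -U sa -P mypassword
-- 2. Transform the file with an external program, then reload it:
BULK INSERT mydb..my_table
FROM 'c:\my_table.dat'
WITH (DATAFILETYPE = 'char', TABLOCK)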
Note that although I'm classed as an SQL expert where I work, I'm a novice compared to the likes of tlbroadbent
 