
Fast generation of SQL data


divinyl

IS-IT--Management
Nov 2, 2001
Guys

What's the fastest way to generate hundreds of gigabytes worth of data? I'm using DBGen (a tool on the SQL Server 2000 Resource Kit), but it is so slow! It's been running for 4 days and I've only managed to generate 5 GB! If anyone knows, please reply!

Thanks,
Div
 
Take the table you want to fill and do a self join against it, inserting back into itself. This will create large amounts of data after a few loops.

Denny
MCSA (2003) / MCDBA (SQL 2000) / MCTS (SQL 2005) / MCITP Database Administrator (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]
 
hi Denny

Thanks for this. I'm not a T-SQL expert, but I came up with this statement:

insert into table1 select * from table1

I have no idea if this is good practice or not! I started running it, and my database did indeed start growing quite quickly, but so did the transaction log, at an even faster rate. I found out I had it set to full recovery mode - I've now set it to simple. However, I've just run the query again and the transaction log is still filling up just as fast. Plus my database is not growing now? There are no locks on the db, so I'm not sure why....
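For what it's worth, the recovery model switch can be done in T-SQL as well - a rough sketch of what I mean (TestDB is just a placeholder for your own database name):
Code:
-- Switch to simple recovery so the log can truncate at each checkpoint
ALTER DATABASE TestDB SET RECOVERY SIMPLE

-- Double-check which recovery model is now in effect
SELECT DATABASEPROPERTYEX('TestDB', 'Recovery')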

Any ideas??

Thanks,
Div
 
The log will grow to the size of each transaction. If you have 100,000 rows in the table, then the log needs to be able to hold those rows. It will then flush itself out in order to prep for the next command. But the next command is 200,000 rows, so the log needs to increase in size.

By default the database grows by 10% when it runs out of space. As the database grows, those 10% chunks get larger, so the database may not need to grow on every insert.

Instead of looking at the size of the database, in Enterprise Manager right-click on the table, click Properties, and see how many records there are in the table.
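If you'd rather check from Query Analyzer than click through Enterprise Manager, something along these lines works too (table1 is just an example name):
Code:
-- Row count and space used for the table you're filling
EXEC sp_spaceused 'table1'

-- How full each database's transaction log currently is
DBCC SQLPERF(LOGSPACE)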

Denny
MCSA (2003) / MCDBA (SQL 2000) / MCTS (SQL 2005) / MCITP Database Administrator (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]
 
I still don't understand this..... when I generate a few million rows in the database using DBGen, the transaction log hardly grows, but when I do it using the statement above, the log expands extremely quickly. Both are doing insert statements!

I don't understand it.... and the statement above is not generating enough data in my database at a quick enough rate. DBGen is proving more successful somehow... perhaps my query is not right?

Div
 
Just to add.. the database does grow quickly with this statement, but the log grows so fast that the disk will eventually run out of space, so it's not something I want to leave running overnight, you know? I just wonder why DBGen does not cause the log to grow in this way? Is there some way of running an insert statement without logging it? Perhaps it does it through BCP, in which case I think you can turn off logging, can't you?
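The closest thing I've come across is SELECT INTO, which I gather is minimally logged when the database is in simple or bulk-logged recovery - a rough sketch of what I mean (the table names are just examples), though it builds a new table rather than filling the existing one:
Code:
-- SELECT INTO creates a brand new table and is minimally logged
-- under the simple or bulk-logged recovery models
SELECT *
INTO table1_copy
FROM table1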

I will have a look through SQL Profiler and see if I can spot what it's doing...

Div
 
DBGen is probably doing one insert at a time, so the log never needs to grow. You are doing large batch inserts in a single transaction, so SQL has to keep the whole batch in the log in case it needs to roll the transaction back.

Denny
MCSA (2003) / MCDBA (SQL 2000) / MCTS (SQL 2005) / MCITP Database Administrator (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]
 
ahhh, that makes sense.. thanks for your help...

Div
 
Denny, I hope you and others will still look at this post.

I want to be able to run the self-join insert overnight without worrying about the transaction log expanding. Isn't there a way I can run the insert that will ensure the log gets checkpointed? I've tried running it in a stored procedure and also with BEGIN TRAN / COMMIT TRAN, but no luck; the log still fills massively.

ANY suggestions on how to run this overnight without clogging up the server?

Thanks
Div
 
The only way would be to limit the number of records that get inserted each time and then commit after each insert.
Code:
-- Cap each INSERT at 10,000 rows so every pass commits as its own small transaction
SET ROWCOUNT 10000
WHILE 1 = 1
BEGIN
   -- Copy up to 10,000 existing rows back into the same table
   INSERT INTO t1
   SELECT * FROM t1
END

Something like that. It will load up all the records it can, up to 10,000, then start the loop over again. As each insert is only 10,000 records, the log shouldn't grow more than a few transactions' worth before it's checkpointed and flushed.
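If you'd rather it stop on its own instead of running until you cancel it (or the disk fills), you could put a counter on the same idea - just an illustration, the 5,000 passes are an arbitrary target:
Code:
DECLARE @i int
SET @i = 0
SET ROWCOUNT 10000
-- Roughly 50 million rows at 10,000 rows per pass (once the table is big enough)
WHILE @i < 5000
BEGIN
   INSERT INTO t1
   SELECT * FROM t1
   SET @i = @i + 1
END
SET ROWCOUNT 0   -- turn the row limit back off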

Denny
MCSA (2003) / MCDBA (SQL 2000) / MCTS (SQL 2005) / MCITP Database Administrator (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]
 
thanks denny - good idea, will try that...

Div
 
This is wicked - it works a treat!!!! Thanks so much for your time...

Div
 
no problem

Denny
MCSA (2003) / MCDBA (SQL 2000) / MCTS (SQL 2005) / MCITP Database Administrator (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
[noevil]
 