
Huge INSERT batch size?

Status
Not open for further replies.

gbaughma

IS-IT--Management
Staff member
Nov 21, 2003
4,773
US
So, I'm doing this huge INSERT (about 1.2M records), and my log file keeps filling up.

Is there a way to do it in chunks? I've been reading a little about batch sizes...

Is there a command I can put in the stored procedure that says "do 5,000 records at a time" or something similar, before my INSERT INTO ... SELECT statement?



Just my 2¢

"What the captain doesn't realize is that we've secretly replaced his Dilithium Crystals with new Folger's Crystals."

--Greg
 
  • Thread starter, #3
I read that... but I'm not understanding it (and I'm having problems getting it to work).

I'm having trouble wrapping my brain around:
Code:
Set rowcount 50000
Declare @rc int
While @rc=50000
Ummm... @rc isn't set yet, so wouldn't the Begin block never run?

... then, at the bottom of the script, it sets @rc=@@ROWCOUNT... but wouldn't it be "1" the first time through, and therefore abort the loop?
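For reference, the full pattern presumably primes @rc to the batch size before the loop, so the Begin block runs at least once. A sketch with placeholder table names (everything here is illustrative, not the actual article's code):

Code:
Set rowcount 50000          -- each INSERT affects at most 50000 rows
Declare @rc int
Set @rc = 50000             -- primed so the While test passes the first time

While @rc = 50000
Begin
    INSERT INTO dbo.Target (...)        -- placeholder INSERT ... SELECT
    SELECT ... FROM dbo.Source
    WHERE <rows not yet copied>         -- must exclude rows already inserted

    Set @rc = @@ROWCOUNT                -- capture immediately after the INSERT
End

Set rowcount 0              -- turn the limit back off

When a pass copies fewer than 50000 rows, @rc no longer equals 50000 and the loop ends.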




--Greg
 
  • Thread starter, #4
All right... here is my code.... when this runs, I get no records inserted.

Code:
USE [master]
GO
/****** Object:  StoredProcedure [dbo].[sp_BuildWebSample]    Script Date: 04/02/2010 11:15:09 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

ALTER PROC [dbo].[sp_BuildWebSample]
AS

EXEC xp_cmdshell 'COPY /Y \\server\LIMS_BE.MDB D:\'

TRUNCATE TABLE NtsWeb.dbo.WebSample

Set ROWCOUNT 10000
Declare @rc int

While @rc=10000

Begin
	Begin Transaction

INSERT INTO NtsWeb.dbo.WebSample ( ProjectID, [NTS COC#], [Sample#], 
MPCAOpenUnits, Location, [Unique#], CollectionDate, CollectionTime, 
Matrix, SampleType, Analyte, CAS, STORET, [Sign], Result, Units, 
ReportLimit, AnalysisDate, AnalysisTime, Method, Filtered, 
ReportDate, MPGraphicalResult, Qualifier, MethodShort, CASMPCA, 
InstrumentType, ParameterCode, LabCertification, LabName, Surrogate, 
FieldLabFlag, EventKey, MPCAReportAnalyte, MPCAReportLocation, SampleDepth, 
LabType, Series, NoPrep, PrepDate )
SELECT Project.ProjectID, COC.COC, Sample.SampleNum, 
Unit.MPCAOPENUnitName, Location.LocationName, 
Location.UniqueNum, Sample.CollectionDate, 
Sample.CollectionTime, Matrix.MatrixName, SampleType.SampleType, 
Analyte.AnalyteName, Analyte.CAS, Analyte.STORET, 
Results.SignField, Results.Result, Unit.MPCAOPENUnitName, 
Results.ReportLimit, Results.AnalysisDate, Results.AnalysisTime, 
Method.MethodName, SampleType.Filtered, COC.ReportDate, 
Results.Result, 
'', Method.MethodAbbr, 
Analyte.CASnodash, Instrument.BarrInstrumentName, Analyte.BarrCode, 
Replace([CertificationNum],'-',''), Laboratory.LabName, 
Analyte.Surrogate, 'L', COC.MPCAEventKey, Analyte.[MPCA Report], 
Location.MPCAReport, '', '', Series.SeriesLongName, 
SamplePrepType.NoPrep, SamplePrep.PrepDate
FROM LIMSBE...Unit INNER JOIN ((
LIMSBE...Project INNER JOIN 
LIMSBE...Timeslips ON Project.ProjectID = Timeslips.ProjectID) 
INNER JOIN (
LIMSBE...Series INNER JOIN (
LIMSBE...SampleType INNER JOIN ((
LIMSBE...Location INNER JOIN (
LIMSBE...COC INNER JOIN 
LIMSBE...Sample ON COC.COCID = Sample.COCID) 
ON Location.LocationID = Sample.LocationID) INNER JOIN (
LIMSBE...SamplePrepType RIGHT JOIN ((
LIMSBE...SamplePrepLog RIGHT JOIN 
LIMSBE...SamplePrep ON SamplePrepLog.SamplePrepLogID = SamplePrep.SamplePrepLogID) 
RIGHT JOIN (
LIMSBE...Method INNER JOIN (
LIMSBE...Matrix INNER JOIN (
LIMSBE...Laboratory INNER JOIN (
LIMSBE...Analyte INNER JOIN (((
LIMSBE...SeriesSample INNER JOIN 
LIMSBE...Results ON SeriesSample.SeriesSampleID = Results.SeriesSampleID) 
INNER JOIN 
LIMSBE...Analysis ON Results.AnalysisID = Analysis.AnalysisID) INNER JOIN 
LIMSBE...Instrument ON Results.InstrumentID = Instrument.InstrumentID) 
ON Analyte.AnalyteID = Analysis.AnalyteID) 
ON Laboratory.LabID = SeriesSample.LabID) 
ON Matrix.MatrixID = Analysis.MatrixID) 
ON Method.MethodID = Analysis.MethodID) 
ON SamplePrep.SamplePrepID = Results.SamplePrepID) 
ON SamplePrepType.SamplePrepTypeID = SamplePrepLog.SamplePrepTypeID) 
ON Sample.SampleID = SeriesSample.SampleID) 
ON SampleType.SampleTypeID = Sample.SampleTypeID) 
ON Series.SeriesID = SeriesSample.SeriesID) 
ON Timeslips.TimeslipsID = COC.TimeslipsID) 
ON Unit.UnitID = Analysis.UnitID
WHERE (((COC.ReportDate) Is Not Null) AND ((Results.Inactive)=0));

Select @rc = @@ROWCOUNT

COMMIT TRANSACTION

End

... as you can see, I'm doing a HUGE insert from a linked Access database with a spaghetti of joined tables. It works out to about 1.2M records total. I need to split it into chunks, and I've been trying to use SET ROWCOUNT and @@ROWCOUNT to make that happen.

Thanks in advance... I know that code is crazy...

--Greg
 
  • Thread starter, #5
I found this little tidbit....

Code:
SET ROWCOUNT 1000
WHILE 1 = 1
BEGIN
INSERT INTO........  WHERE ......

IF @@ROWCOUNT = 0
BEGIN
BREAK
END
END

SET ROWCOUNT 0

... trying it now....
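One caveat with that tidbit: the elided WHERE clause has to exclude rows that were already copied, otherwise each pass re-inserts the same 1000 rows and @@ROWCOUNT never hits 0. A sketch with made-up table names:

Code:
SET ROWCOUNT 1000
WHILE 1 = 1
BEGIN
    INSERT INTO dbo.Target (Id, Col)
    SELECT s.Id, s.Col
    FROM dbo.Source s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.Target t
                      WHERE t.Id = s.Id)   -- skip rows already in Target

    IF @@ROWCOUNT = 0
        BREAK
END
SET ROWCOUNT 0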


--Greg
 
  • Thread starter, #7
MS SQL Server Express 2008



--Greg
 
Greg,

I think the thing you are missing here (with the concept) is that you must modify the SQL you use to accommodate the batches. This is why I was asking about the primary key. There are various ways to do this, but they all involve a loop and a method for splitting the insert into smaller batches.

Suppose you wanted to insert all rows where the unit.unitid > 1000. You could set up a loop like this...

[tt][blue]

Declare @UnitId Int
Declare @RC Int

Set @UnitId = 1001
Set @RC = -1

While @RC <> 0
Begin

Begin Transaction

Insert Into Blah(....)
Select columns
From Tables
Where Unit.UnitId Between @UnitId And @UnitId + 999

-- capture before COMMIT; COMMIT TRANSACTION resets @@ROWCOUNT to 0
Set @RC = @@RowCount

Commit Transaction

-- move to the next 1000-id range, otherwise the same batch repeats forever
Set @UnitId = @UnitId + 1000
End

[/blue][/tt]

Alternatively... if you need to skip around in the units table because the rows you need are not consecutive, you could create a table variable, drop the values you need into it, and then use the table variable to split up your batches. The key here is that the table variable has an identity column plus the primary key value that controls your inserts. Something like this...

[tt][blue]
Declare @Temp Table(RowId Int Identity(1,1), UnitId Int)
Declare @RowId Int
Declare @RC Int

Insert Into @Temp(UnitId)
Select UnitId
From (all your tables)
WHERE (((COC.ReportDate) Is Not Null) AND ((Results.Inactive)=0))

Set @RC = -1
Set @RowID = 1

While @RC <> 0
Begin

Begin Transaction

Insert Into Blah
Select (columns)
From @Temp As TempTable
Inner Join (all your existing tables here)
Where TempTable.RowId Between @RowId And @RowId + 999

-- capture before COMMIT; COMMIT TRANSACTION resets @@ROWCOUNT to 0
Set @RC = @@RowCount

Commit Transaction

Set @RowId = @RowId + 1000

End
[/blue][/tt]


Really though, the important thing to realize is that you must modify the SQL in such a way that it limits the number of rows each pass touches.

-George

"The great thing about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
  • Thread starter, #9
George:

There is one other bit field I can use, called "FTPDone"...

So I modified my script to first set all the FTPDone bits to 0 (False), then update them as each batch is done.

I think this might work... however, now I've run into "File sharing lock count exceeded" (which is a Jet issue). I've turned up the sharing lock count in the registry, but even at 5,000 rows at a time I'm still getting the error.

I think I need to commit the transaction on each iteration...

Let me try that next.

Part of the issue is that it's such a huge, convoluted join, with one-to-many links, that I don't know what the index numbers will end up being.
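A rough sketch of what the flag-driven approach could look like; the table placement of FTPDone and all the elided columns are assumptions, only the flag itself comes from the description above:

Code:
-- reset all flags first
UPDATE LIMSBE...Sample SET FTPDone = 0

DECLARE @rc INT
SET @rc = -1

WHILE @rc <> 0
BEGIN
    SET ROWCOUNT 5000
    INSERT INTO NtsWeb.dbo.WebSample (...)
    SELECT ...
    FROM ...                        -- the big join
    WHERE Sample.FTPDone = 0        -- only rows not yet transferred

    SET @rc = @@ROWCOUNT
    SET ROWCOUNT 0

    -- mark the batch as done so the next pass skips it
    UPDATE LIMSBE...Sample SET FTPDone = 1
    WHERE <the samples just copied>
END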

Update:

Well, the "Commit Transaction" didn't work... here's the latest error.

Code:
Msg 7390, Level 16, State 2, Procedure sp_BuildWebSample, Line 83
The requested operation could not be performed because OLE DB provider "Microsoft.Jet.OLEDB.4.0" for linked server "LIMSBE" does not support the required transaction interface.

This is getting more and more frustrating. I even tried doing it straight from Access, with a link to the WebSample SQL table, and it timed out.
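For what it's worth, Msg 7390 is the Jet provider refusing to enlist in the transaction SQL Server tries to start around the remote query. One way around it may be to drop the explicit BEGIN/COMMIT TRANSACTION entirely and let each batch autocommit. A sketch (column lists elided, names as in the proc above):

Code:
SET ROWCOUNT 5000
DECLARE @rc INT
SET @rc = -1

WHILE @rc <> 0
BEGIN
    -- no BEGIN TRANSACTION here: each INSERT autocommits on its own,
    -- so the linked server is never asked for a distributed transaction
    INSERT INTO NtsWeb.dbo.WebSample (...)
    SELECT ... FROM LIMSBE...Sample ...
    WHERE <rows not yet copied>

    SET @rc = @@ROWCOUNT
END
SET ROWCOUNT 0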



--Greg
 
  • Thread starter, #10
Hey George....

Is there a way to NEST? Like a "For each @project in (SELECT PROJECTID from PROJECT)"?

If I could step through each project, that wouldn't be as many records.....

Then I could just step through the project numbers, insert the results for each project (which would be smaller batches).... and not have to worry about setting flags in the database itself....



--Greg
 
There isn't a way to do a "For Each" in SQL. I mean... not directly. What you can do is select all the project IDs into a temp table with an identity column, and then handle them that way. Kinda like this...

[tt][blue]

Declare @Temp Table(RowId Int Identity(1,1), ProjectId Int)

Insert Into @Temp(ProjectId)
Select ProjectId From......

Declare @i Int
Declare @Max Int
Declare @ProjectId Int

Select @i = 1, @Max = Max(RowId)
From @Temp

While @i <= @Max
Begin

Select @ProjectId = ProjectId
From @Temp
Where RowId = @RowId

Begin Transaction

Insert into select blah
Where ProjectId = @ProjectId

Commit Transaction

Set @i = @i + 1
End

[/blue][/tt]

-George

 
Slight mistake....

Where RowId = @RowId

Should be

Where RowId = @i

-George

 
  • Thread starter, #13
Working on it... so far I get...

Code:
Msg 141, Level 15, State 1, Procedure sp_BuildWebSample, Line 28
A SELECT statement that assigns a value to a variable must not be combined with data-retrieval operations.
Msg 137, Level 15, State 2, Procedure sp_BuildWebSample, Line 36
Must declare the scalar variable "@RowID".



--Greg
 
  • Thread starter, #14
Ahhh... a little experimentation....

This seemed to fix the offending line...
Code:
Select @i=1, @Max=MAX(RowID)
From @Temp

... still working on it. :)

--Greg
 
  • Thread starter, #15
The routine is running now.... we'll wait and see what the results are.....

I'll be SO GLAD when I can move on past this little issue and get back to some real work. <Sigh>



--Greg
 
  • Thread starter, #16
Slight mistake....

Where RowId = @RowId

Should be

Where RowId = @i

I saw that... and wondered about it....



--Greg
 
  • Thread starter, #17
I need to learn to read more carefully.... sorry, George.... I'm just under a lot of pressure to get this running, and I'm starting to make stupid mistakes.



--Greg
 
It happens. No need to apologize.

-George

 
  • Thread starter, #19
Well, my query seems to be running... however it has only done 50,000 records in an hour.

At this rate, I estimate 26 hours to completion. :(



Just my 2¢

"What the captain doesn't realize is that we've secretly replaced his Dilithium Crystals with new Folger's Crystals."

--Greg
 
Working in batches is tricky.

Loop too much (for example, inserting one row at a time) and the per-batch overhead kills performance.

Loop too little and your batch size is huge, which is just as bad for performance (log growth, locking).

The problem now is... do you stop the process so you can recode parts of this for a larger batch size, or do you just wait until tomorrow for it to finish? Tough call.
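One knob worth exposing is the batch size itself, so it can be tuned without recoding the loop; SET ROWCOUNT also accepts a variable. A sketch with placeholder names:

Code:
DECLARE @BatchSize INT
SET @BatchSize = 50000          -- tune: bigger = fewer passes, more log/lock pressure

SET ROWCOUNT @BatchSize         -- variable form of SET ROWCOUNT

DECLARE @rc INT
SET @rc = @BatchSize            -- primed so the loop body runs at least once

WHILE @rc = @BatchSize
BEGIN
    INSERT INTO dbo.Target (...)
    SELECT ... FROM dbo.Source
    WHERE <rows not yet copied>

    SET @rc = @@ROWCOUNT
END
SET ROWCOUNT 0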

-George

 