Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chriss Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Passing a list of values to a Stored Procedure (Part III) 3

Status
Not open for further replies.

ESquared

Programmer
Joined
Dec 23, 2003
Messages
6,129
Location
US
I'd appreciate your comments on a new FAQ I just wrote, Passing a list of values to a Stored Procedure (Part III).

I think it could use some additional language describing how to use it and why you would use it instead of the function in part 2. And in fact after posting last night I thought of a possibly more efficient way to write the query for "method 6."

But your feedback would be appreciated.

[COLOR=#aa88aa black]Cum catapultae proscriptae erunt tum soli proscript catapultas habebunt.[/color]
 
Well, first of all you are missing the CLR function, but we can agree to disagree on that one ;-)

I also disagree with your comments on a numbers table being a sign of sloppiness. Using a numbers table is the fastest and easiest way to do lots of things, and I do not think it is sloppy at all. Perhaps an example could enlighten me on what you mean by this?

I have taken to using the technique George demonstrates in this thread when there is the possibility that I will need to pass a very long string of inputs. It is slightly slower for small strings, but much faster for the larger ones (I tested this a long time ago for ranges up to 10,000 US zip codes, I don't remember the exact numbers). You might find this helpful in the future, as it seems that you use split functions very often! thread220-1249932

Just something to think about.

Thanks for sharing :-)

Alex

Ignorance of certain subjects is a great part of wisdom
 
ESquared,

I am about to share some code with you. This code originally came from SQLDenis, so he deserves the credit for it. I should mention that when I ran speed comparisons, this method was an order of magnitude faster than yours.

There are a couple limitations that I should mention. First, you need to have a numbers table. But more importantly, there needs to a record in the numbers table that matches the length of the string that you want to split. For example, if you have a 1 meg string, then you need a numbers table with a million records.

I tested this very quickly on a string that is 5,882 characters long and represents 1000 numbers seperated by commas. I did NOT compare this with method #6.

Code:
[COLOR=blue]ALTER[/color]  [COLOR=#FF00FF]Function[/color] [dbo].[FieldNameSplitter] (@MyText [COLOR=blue]Text[/color] ) 
Returns @Tbl_FieldNames [COLOR=blue]Table[/color] (TokenId [COLOR=blue]int[/color] [COLOR=blue]identity[/color](1,1), FieldName [COLOR=blue]VarChar[/color](100)) [COLOR=blue]As[/color] 
[COLOR=blue]Begin[/color] 
	[COLOR=blue]Declare[/color] @Len [COLOR=blue]Int[/color]
	[COLOR=blue]Select[/color] @Len = [COLOR=#FF00FF]DataLength[/color](@MyText)
	
	[COLOR=blue]Declare[/color] @Temp [COLOR=blue]Table[/color]([COLOR=#FF00FF]RowId[/color] [COLOR=blue]Integer[/color] [COLOR=blue]Identity[/color](1,1), Data [COLOR=blue]Varchar[/color](4000))
	
	[COLOR=blue]Declare[/color] @i [COLOR=blue]Int[/color]
	[COLOR=blue]Declare[/color] @Max [COLOR=blue]Int[/color]
	[COLOR=blue]Declare[/color] @Data [COLOR=blue]VarChar[/color](4000)
	[COLOR=blue]Declare[/color] @Position [COLOR=blue]Int[/color]
	
	[COLOR=blue]Set[/color] @Max = [COLOR=#FF00FF]Ceiling[/color](1.0 * @Len / 4000)
	[COLOR=blue]Set[/color] @i = 1
	
	[COLOR=blue]Set[/color] @Position = 0
	[COLOR=blue]Declare[/color] @TempPosition [COLOR=blue]Int[/color]
	
	[COLOR=blue]While[/color] @Position < @Len
		[COLOR=blue]Begin[/color]
	
			[COLOR=blue]Set[/color] @Data = [COLOR=#FF00FF]SubString[/color](@MyText, @Position, 4000)
	
			[COLOR=blue]Set[/color] @TempPosition = [COLOR=#FF00FF]CharIndex[/color]([COLOR=red]','[/color], [COLOR=#FF00FF]Reverse[/color](@Data))
			[COLOR=blue]Set[/color] @TempPosition = 4000 - @TempPosition
	
			[COLOR=blue]Insert[/color] [COLOR=blue]Into[/color] @Temp(Data)
			[COLOR=blue]Select[/color] [COLOR=#FF00FF]SubString[/color](@myText, @Position, @TempPosition)
		
			[COLOR=blue]Set[/color] @Position = @Position + @TempPosition + 1
	
		[COLOR=blue]End[/color]
	
	[COLOR=blue]Select[/color] 	@i = 1,
			@Max = [COLOR=#FF00FF]Max[/color]([COLOR=#FF00FF]RowId[/color])
	[COLOR=blue]From[/color]	@Temp
	
	[COLOR=blue]Declare[/color] @Output [COLOR=blue]Table[/color] ([COLOR=#FF00FF]RowId[/color] [COLOR=blue]integer[/color] [COLOR=blue]Identity[/color](1,1), Data [COLOR=blue]VarChar[/color](8000))
	
	[COLOR=blue]While[/color] @i <= @Max
		[COLOR=blue]Begin[/color]
			[COLOR=blue]Select[/color] 	@Data = Data
			[COLOR=blue]From[/color]	@Temp
			[COLOR=blue]Where[/color]	[COLOR=#FF00FF]RowId[/color] = @i
	
			[COLOR=blue]Insert[/color] 	[COLOR=blue]Into[/color] @Tbl_FieldNames(FieldName)
			[COLOR=blue]SELECT[/color]	[COLOR=#FF00FF]SUBSTRING[/color]([COLOR=red]','[/color] + @Data + [COLOR=red]','[/color], Num + 1, 
					[COLOR=#FF00FF]CHARINDEX[/color]([COLOR=red]','[/color], [COLOR=red]','[/color] + @Data + [COLOR=red]','[/color], Num + 1) - Num -1)[COLOR=blue]AS[/color] [COLOR=blue]Value[/color] 
			[COLOR=blue]FROM[/color]	Numbers NumberPivot
			[COLOR=blue]WHERE[/color]	Num <= LEN([COLOR=red]','[/color] + @Data + [COLOR=red]','[/color]) - 1 
			        AND [COLOR=#FF00FF]SUBSTRING[/color]([COLOR=red]','[/color] + @Data + [COLOR=red]','[/color], Num, 1) = [COLOR=red]','[/color] 
	
			[COLOR=blue]Set[/color] @i = @i + 1
		[COLOR=blue]End[/color]
	
	[COLOR=blue]Return[/color]
[COLOR=blue]End[/color]

I tested like this...

[tt][blue]
Declare @S DateTime
Set @S = GetDate()
Select * From dbo.FieldNameSplitter('234,62354,234,2,lots more numbers')
Select 'George Method', DateDiff(Millisecond, @S, GetDate())

Set @S = GetDate()

Select * From dbo.SplitNumbers('234,62354,234,2,lots more numbers', ',')

Select 'ESquared Method', DateDiff(Millisecond, @S, GetDate())

[/blue][/tt]

In my test, my function returned the same data as your, but executed in 16 milliseconds, whereas yours executed in 186 milliseconds.

Again. I cannot take credit for this. It is SQLDenis's work that made this possible.

-George

"the screen with the little boxes in the window." - Moron
 
Alex,

I don't believe the use of a Numbers table is sloppy. I did when I first heard of it, but I was a lot less experienced then. I'll alter my wording in the FAQ.

George,

Thanks for posting the function. Keep in mind that I wrote that FAQ three years ago! I've been using the fixed-length method for at least two years now. I'll take a look at his method and probably stick it in for completeness. I'll also take a look at your XML method and add that too. Why not be complete? If someone needs to know how to get a lot of data into SQL Server through a parameter, I may as well collect all the methods I know in one place.

When you refer to "my" function, do you mean the Option 4 from the FAQ part 2? It is markedly slower than the Numbers tables in part III.

And I have a question about something you said in the thread Alex linked to above.

George said:
The problem with table valued functions is that you cannot apply them to multiple records with a single statement. Usually, this requires the use of a cursor or a loop.
What do you mean? You can join to a table valued function just like any other table. Or did you mean that you can't use an input value of a column from a joined table?

[COLOR=#aa88aa black]Cum catapultae proscriptae erunt tum soli proscript catapultas habebunt.[/color]
 
>> Keep in mind that I wrote that FAQ three years ago!
OMG! You should see some of the code I wrote 3 years ago. I'm not suggesting anything negative about your function. My point is that a good developer will constantly improve their skills.

>> When you refer to "my" function, do you mean the Option 4 from the FAQ part 2? It is markedly slower than the Numbers tables in part III.

I used Option 5, with the numbers table.

I'll work up an example, to clarify what I meant by that statement (that you quoted from the other thread), but it'll take a couple minutes. [smile]


-George

"the screen with the little boxes in the window." - Moron
 
Hi ESquared.

I'm not sure if I missed it in your articles, but, assuming you are using a function to return a result table. It seems to be a performance gain to put those results in the temp table or table variable. For some reason, even if you define the returning table structure, SQL does not seem to know much about what it is receiveing back from the function. This in turn causes a lot more logical reads. However, if you load the table variable and then join to that, rather than the function itself, you should notice a signifigant reduction in the logical reads.

Also, here is a link to an article that has some different algorithims for splitting (which you may have already seen).
I did some messing around and found that if you have 2005 the CTE version runs faster than some of the other iterative approches (and is more readable to me :) ). But, most of the performance gain comes from the joining aspect of the function.
 
You ARE the star. Why do you need more? [smile]

[small]I gave you one anyway.[/small]

-George

"the screen with the little boxes in the window." - Moron
 
ESquared,

Here is an example.

Suppose you have this table.

[tt][blue]
Declare @Temp Table(Id Int, CommaStuff varchar(50))

Insert into @Temp Values(1, 'red,blue,green')
Insert Into @Temp Values(2, 'pink,orange,purple')
[/blue][/tt]

Then, suppose you want to return the data like this....

[tt][blue]
Id Color
-- ---------
1 red
1 blue
1 green
2 pink
2 orange
2 purple
[/blue][/tt]

As far as I know, it can't be done with a generic split function. I'd love to be proved wrong. [smile]


-George

"the screen with the little boxes in the window." - Moron
 
Lamprey13,

Thanks for the input. It's possible that the article I read two years ago about testing various methods was in fact written by Erland, the author of that article. Either that, or he and I took code from the same source, because my option 6 looks exactly like his fixed-length solution (which he claims again in the article that it is indeed the fastest method).

George,

I ran Denis's function you posted with the following:

Code:
select * from dbo.FieldNameSplitter('234,62354,234,2,927,25738,1828,17285,3,578,37285')
and my resultset was 2 rows. There's a bug in there somewhere. This could explain the superior speed...


[COLOR=#aa88aa black]Cum catapultae proscriptae erunt tum soli proscript catapultas habebunt.[/color]
 
ESquared,

I just now ran the same query and got 11 rows. How many records are in your numbers table?

-George

"the screen with the little boxes in the window." - Moron
 
My mistake. The particular database I tried it in only had a numbers table with 10 values in it. Splork. Thanks for troubleshooting it for me [blush]. Trying again...

[COLOR=#aa88aa black]Cum catapultae proscriptae erunt tum soli proscript catapultas habebunt.[/color]
 
I just tested this against Method #6, and it appears to be the winner. [smile]

To test this... I created a string with 1000 random numbers from my numbers table.

Code:
Declare @Temp VarChar(8000)
Set @Temp = ''
Select @Temp = @Temp + Convert(Char(8), Num)
From	(
		Select 	Top 1000 Num 
		From 	Numbers 
		Order By NewId()
		) As A

Select @Temp

I then did some clever replaces to get a comma delimited string with the same values in the same order.

The results are....

[tt]
Denis's method: 13 milliseconds
Method #5: 190 milliseconds
Method #6: 0 milliseconds
[/tt]

I don't trust the timing when it is so low, but clearly Denis's method and Method #6 are both significantly better than Method #5.



-George

"the screen with the little boxes in the window." - Moron
 
As far as I know, it can't be done with a generic split function.
A little research shows that in 2000, you're right. I was surprised to find that in SQL 2005, that's not true. Taking from Erland's page you linked to above:

In SQL 2000, it was not possible to call a table-valued function and pass a table column as parameter, but SQL 2005 adds the APPLY operator that permits you to do this. Here is a script that creates the table above and then unpacks it with a query:
Code:
CREATE TABLE models (modelid  char(4)      NOT NULL,
                     -- other columns like modelname etc.
                     colours  varchar(200) NOT NULL,
                     CONSTRAINT pk_models PRIMARY KEY (modelid))
go
INSERT models (modelid, colours)
  SELECT 'A200', 'Blue, Green, Magenta, Red'
  UNION
  SELECT 'A220', 'Blue, Green, Magenta, Red, White'
  UNION
  SELECT 'A230', 'Blue, Green, Magenta, Red, Cyan, White, Black'
  UNION
  SELECT 'B130', 'Brown, Orange, Red'
  UNION
  SELECT 'B150', 'Yellow, Brown, Orange, Red'
go
SELECT m.modelid, t.str AS colour
FROM   models m
CROSS  APPLY  iter_charlist_to_tbl(m.colours, ',') AS t
ORDER  BY m.modelid, t.str

And regarding option 5 in my FAQ, there was a comment in there saying "This first column will drastically hurt performance." Take that column out and rerun your test. :-) I have never needed that first column, so it's more efficient for me to just leave the TokenID column out.

I'll offer a version of the function in the FAQ to insert to a table variable with an identity column, but it can't be inline, it will have to be multi-statement. I also took out the delimiter parameter to compare more exactly.

Code:
CREATE FUNCTION SplitNumbers2 (
   @InputText varchar(8000)
)
RETURNS @Values TABLE (TokenID int identity(1, 1) primary key clustered, Value varchar(100)) -- change data type of output as you please
AS
BEGIN
	INSERT @Values
   SELECT
      Value = Substring(@InputText, Num, CharIndex(',', @InputText + ',', Num) - Num)
   FROM Numbers
   WHERE 
      Substring(',' + @InputText, Num, 1) = ','
      AND Num <= Datalength(@InputText) + 1
	RETURN
END
And comparing the three methods, the one you posted, the Method 5 from my FAQ (without the TokenID column) and this new function (three separate batches).

Full testing would use varying lengths of @t and would not just select the results but use them in a join to a table, so take the following with a grain of salt. I ran these 50 times together, each in its own batch, then averaged their values from SQL Profiler. That's also going to be inaccurate because you can't get values between 1ms and 12ms (or was it 16ms?)

[tt] CPU Reads Writes Duration
select * from dbo.FieldNameSplitter(@t) 30.1 2116.7 0 48.4
select * from dbo.SplitNumbers(@t, ',') 6.6 40 0 23.2
select * from dbo.SplitNumbers2(@t) 31 3837.1 0 49.7
[/tt]

Keep in mind the comment that performance on the middle function could be affected because the optimizer doesn't know what to expect from it, whereas with the other two it knows the specs of the output table ahead of time.

Clearly more testing is in order. But I wouldn't care to go to the lengths Erland already has--he says fixed-length is superior to everything else, including XML.

All,

Here is Erland's "famous article" that I originally read two or more years ago. George already posted above the version for SQL Server 2005.

It is well worth reading both articles as he goes into great depth. A LOT is covered, including recompilation, execution plans, collation, sorting, indexes, and you'll undoubtedly learn at least one important thing. If you want to be knowledgeable then you can't miss this stuff.


[COLOR=#aa88aa black]Cum catapultae proscriptae erunt tum soli proscript catapultas habebunt.[/color]
 
Just in case you missed it, method #5 was supposed to have the "really badly performing column" removed. I can see I need to alter the FAQ to make people put it in if they want it rather than take it out if they don't.

[COLOR=#aa88aa black]Cum catapultae proscriptae erunt tum soli proscript catapultas habebunt.[/color]
 
gmmastros,

Yes you can get output like:
Id Color
-- ---------
1 red
1 blue
1 green
2 pink
2 orange
2 purple

If you use 2005, look up the APPLY operator. In my sample, fn_StringSlpit1 returns a table with Val as the column of split values. Here is a sample:
Code:
SELECT
	T.ID,
	Foo.Val AS Color
FROM
	@Temp AS T
CROSS APPLY	
	dbo.fn_SplitString1(T.CommaStuff) AS Foo

-Ryan
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top