Cases where the input variable is of text datatype and perhaps extremely long

Cases where the input tokens are of known length or of a relatively low maximum length

The use of a numbers table instead of a loop

So here you go.

First, create a Numbers table

If you don't have a Numbers table, here's some code for you to create one. Note that there are debates about the use of this technique; in many situations, using a Numbers table can be a sign of sloppiness, but not always. See option 6 for some notes about improved performance. The clustered primary key on this table is absolutely essential to its proper performance. I picked 8000 numbers somewhat arbitrarily, as a good compromise between keeping the table small (speed) and having a large enough range of numbers (too small and I can't do the work I need to do).

CODE

CREATE TABLE Numbers (
    Num int IDENTITY(1, 1) NOT NULL
        CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED
)

SET NOCOUNT ON
INSERT Numbers DEFAULT VALUES
WHILE Scope_Identity() < 8000
    INSERT Numbers DEFAULT VALUES
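To confirm the table populated correctly, a quick sanity check (assuming the script above ran to completion) is:

```sql
SELECT Min(Num) AS MinNum, Max(Num) AS MaxNum, Count(*) AS Cnt
FROM Numbers
--Should return MinNum = 1, MaxNum = 8000, Cnt = 8000
```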

Option 5 - use a Numbers table

Now that you have a Numbers table, here's a function that uses it to split a string into columns.

CODE

CREATE FUNCTION SplitNumbers (
    @InputText varchar(8000),
    @Delimiter varchar(1)
)
RETURNS TABLE
AS RETURN (
    SELECT
        --This first column will drastically hurt performance
        TokenID = Num - Datalength(Replace(Left(@InputText, Num - 1), @Delimiter, '')),
        Value = Substring(@InputText, Num, CharIndex(@Delimiter, @InputText + @Delimiter, Num) - Num)
    FROM Numbers
    WHERE Substring(@Delimiter + @InputText, Num, 1) = @Delimiter
        AND Num <= Datalength(@InputText) + 1
)
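For example, calling the function against a short comma-delimited list should return one row per token (the dbo schema prefix here assumes the function was created in the default schema):

```sql
SELECT TokenID, Value
FROM dbo.SplitNumbers('1,2,7,42', ',')
--Should return rows (1, '1'), (2, '2'), (3, '7'), (4, '42')
```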

If you compare this to option 4 from the previous FAQ, you'll find that it's much faster for longer strings (some testing is in order for short strings).

But after a lot of research on this topic more than two years ago, I found an article that tested the performance of all the various ways of splitting a string into rows. (Yes, it took me long enough to get around to writing this FAQ. I'll try to find that article and, if it's still out there, will provide a link.) The clear winner was to use a Numbers table against fixed-length inputs. The idea is that instead of submitting a string '1,2,7,42' with delimiters between the values, the client application submits the values with padding so they are all the same width: for example, '1  2  7  42 ' could be an input string with a token length of 3 characters for each number.
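As a sketch of that fixed-width convention (the specific values here are just illustrative): each value is left-justified in a field of the token length and the fields are concatenated with no delimiter at all:

```sql
SELECT Left('1' + Space(3), 3)
     + Left('2' + Space(3), 3)
     + Left('7' + Space(3), 3)
     + Left('42' + Space(3), 3)
--Should return '1  2  7  42 ' (each token padded to exactly 3 characters)
```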

One problem with the Numbers table, though: what if you want to use a text variable with extremely long lists of numbers? Say you're splitting positive integers, which in SQL Server can be up to ten characters long. In an 8000-character string, that's only 800 values. It's not a stretch to imagine a client needing to work with more than 800 rows at once; an orders table, for example, could easily accumulate that many in a single day.

So, combining the fixed-length technique with a cross join of the Numbers table against itself (with an 8000-row table, that handles up to 8000 x 8000 = 64,000,000 tokens), here is

Option 6 - Numbers table with fixed-length tokens

So here is the culmination of my search for the absolutely fastest split function in SQL Server that can handle very long strings. Make sure you have a Numbers table (see the code above if you don't). It can have any number of rows, but note that with n numbers the function can return at most n^2 tokens, so the longest string you can handle is n^2 times the token length.

CODE

CREATE FUNCTION SplitFixed (
    @InputText text,
    @TokenLength tinyint
)
RETURNS TABLE
AS RETURN (
    SELECT
        TokenID = N1.Num + MaxNum * (N2.Num - 1),
        Value = Substring(
            @InputText,
            @TokenLength * (N1.Num + MaxNum * (N2.Num - 1) - 1) + 1,
            @TokenLength
        )
    FROM Numbers N1
    CROSS JOIN (SELECT MaxNum = Max(Num) FROM Numbers) M
    JOIN Numbers N2
        ON @TokenLength * (M.MaxNum * (N2.Num - 1) + N1.Num - 1) + 1 <= DataLength(@InputText)
    WHERE N2.Num <= DataLength(@InputText) / (MaxNum * @TokenLength) + 1
        AND N1.Num <= CASE
            WHEN DataLength(@InputText) / @TokenLength <= MaxNum
            THEN DataLength(@InputText) / @TokenLength
                + CASE DataLength(@InputText) % @TokenLength WHEN 0 THEN 0 ELSE 1 END
            ELSE MaxNum
        END
)
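Calling it with the fixed-width example string from earlier (again assuming the function lives in the dbo schema) would look like this. Note that the returned values keep their padding, so you may want to wrap Value in RTrim:

```sql
SELECT TokenID, RTrim(Value) AS Value
FROM dbo.SplitFixed('1  2  7  42 ', 3)
--Should return rows (1, '1'), (2, '2'), (3, '7'), (4, '42')
```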