I know that this subject has been blogged about numerous times but I thought I’d post an alternative way of shredding a comma separated string to a table that I come up with today, albeit with help from the very clever Itzik Ban-Gan cross join nested CTE that many of us use to generate large tally tables quickly.
For sure, in the ideal world we wouldn’t have to do this as there isn’t an easy and efficient way to do this in T-SQL. There are much better ways of passing this type of data to the SQL server, i.e. XML or table valued parameters, but sometimes we aren’t in control of what and how data is being sent to and from the SQL server so we have to engineer these not so ideal solutions. Also the point of this post was more about me having a little fun and share those results with you and not about proving either way which technique is better. Different environments and workloads will always require you to evaluate all methods to see which is best. There are half a dozen and more techniques to split a comma separated string to a table and this afternoon I hope that I have added another way to that list!
So first of all, let us deal with the easy part of generating a comma separated string from a column in a table. For me there is only one real technique of doing this so I’m not going to spend too much time writing about it but I wanted it here for completeness.
--Create a table and populate it with some data CREATE TABLE MyTable (MyCol VARCHAR(50)) INSERT INTO MyTable (MyCol) VALUES ('Yellow') INSERT INTO MyTable (MyCol) VALUES ('Blue') INSERT INTO MyTable (MyCol) VALUES ('Green') INSERT INTO MyTable (MyCol) VALUES ('Red') --Retrieve all records from the column in the table as a comma separated string into variable DECLARE @CommaString VARCHAR(300) SET @CommaString = '' SELECT @CommaString = @CommaString + MyCol + ',' FROM MyTable --trim trailing comma of string and return variable SELECT LEFT(@CommaString, LEN(@CommaString) -1)
And the results
Nice and simple!
Unfortunately converting a comma separated string into a table isn’t as easy. Well, I mean it is fairly easy to do it, but there isn’t a nice and clean way of doing it. .net developers are blessed with the string method .Split() and yes creating a CLR to chop up these strings is definitely one good option to achieve this if that is what you decide to do. Other options to chop up a string to a table include:
- Use a loop to read/parse the string from left to right whilst extracting the values to a temp table
- Cast the string to xml by strategically replacing the comma’s to xml nodes and then using the nodes method to shred the xml to a table
- Use a CTE to break the string down into a temp table
- Use a tally table as shown here in an SSC article
Although I’ve always thought that doing the xml technique is probably consistently the quickest, I was quite intrigued by the tally table approach and I wondered if I could use the same method but eliminate the tally table thus saving those valuable reads. After pondering on the problem for a short while I recalled the Itzik CTE tally table and proceeded to try and use that instead of an actual tally table.
So here is my first attempt at doing this:
--declare a variable and populate it with a comma separated string DECLARE @SQLString VARCHAR(MAX) SET @SQLString = '111,211,311,411,5,6,7,811,911,1199,2299,3399,4499,5599,699,799,899,999'; --append a comma to the string to get correct results with empty strings or strings with a single value (no commas) SET @SQLString = @SQLString + ','; --the main query --Itzik Ban-Gan style CTE to generate us a lot of numbers (this will produce up to 65536) WITH Nbrs_3(n) AS (SELECT 1 UNION SELECT 0), Nbrs_2(n) AS (SELECT 1 FROM Nbrs_3 n1 CROSS JOIN Nbrs_3 n2), Nbrs_1(n) AS (SELECT 1 FROM Nbrs_2 n1 CROSS JOIN Nbrs_2 n2), Nbrs_0(n) AS (SELECT 1 FROM Nbrs_1 n1 CROSS JOIN Nbrs_1 n2), Nbrs (n) AS (SELECT 1 FROM Nbrs_0 n1 CROSS JOIN Nbrs_0 n2) --chop up the string based on the position of the comma returned from the inner query SELECT SUBSTRING(@SQLString, n+1, CHARINDEX(',', @SQLString, n+1) - n-1) FROM ( --select a holding record to ensure we get the very first value in the string SELECT 0 AS 'n' UNION ALL --select the maximum amount of generated numbers that we will need. --this will be the len of the string -1. the last char of string is a comma SELECT TOP(LEN(@SQLString)-1) ROW_NUMBER() OVER (ORDER BY n) AS 'n' FROM Nbrs ) x --only return the numbers that equate to the position of a comma in the original string --returning 0 ensures that we get the first value in the string. WHERE SUBSTRING(@SQLString, n, 1) = ',' OR n = 0
..and the results
Perfect, all looks good to me. And the execution stats look good as well. A short amount of time to compile the first time around, then nothing each time afterwards.
Lets crank it up to a fairly unrealistic level to see what happens. I’m going to run the same script again but with a comma separated string containing over 2000 values and to compare performance, we’ll use the tally table example. I’m not going to post the new string here as it’ll just wrap loads on this post but the script is available for download at the bottom. So here are the results from profiler for the same query but processing some 2000 values.
Still very good times and I’m more than happy with those sort of figures coming from my rather basic desktop.
Now I grabbed the code from the other SSC article and after substituting the comma separated string for my one, I ran the query 4 times to compare the performance.
Definitely slower. So by using this new Itzik style tally table approach, we have improved CPU/Duration but more importantly used 0 reads compared to 2267 reads.
Great stuff. But, will it beat the xml technique using the same comma separated string? (script at the bottom)
No it does not, or at least not in this example with my first version above. The xml technique is similar with CPU/reads but duration wise, it is consistently quicker by a fair margin and I believe the margin to be too significant to try and rework my first version of the query to a second version. Maybe I can get it closer, maybe not. That isn’t really the point as although my experiment has failed in terms of performance compared to the xml technique, I hope that this post shows that with a bit of thought, the very clever Itzik CTE can replace the traditional tally table and give some significant performance improvements. And with regards to converting a comma separated string to a flat table, I’d more than likely use the xml approach!