I was helping out a user on the forums the other day where they were asking questions about shredding an xml structure of which they didn't know the schema for. They wanted to know the best way about writing a generic function to shred any xml to a flat table. In the past I have used the XMLTable function developed and blogged by Jacob Sebastian to shred an xml structure and suggested this as a starting place for them. After posting that comment, it did get me thinking though that I could use the XMLTable function by Jacob to shred any simple xml structure to a table and then pivot the results to a single row table and as such came up with something fairly quickly and added that to the thread.
The problem that I have though is that I'm struggling to decide if this is generally ok or not. Xml itself can be highly structured and it is the structure of the xml that gives data within the xml context namely, the parent-child relationship. Flattening the xml data to a single wide table in such a generic way can lose that context. The method that I have come up with does have some limitations but it may have a use in some scenarios when dealing with xml for which you do not know the schema for and need a single flat table quickly. Although personally I would still strive to find a better approach if possible.
Here is the latest method that I came up with which uses the XMLTable function which you'll need first before running this. It calls the function to get a list of xpaths of the nodes with data and then uses that with some dynamic SQL to pivot the data from those nodes to a flat table, using the xpath as the column names.
--Declare variables and populate with example data DECLARE @Cols VARCHAR(MAX) DECLARE @Xml AS XML = CAST(' <book id="bk101"> <author>Gambardella, Matthew</author> <title>XML Developer''s Guide</title> <genre>Computer</genre> <price>44.95</price> <publish_date>2000-10-01</publish_date> <description>An in-depth look at creating applications with XML.</description> </book>' AS XML) DECLARE @SQL NVARCHAR(MAX) --Get a list of the columns from the shreded xml document using XMLTable SELECT @Cols = ISNULL(@Cols, '') + QUOTENAME(XPath) + ',' FROM dbo.XMLTable(@Xml) WHERE Value IS NOT NULL --trim the trailing comma SET @Cols = LEFT(@Cols, LEN(@Cols) -1) --Generate dynamic SQL statement to pivot the rows from XMLTable function in a single row table SELECT @SQL = 'SELECT ' + @Cols + ' FROM ( SELECT XPath, Value FROM dbo.XMLTable(@Xml) WHERE Value IS NOT NULL ) x PIVOT (MAX(Value) FOR xpath IN (' + @Cols +')) y' --Execute the statement EXEC sys.sp_executesql @SQL, N'@Xml XML', @Xml = @Xml
The limitations that I see are:
- Limit of 128 character column names. Deep xml /long node names will easily blow this limit. The XMLTable function could probably be reworked to change the xpath to something shorter based on some kind of algorithm.
- Limit of 4096 columns, so this limits the xml to 4096 nodes containing data. Not sure if this limit is the same for pivots or if it is lower. Although having 4096 columns in your resultset is going to make things quite unwieldy anyway.
Sources: Jacob Sebastian's XMLTable function