Retrieving position() value of node by using XQuery alone

By Art on December 28, 2011

You can only use the position() function within an xquery predicate and can’t use it to return the position value of the node itself, which is a shame, but a couple of good workarounds have been published. However I had a requirement (don’t ask why!) to return the position() of the node as a value within the xml itself.

My requirement was to return an existing xml structure but to also include the position of a particular node as an additional attribute of that node. My other goal was to keep this all self contained within a single xquery and not shred the xml to get the position of the node. In the absence of using the position() function to return a value as part of the xquery, I was able to come up with something, albeit a bit hacky, that works and seems to scale ok as the number of nodes increases.

First of all some really basic xml:

<Order Id="1234">
  <Items>
    <Item PartId="A1111" Qty="1">
        <Description>Motherboard</Description>
    </Item>
    <Item PartId="B2222" Qty="2">
        <Description>Graphics Card</Description>
    </Item>
    <Item PartId="C3333" Qty="4">
        <Description>2gb Ram</Description>
    </Item>
    <Item PartId="D4444" Qty="1">
        <Description>700w PSU</Description>
    </Item>
    <Item PartId="E5555" Qty="2">
        <Description>300gb SSD</Description>
    </Item>
  </Items>
</Order>

After playing around a bit I found another xquery keyword that has not yet been implemented which would have been useful to have and that is "to". This keyword would have allowed for the code to be that touch more friendly and readable by allowing me to construct a sequence by specifying a range expression. Instead I’ve had to construct a nasty comma separated string sequence in the xquery.

The code below will only work on SQL 2008 and above because of the use of the "let" as part of the statement. You can download the script below from here which will work on 2008 and above. Or you can download a SQL 2005 version of the script here which is the same but has the let syntax removed.

--Declare/assign our example xml blob
    DECLARE @XML XML
    SET @XML = '
    <Order Id="1234">
        <Items>
            <Item PartId="A1111" Qty="1">
                <Description>Motherboard</Description>
            </Item>
            <Item PartId="B2222" Qty="2">
                <Description>Graphics Card</Description>
            </Item>
            <Item PartId="C3333" Qty="4">
                <Description>2gb Ram</Description>
            </Item>
            <Item PartId="D4444" Qty="1">
                <Description>700w PSU</Description>
            </Item>
            <Item PartId="E5555" Qty="2">
                <Description>300gb SSD</Description>
            </Item>
        </Items>
    </Order>'

    --Main xquery to enumerate the <Item> nodes and to return them with a new Position attribute
    SELECT @XML.query('
    for $Items in /Order/Items

    (: this is the hacky bit in the absence of the "to" xquery syntax command :)
    let $Seq := (1,2,3,4,5,6,7,8,9,10)
    return

    (: safety check to return <Error/> if array is smaller than node count. Ensures nodes are not lost :)
    if 
        (count($Seq) < count($Items/*))

    then
        <Error>Array too small.</Error>

    else
        <Order>
            { /Order/@* }
            <Items>
                {
                    for $SeqItem in $Seq[. le count($Items/*)]
                    return
                        (: add new Position attribute to <Item> node and append all other attributes and nodes :)
                        <Item Postition="{$SeqItem}">
                            { $Items/*[$SeqItem]/@* }
                            { $Items/*[$SeqItem]/node() }
                        </Item>
                }
            </Items>
        </Order>
    ')

The way that the xquery works is that first a sequence is declared that contains the numbers 1 to 10 which I intend to use like an array. This is why "to" would have been useful as I could have just written let $Seq := (1 to 10) rather than having to use a long comma separated string which looks nasty. There is then a basic check to ensure that this sequence is actually large enough to cover all of the nodes. The if..then..else statement could be removed to gain a tiny performance boost with a little extra risk however. Then a loop gets all nodes in the original xml by using the sequence values as ordinal positions where the value of the sequence is less than or equal to the count of the nodes. The node is then constructed by creating a new Position attribute with the value from the sequence, appending all the existing node attributes by using the syntax /@* and then adding the original node by using the node() function.

The result of the query looks like this:

<Order Id="1234">
  <Items>
    <Item Postition="1" PartId="A1111" Qty="1">
      <Description>Motherboard</Description>
    </Item>
    <Item Postition="2" PartId="B2222" Qty="2">
      <Description>Graphics Card</Description>
    </Item>
    <Item Postition="3" PartId="C3333" Qty="4">
      <Description>2gb Ram</Description>
    </Item>
    <Item Postition="4" PartId="D4444" Qty="1">
      <Description>700w PSU</Description>
    </Item>
    <Item Postition="5" PartId="E5555" Qty="2">
      <Description>300gb SSD</Description>
    </Item>
  </Items>
</Order>

The xml now contains a Position attribute for each node. The query is also such that I could potentially add new attributes to or or add new child nodes under and the query will automatically bring them through without any change to the query itself. This is thanks to node() and /@*.

Now don’t get me wrong, I’m not suggesting that this is great code by a long way! The sequence is quite hacky and unless you set it correctly, you will always run the risk of the number of nodes exceeding the maximum size of the sequence. However, if you understand the risks of such a query or if you have some kind of upper limit of the number of nodes you are retrieving the position for, then this could be considered alongside other methods. But as I always say, you need to evaluate all options and then test those options to see what works best within your infrastructure.

Enjoy!