Introduction to XML DML (Part 1) - Using the modify() method: insert

This is the first part of a 3 part blog which will attempt to show some different ways of how you can use the xml modify() method to update the contents of an XML document. This first part, starting at the absolute basics, will cover the ability to insert data into an existing XML document

XML DML is an extension of the XQuery language that allows us to do in place updates of the contents of an XML document. XML DML is very flexible and most scenarios can be dealt with quite nicely with the use of the modify() function. We can however only use the modify() function as part of an UPDATE or SET statement.

Here is a very simple example XML document that we will be using with all the example queries during this article which can downloaded with all of the example queries here.

--Declare/assign our example xml blob
    DECLARE @XMLData XML
    SET @XMLData = '
    <Person Id="1234">
        <Surname>Smith</Surname>
        <Forenames>John Peter</Forenames>
        <Address>
            <AddressLine>1 Church Lane</AddressLine>
            <AddressLine>Littlewood</AddressLine>
            <AddressLine>Upper Westshire</AddressLine>
            <AddressLine>England</AddressLine>
        </Address>    
        <HasDrivingLicense/>
    </Person>'

The Basics

In its simplest form, the insert expression allows us to insert one or more nodes as defined by an expression into a location within the XML document. The location can be specified by using the term "into", "after" or "before". Using "into" also allows the use of the prefix "as first" or "as last". MSDN detail the syntax as:

insert  
          Expression1 (
                     {as first | as last} into | after | before
                                        Expression2
                    )

To demonstrate this we will add a new <Postcode> node as the last child node of /Person/Address by using the term as last into. MSDN states that you must use either as first or as last to specify where you want the node added. This is true in the sense that you need to specify as first to ensure that Expression1 is the first child of Expression2. However, I have found that strictly you don't seem to need to specify as last as I've seen that if you only specify into then this will default to adding Expression1 as the last child of Expression2. But I would recommend using as last just in case this behavior ever changes.

SET @XMLData.modify('  
        insert         
            (<Postcode>3FG 2MP</Postcode>)
        as last into
            (/Person[1]/Address[1])
    ')

    SELECT @XMLData

Returns:

<Person Id="1234">  
      <Surname>Smith</Surname>
      <Forenames>John Peter</Forenames>
      <Address>
        <AddressLine>1 Church Lane</AddressLine>
        <AddressLine>Littlewood</AddressLine>
        <AddressLine>Upper Westshire</AddressLine>
        <AddressLine>England</AddressLine>
        <Postcode>3FG 2MP</Postcode>
      </Address>  
    </Person>

This next example builds on the XML from the last example by using the term after and will add a new <DateOfBirth> node after the <Forenames> node

SET @XMLData.modify('  
        insert         
            (<DateOfBirth>1950-06-01</DateOfBirth>)
        after
            (/Person[1]/Forenames[1])
    ')

    SELECT @XMLData

Returns:

<Person Id="1234">  
      <Surname>Smith</Surname>
      <Forenames>John Peter</Forenames>
      <DateOfBirth>1950-06-01</DateOfBirth>
      <Address>
        <AddressLine>1 Church Lane</AddressLine>
        <AddressLine>Littlewood</AddressLine>
        <AddressLine>Upper Westshire</AddressLine>
        <AddressLine>England</AddressLine>
        <Postcode>3FG 2MP</Postcode>
      </Address>
    </Person>

Finally, building on the previous example, this will use the term before and add a new <Gender> node before the just added <DateOfBirth> node

SET @XMLData.modify('  
        insert         
            (<Gender>Male</Gender>)
        before
            (/Person[1]/DateOfBirth[1])
    ')

    SELECT @XMLData

Returns:

<Person Id="1234">  
      <Surname>Smith</Surname>
      <Forenames>John Peter</Forenames>
      <Gender>Male</Gender>
      <DateOfBirth>1950-06-01</DateOfBirth>
      <Address>
        <AddressLine>1 Church Lane</AddressLine>
        <AddressLine>Littlewood</AddressLine>
        <AddressLine>Upper Westshire</AddressLine>
        <AddressLine>England</AddressLine>
        <Postcode>3FG 2MP</Postcode>
      </Address>
    </Person>

Now these are very simplistic implementations and you would naturally be looking to combine these statements where you can. For example, the <Gender> and <DateOfBirth> nodes are together so ideally we should add them via a single statement. This is possible to do and it is called a sequence. To insert a sequence of nodes, the expression being inserted must be encased within parentheses and separated by commas. The last two examples could have been combined and written as:

SET @XMLData.modify('  
        insert         
            (
                <Gender>Male</Gender>,
                <DateOfBirth>1950-06-01</DateOfBirth>
            )
        after
            (/Person[1]/Forenames[1])
    ')

Incidentally, you do not need to encase the expression being inserted within parentheses if you are not inserting a sequence of nodes, i.e. in the first three examples. However, I personally always encase the expression in parentheses as I believe it provides slightly more readable and consistent code than only using them when necessary. Similar the target location expression does not need parentheses either, but again, I tend to prefer using them to aid readability.

Using Functions

So far we have been inserting fixed nodes and values into the XML document which is fine but not exactly reflective of production systems. Normally you would want to use the contents of a variable or a column or manipulate the data in some way. SQL provides two functions for accessing data outside of the XML document via the modify() function. They are sql:variable and sql:column. They both work in similar ways and when referenced within the XQuery, the values can be used as part of the DML statement. For example, the previous example could have been written as:

DECLARE @Gender VARCHAR(6)  
    DECLARE @DoB DATETIME

    SELECT @Gender = 'Male', @DoB = '1950-06-01'

    SET @XMLData.modify('
        insert         
            (
                <Gender>{ sql:variable("@Gender") }</Gender>,
                <DateOfBirth>{ substring(string(sql:variable("@DoB")), 1, 10) }</DateOfBirth>
            )
        after
            (/Person[1]/Forenames[1])
    ')

The example uses a couple of XQuery functions as part of the insert, substring() and string(), and a full list of these functions are available on MSDN here. The XQuery functions and the sql:variable function are encased within curly braces and this is because we need the values to be computed. The curly braces ensure that whatever is in the braces is computed first and the result of that computation is passed up. Without them, the value would not be computed and whatever is there would be seen as a string literal instead. Even though SQL has not implemented all the XQuery/XPath functions, there are still quite a number of useful functions that can be used within the XQuery to both manipulate the data or within a test. By using sql:variable or sql:column, we are also able to insert entire XML structures straight into the XML document that we are modifying. e.g.

DECLARE @PreviousAddress XML  
    SET @PreviousAddress = '
    <PreviousAddress>
        <AddressLine>10 Lakes Lane</AddressLine>
        <AddressLine>Largehill</AddressLine>
        <AddressLine>Lower Eastford</AddressLine>
        <AddressLine>England</AddressLine>
    </PreviousAddress>'

    SET @XMLData.modify('
        insert      
            (
                sql:variable("@PreviousAddress")
            )
        after
            (/Person[1]/Address[1])
    ')

    SELECT @XMLData

Returns:

<Person Id="1234">  
      <Surname>Smith</Surname>
      <Forenames>John Peter</Forenames>
      <Gender>Male</Gender>
      <DateOfBirth>1950-06-01</DateOfBirth>
      <Address>
        <AddressLine>1 Church Lane</AddressLine>
        <AddressLine>Littlewood</AddressLine>
        <AddressLine>Upper Westshire</AddressLine>
        <AddressLine>England</AddressLine>
        <Postcode>3FG 2MP</Postcode>
      </Address>
      <PreviousAddress>
        <AddressLine>10 Lakes Lane</AddressLine>
        <AddressLine>Largehill</AddressLine>
        <AddressLine>Lower Eastford</AddressLine>
        <AddressLine>England</AddressLine>
      </PreviousAddress>
    </Person>

Referencing Existing Data in XML Document

The insert expression can also reference existing data in the XML document. For example, the following creates a new <FullName> node that concatenates two existing nodes by selecting the values as a sequence. Notice the use of the data() function and curly braces so that the value is computed. The data() function returns the value of the XPath expression. Without the data() function, the XPath expression would return the full node:

SET @XMLData.modify('  
        insert         
            (
                <FullName>
                    { data(/Person[1]/Forenames[1]), data(/Person[1]/Surname[1]) }
                </FullName>
            )
        as first into
            (/Person[1])
    ')

    SELECT @XMLData

Returns (truncated):

<Person Id="1234">  
      <FullName>John Peter Smith</FullName>
      <Surname>Smith</Surname>
      <Forenames>John Peter</Forenames>
      <Gender>Male</Gender>
      <DateOfBirth>1950-06-01</DateOfBirth>
      .......
    </Person>

When you supply a list of values as a sequence in the expression, the values will be separated with a space. The previous example provided two values supplied by 2 XPath queries, namely the <Forenames> and <Surname> in the sequence, but the same will also apply if you used a single XPath query returning multiple values. For example, this will concatenate each of the 4 <AddressLine> nodes into one new node called FullAddress:

SET @XMLData.modify('  
        insert         
            (
                <FullAddress>
                    { data(/Person[1]/Address[1]/AddressLine) }
                </FullAddress>
            )
        as first into
            (/Person[1])
    ')

    SELECT @XMLData

Returns (truncated):

<Person Id="1234">  
      <FullAddress>1 Church Lane Littlewood Upper Westshire England</FullAddress>
      <FullName>John Peter Smith</FullName>
      <Surname>Smith</Surname>
      <Forenames>John Peter</Forenames>
      <Gender>Male</Gender>
      <DateOfBirth>1950-06-01</DateOfBirth>
      ........
    </Person>

Although you can reference data that is already within the XML document and you can insert entire XML structures by referencing a variable or column, there isn't any easy way (to my knowledge) of referencing another XML document to retrieve values from. SQL doesn't provide for a doc() or equivalent function that can access an external XML document. The only way around that I am aware of is by shredding the XML to variables/columns and then referencing them within the DML statement. Not exactly ideal though.

Attributes

So far we have only inserted XML nodes into the XML document and not actually inserted any new attributes. We can add attributes in a similar way by specifying the attribute keyword, but we can only use the term into. Using the keyword after or before will result in an error similar to: The position may not be specified when inserting an attribute node. This is because the target of a before or after must be an element/PI/comment or text node

MSDN also states that as first and as last keywords are ignored when inserting attributes but I do not see this and in the queries that I run, attributes can be added as either the first or last attribute depending on which keyword I use. This example will insert a new attribute called HasRegistered into the top level node.

SET @XMLData.modify('  
        insert      
            (
                attribute HasRegistered {"True"}
            )
        as first into
            (/Person[1])
    ')

    SELECT @XMLData

Returns (truncated):

<Person HasRegistered="True" Id="1234">  
      <FullAddress>1 Church Lane Littlewood Upper Westshire England</FullAddress>
      <FullName>John Peter Smith</FullName>
      ........
    </Person>

The same rules apply regarding sequences of attributes as well so you could add multiple attributes into the same node as part of the same insert statement by separating them with commas. The XQuery syntax also provides the ability to construct elements by using the keyword "element" instead. e.g.

SET @XMLData.modify('  
        insert      
            (
                element IsActiveUser {"False"}
            )
        as first into
            (/Person[1])
    ')

    SELECT @XMLData

Text Nodes

It is possible to insert a text node into an existing node by using the into term and depending on you specifying as first or as last then the new text node will be inserted before or after any existing text or nodes in the element you are inserting the text into. Unfortunately, you can't do this for attributes as the target of an insert into must be an element or document node. Also, something to bear in mind that if you are using typed XML, then the target node must be a complex type with the "mixed" attribute set to "true" as inserting into simple types is not allowed. If you was to try and insert a text node into a simple type of a typed XML document then you would probably see an error like this:

XML Validation: Invalid simple type operation, inserting into simple type is not permitted

To insert a value into a simple type node that already exists in the XML document, then you should be using the replace value of keyword instead. But for untyped XML Documents, you can actually use insert to add text to a simple element type. As this article is working with an untyped XML document, then this following example will work and will add a text node into the empty <HasDrivingLicense> element. Although the example is just creating a single text node from a simple string literal, you could use variables, columns or functions when creating the text node. Again, you can also create a sequence of text{} nodes as well by separating each text{} with a comma.

SET @XMLData.modify('  
        insert      
            (
                text{"True"}
            )
        as first into
            (/Person[1]/HasDrivingLicense[1])
    ')

    SELECT @XMLData

Returns (truncated:

<Person HasRegistered="True" Id="1234">  
      .......
      <HasDrivingLicense>True</HasDrivingLicense>
    </Person>

Conditional Statements

Finally another feature within the DML is the ability to use conditional statements, namely if...then...else. The syntax of if...then...else requires you to always specify all three parts. If you do not have an expression for the else part of the statement, then the expression can be left empty by just simply specifying empty parentheses () as per the next example which checks the year of birth and if earlier than 1960 then it will create a new attribute called YearOfBirth1. The example uses the XQuery function substring to get the year of birth and then converts it to an int type by using the function xs:integer.

SET @XMLData.modify('  
        insert      
            (
                if (xs:integer(substring(/Person[1]/DateOfBirth[1], 1, 4)) < 1960)
                then
                    attribute YearOfBirth1 { substring(/Person[1]/DateOfBirth[1], 1, 4) }
                else
                    ()
            )
        as last into
            (/Person[1])
    ')

    SELECT @XMLData

There will be many times where an if...then statement that doesn't use an else could be written by just using predicates. For example the next query is functionally the same as the one above but it uses a predicate instead of an if...then...else statement. When the predicate is evaluated, as there are no matching nodes, no insert is performed. If ever the target expression evaluates to an empty sequence then nothing will be inserted and no error will be raised. You should also be aware though that if the target expression evaluates to more than one node, then an error will be raised.

SET @XMLData.modify('  
        insert      
            (
                attribute YearOfBirth2 { substring(/Person[1]/DateOfBirth[1], 1, 4) }
            )
        as last into
            (/Person[1][xs:integer(substring(DateOfBirth[1], 1, 4)) < 1960])
    ')

    SELECT @XMLData

The following example counts the number of address lines and if there are not 5, then it will add a new empty AddressLine element. If there are 5 address lines, it will instead add a new attribute.

SET @XMLData.modify('  
        insert  
            if (count(/Person[1]/Address[1]/AddressLine) = 5)
            then
                attribute AddressComplete {"True"}
            else
                element AddressLine {""}
        as last into
            (/Person[1]/Address[1])
    ')

    SELECT @XMLData

That concludes this part which I hope has shown how to use the insert keyword in a variety of ways. I haven't used predicates that much in the examples as I wanted to keep them as easy to read as possible, but predicates provide the abiliity for some really complex XPath expressions. Also, it is possible to insert comment nodes, PI nodes and CDATA nodes as well, but I have chosen to leave them out of this article to prevent it from getting too long!

As mentioned earlier, XML DML is quite flexible and although it does have a few limitations and not implemented all of the XQuery/XPath functions, it can deal with most requirements to modify XML documents and I hope this introduction has shown that.

Enjoy!