Using OPENXML in SQL Server 2008 stored proc - INSERT order differs from XML document


Solution 1

If I use the native XQuery support in SQL Server instead of the "legacy" OPENXML stuff, then it would appear that the <t> nodes are indeed inserted into the table in the order they appear in the XML document.

I've used code something like this:

-- @xmlTransaction must be of type xml (not nvarchar(max)) for .nodes() to be available
INSERT INTO dbo.[Transactions]([ID], [EncryptedAccountID])
   SELECT
        XT.value('@id', 'uniqueidentifier'),
        XT.value('@encryptedAccountId', 'varchar(200)')
   FROM
      @xmlTransaction.nodes('/ts/t') AS Nodes(XT)

The same could be done for the <tv> subnodes, too.
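
As a rough sketch of how that might look for the <tv> subnodes, assuming the incoming string is first converted to the xml type: the table variable @TransactionValues and the variable @xmlText are stand-ins invented for this example (the real target would be the TransactionValues table), and the sample document is a shortened copy of the one in the question.

```sql
-- Sketch only: @xmlText and @TransactionValues are hypothetical stand-ins.
DECLARE @xmlText NVARCHAR(MAX) = N'
<ts>
  <t id="36a3c8c1-b958-42f0-82d1-dfa6bf9b99a1" encryptedAccountId="fQ/XF8lpeR9wEDUV3yMzvQ==">
    <tv fieldId="301" officialValue="0, 0" friendlyValue="0, 0" />
    <tv fieldId="302" officialValue="0, 1" friendlyValue="0, 1" />
  </t>
  <t id="9d56d082-4b6a-4bdf-a7a2-f5c6af88344e" encryptedAccountId="fQ/XF8lpeR9wEDUV3yMzvQ==">
    <tv fieldId="301" officialValue="1, 0" friendlyValue="1, 0" />
  </t>
</ts>';

-- nodes() and value() are methods of the xml type, so convert the string first.
DECLARE @xmlTransaction XML = CAST(@xmlText AS XML);

DECLARE @TransactionValues TABLE
(
    FieldID       INT,
    TransactionID UNIQUEIDENTIFIER,
    OfficialValue NVARCHAR(500),
    FriendlyValue NVARCHAR(500)
);

-- One row per <tv>; the parent <t> supplies the transaction id.
INSERT INTO @TransactionValues (FieldID, TransactionID, OfficialValue, FriendlyValue)
SELECT
    XTV.value('@fieldId', 'int'),
    XT.value('@id', 'uniqueidentifier'),
    XTV.value('@officialValue', 'nvarchar(500)'),
    XTV.value('@friendlyValue', 'nvarchar(500)')
FROM
    @xmlTransaction.nodes('/ts/t') AS Nodes(XT)
CROSS APPLY
    XT.nodes('tv') AS ValueNodes(XTV);

SELECT FieldID, TransactionID, OfficialValue, FriendlyValue
FROM @TransactionValues;
```

CROSS APPLY is used here so that each <tv> row can read the id of its own parent <t>, rather than climbing back up with '../@id'.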

Solution 2

As far as I can see, the documentation on OPENXML does not guarantee anything about order, and order is not guaranteed in a relational table either. So why not just ORDER BY a certain column to get the order that you wish? That is always how you enforce ordering in SQL.
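
A minimal sketch of that idea, assuming the caller can add a sequence attribute to each <t> element: the seq attribute, the Seq column and the table variable @Transactions are all hypothetical names made up for this example and are not part of the original document or schema.

```sql
-- Sketch only: the seq attribute and the Seq column are assumptions.
DECLARE @xmlTransaction NVARCHAR(MAX) = N'
<ts>
  <t id="36a3c8c1-b958-42f0-82d1-dfa6bf9b99a1" seq="1" />
  <t id="9d56d082-4b6a-4bdf-a7a2-f5c6af88344e" seq="2" />
  <t id="27db47a3-ad3f-4279-8f4f-0a8944ce32d4" seq="3" />
</ts>';

DECLARE @Handle INT;

-- Stand-in for the real Transactions table, with an extra column for document order.
DECLARE @Transactions TABLE
(
    ID  UNIQUEIDENTIFIER PRIMARY KEY,
    Seq INT NOT NULL
);

EXEC sp_xml_preparedocument @Handle OUTPUT, @xmlTransaction;

-- Store the position alongside the data; the physical insert order no longer matters.
INSERT INTO @Transactions (ID, Seq)
SELECT rID, rSeq
FROM OPENXML (@Handle, '/ts/t', 2)
WITH
(
    rID  UNIQUEIDENTIFIER '@id',
    rSeq INT '@seq'
);

EXEC sp_xml_removedocument @Handle;

-- The order is enforced at query time, which is the only place SQL guarantees it.
SELECT ID, Seq
FROM @Transactions
ORDER BY Seq;
```

The design choice is to treat document position as data: once it is stored in a column, any query that needs document order can ask for it explicitly.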

I don't see why you're extracting the encryptedAccountId attribute separately. Why not just insert it in the main INSERT statement?

Unrelated tip: if your transaction insert is generating an identity, you can pick up a copy of it for your values insert by joining to that parent table on '../@id'. Even if you don't need anything else from the parent table, it seems like a good idea, because the join also confirms that none of your transaction inserts failed.

Tip example:

INSERT INTO
    [TransactionValues]
    (
        FieldID,
        TransactionID,
        OfficialValue,
        FriendlyValue,
        ParentIdentityFK --new
    )
SELECT
    shredded.rFieldID,
    shredded.rTransactionID,
    shredded.rOfficialValue,
    shredded.rFriendlyValue,
    t.SomeIdentity
FROM
    OPENXML (@Handle, '/ts/t/tv', 2)
WITH
    (
        rFieldID INT '@fieldId',
        rTransactionID UNIQUEIDENTIFIER '../@id',
        rOfficialValue NVARCHAR (500) '@officialValue',
        rFriendlyValue NVARCHAR (500) '@friendlyValue'
    ) AS shredded
JOIN Transactions t
    ON shredded.rTransactionID = t.ID
Author by

awj

Updated on June 04, 2022

Comments

  • awj
    awj almost 2 years

    I'm using SQL Server 2008's XML-parsing abilities to iterate through an XML document and perform an INSERT for each element.

    However, my stored procedure appears to be inserting each element into the table in an order which differs from the order in the document.

    Furthermore, the INSERT order seems to change from one run to the next.

    Here's a sample of the XML document - nothing too fancy going on here.

    <ts>
        <t id="36a3c8c1-b958-42f0-82d1-dfa6bf9b99a1" encryptedAccountId="fQ/XF8lpeR9wEDUV3yMzvQ==" uploaded="2012-04-03T15:49:19.9615097Z" visible="1">
            <tv fieldId="301" officialValue="0, 0" friendlyValue="0, 0" />
            <tv fieldId="302" officialValue="0, 1" friendlyValue="0, 1" />
            <tv fieldId="303" officialValue="0, 2" friendlyValue="0, 2" />
            <tv fieldId="304" officialValue="0, 3" friendlyValue="0, 3" />
            <tv fieldId="305" officialValue="0, 4" friendlyValue="0, 4" />
            <tv fieldId="306" officialValue="0, 5" friendlyValue="0, 5" />
        </t>
        <t id="9d56d082-4b6a-4bdf-a7a2-f5c6af88344e" encryptedAccountId="fQ/XF8lpeR9wEDUV3yMzvQ==" uploaded="2012-04-03T15:49:19.9615097Z"  visible="1">
            <tv fieldId="301" officialValue="1, 0" friendlyValue="1, 0" />
            <tv fieldId="302" officialValue="1, 1" friendlyValue="1, 1" />
            <tv fieldId="303" officialValue="1, 2" friendlyValue="1, 2" />
            <tv fieldId="304" officialValue="1, 3" friendlyValue="1, 3" />
            <tv fieldId="305" officialValue="1, 4" friendlyValue="1, 4" />
            <tv fieldId="306" officialValue="1, 5" friendlyValue="1, 5" />
        </t>
        <t id="27db47a3-ad3f-4279-8f4f-0a8944ce32d4" encryptedAccountId="fQ/XF8lpeR9wEDUV3yMzvQ==" uploaded="2012-04-03T15:49:19.9615097Z" visible="1">
            <tv fieldId="301" officialValue="2, 0" friendlyValue="2, 0" />
            <tv fieldId="302" officialValue="2, 1" friendlyValue="2, 1" />
            <tv fieldId="303" officialValue="2, 2" friendlyValue="2, 2" />
            <tv fieldId="304" officialValue="2, 3" friendlyValue="2, 3" />
            <tv fieldId="305" officialValue="2, 4" friendlyValue="2, 4" />
            <tv fieldId="306" officialValue="2, 5" friendlyValue="2, 5" />
        </t>
        <t id="867ea26b-0341-4d60-ac48-f305492a60f0" encryptedAccountId="fQ/XF8lpeR9wEDUV3yMzvQ==" uploaded="2012-04-03T15:49:19.9615097Z" visible="1">
            <tv fieldId="301" officialValue="3, 0" friendlyValue="3, 0" />
            <tv fieldId="302" officialValue="3, 1" friendlyValue="3, 1" />
            <tv fieldId="303" officialValue="3, 2" friendlyValue="3, 2" />
            <tv fieldId="304" officialValue="3, 3" friendlyValue="3, 3" />
            <tv fieldId="305" officialValue="3, 4" friendlyValue="3, 4" />
            <tv fieldId="306" officialValue="3, 5" friendlyValue="3, 5" />
        </t>
    </ts>
    

    The stored procedure has a few operations taking place, but I've commented-out other parts leaving only the SQL which inserts the <t/> elements and then the <tv/> elements.

    The SQL in the stored procedure is as follows.

    (@xmlTransaction is an NVARCHAR (MAX) input param containing the above XML)

    BEGIN
        SET NOCOUNT ON;
    
        DECLARE @encryptedAccountID AS VARCHAR(200)
    
        BEGIN TRANSACTION
            BEGIN TRY
                DECLARE @Handle AS INT
                DECLARE @TransactionCount AS INT
    
                EXEC sp_xml_preparedocument @Handle OUTPUT, @xmlTransaction
    
                /* encryptedAccountId is always the same for each @xmlTransaction param */
                /* Just take the value from the first <t/> element */
                SET @encryptedAccountID = (SELECT eID FROM OPENXML (@Handle, '/ts/t[1]', 2) WITH ( eID VARCHAR (200) '@encryptedAccountId' ))
    
                /* Go through each <t/> element in the XML document and INSERT */
                INSERT INTO
                [Transactions] 
                (
                    [ID],
                    [EncryptedAccountID]
                )
                SELECT
                    *
                FROM
                    OPENXML (@Handle, '/ts/t', 2)
                WITH
                (
                    rID UNIQUEIDENTIFIER '@id',
                    rEncryptedAccountID VARCHAR (200) '@encryptedAccountId'
                )
    
                /* Loop through each TransactionValue in the XML document and INSERT */
                INSERT INTO
                [TransactionValues]
                (
                    FieldID,
                    TransactionID,
                    OfficialValue,
                    FriendlyValue
                )
                SELECT
                    *
                FROM
                    OPENXML (@Handle, '/ts/t/tv', 2)
                WITH
                (
                    rFieldID INT '@fieldId',
                    rTransactionID UNIQUEIDENTIFIER '../@id',
                    rOfficialValue NVARCHAR (500) '@officialValue',
                    rFriendlyValue NVARCHAR (500) '@friendlyValue'
                )
    
                /* Dispose of the XML document */
                EXEC sp_xml_removedocument @Handle
    
            COMMIT TRANSACTION
        END TRY
        BEGIN CATCH
    
            /* Roll back before returning: statements after RETURN never run */
            IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION
    
            RETURN ERROR_NUMBER()
        END CATCH
    
    END
    

    Should be fairly straightforward. And yet if I query the results, they're not in the same order as the XML document. The second INSERT statement for the <tv/> elements does store the elements into a second table in the correct order, but the <t/> elements are not stored in their table in the correct order.

    Can anyone explain to me why the <t/> elements are not being INSERTed into the table in the same order as they appear in the XML document?

  • awj
    awj about 12 years
    (1) Are you suggesting that I send in an ordering attribute inside each element, and then query ordered by that? If so, then I suppose I could, though it adds complexity that I was hoping to avoid. Otherwise I'm not sure I get what you're suggesting. (2) As for the encryptedAccountId, I didn't see the point in passing in an additional param when it already appears (for other reasons) in the XML param. (3) As for the Values getting an Identity value from the parent, this is the case, and I couldn't find a way to do this. I'll definitely follow up on that suggestion.
  • awj
    awj about 12 years
    I'm completely ignorant of this XQuery in SQL Server and thought that OPENXML was the only way to achieve this. I will now investigate XQuery and report back.
  • awj
    awj about 12 years
    I've spent an hour trying to get this to work, but can't. I'm unable to find any tutorials or guides with syntax like yours - all pages I can find relate to how to modify existing XML fields. I tried your SQL but get the message "The XMLDT method 'nodes' can only be invoked on columns of type xml." Surely I don't have to change my field types to insert this data? Especially as you've still declared the types for 'id' and 'encryptedAccountId' [not allowed to use the 'at' symbol in the comments]. Here's a screenshot of what I tried to do: screencast.com/t/CToJqYqH55RH
  • Brian White
    Brian White about 12 years
    In my performance testing, OPENXML beat XQuery by a huge margin, though it depends on your document. EVERY column you query with XQuery creates a table, which is then inner joined to every other column to give you a table. OPENXML is pickier about special characters. I only use XQuery when I have annoying data like names with ' and &.
  • Brian White
    Brian White about 12 years
    To the 'can't work comment' - you need to use xml datatype, not nvarchar(max)
  • awj
    awj about 12 years
    Hmmm... So let's suppose OPENXML is the most optimal way forward, is the solution as simple as placing something like ORDER BY '@orderBy' statement inside the sub-query, following the WITH statement? And if so, is this what you meant in your first reply?
  • awj
    awj about 12 years
    Thanks marc_s - how stupid of me. It works now. But can the three of us come to some agreement over the most efficient way to do this? I seem to have the start of the XQuery solution, whereas @Brian is suggesting the alternate solution.
  • marc_s
    marc_s about 12 years
    @awj: first of all: do you want a fast solution - or one that works?? Is performance the only point of concern?? And second: I've heard those stories, too - question always is: does this really apply to your case, too? If you're interested: do your own testing and see for yourself! If OpenXML is that much faster - well, then go with that. I prefer the native XQuery support which seems a lot easier, a lot more intuitive to me. Also: OpenXML might be faster in certain cases - but it's also more memory intensive. So it depends on your situation which is better - test it! Decide for yourself
  • Brian White
    Brian White about 12 years
    In SQL there normally isn't any order guaranteed. You have to supply it. You can add a new attribute, parse it in the OPENXML statement, and 'order by' it. The only reason that order is maintained when using XQuery instead of OPENXML is that every column you parse is turned into a table that is then inner joined to the other tables. So they have to be in the same order as each other for the resulting joined table to have the same data order as the document.
  • Brian White
    Brian White about 12 years
    Yes, do your own test for sure. My test was on a document of about 150 records and 30 columns. In that case, the XQuery approach took 100% of the cost when I ran the two test procs in the same batch to compare. But I have several jobs running in prod using XQuery to transfer data. They run fine because the documents they deal with have only a few columns. The number of columns determines the number of table valued functions you join together. Joining 5 tables is fine. Joining 30 tables sucks. Same with the table valued functions for XQuery shredding.
  • Brian White
    Brian White about 12 years
    Also... do you really care about the order of the document? Your imported pk value is a uniqueidentifier so you don't seem to need to preserve the ordering of an int identity pk. I do not have confirmation of this, but I think it's likely if you were shredding a document with an int pk you would probably see the document in order. I always have, but I never use uniqueidentifier. I think sql is sorting by the first columns. Is there a defined way the sorting should happen on a GUID? link
  • awj
    awj about 12 years
    Ok, I've done some testing and benchmarking, and for what I'm using it for, both are almost the same. This involved inserting 1000 records, each with 5 fields. There are no joins taking place, and perhaps because of this, both the OPENXML and the XQuery methods were virtually the same. However, due to the terseness of the XQuery method, and because it's easier to take in at a glance, I'm going to mark this as the solution. I've also found that it's easier to add sub-queries compared to the OPENXML method.