In SQL Server, what is the best way to determine if a given string is a valid XML or not?

16,643

Solution 1

I would like to parse the XML data and extract the actual message.

Perhaps it is not necessary to check for valid XML. You could check for the presence of the appropriate xml tag with charindex in a case statement and extract the error message using substring.

Here is a sample with a simplified XML string but I think you get the idea.

declare @T table(ID int, Col1 nvarchar(1000))

insert into @T values
(1, 'No xml value 1'),
(2, 'No xml value 2'),
(3, '<root><item>Text value in xml</item></root>')

select
  case when charindex('<item>', Col1) = 0
  then Col1
  else
    substring(Col1, charindex('<item>', Col1)+6, charindex('</item>', Col1)-charindex('<item>', Col1)-6)
  end  
from @T

Result:

No xml value 1
No xml value 2
Text value in xml

Solution 2

Based on the accepted answer, I created this to check for valid XML and optionally convert the input string to XML (or extract the required elements / attributes from the XML), because I figured out that TRY_CONVERT works succesfully in case you just pass in plain text, what I did not expect, so another check was required to prevent the final cast to XML I need to work in case the source column just holds some text (sample row 1):

declare @T table(ID int, Col1 nvarchar(1000))
insert into @T values
(1, 'random text value 1'),
(2, '<broken> or invalid xml value 2'),
(3, '<root><item>valid xml</item></root>')
select id, Col1,
 Converted_XML = CASE 
        when [Col1] IS NULL THEN NULL                   /* NULL stays NULL */ 
        when TRY_CONVERT(xml, [Col1]) is null THEN NULL /* Xml Document Error */
        when CHARINDEX('<', [Col1]) < 1 AND CHARINDEX('>', [Col1]) < 1 THEN NULL        /* no xml */
        else CONVERT(xml, [Col1])                       /* Parsing succesful. => in this case you can convert string to XML and/or extract the values */
    END,
    Result_Comment = CASE 
        when [Col1] IS NULL THEN 'NULL always stays NULL'
        when TRY_CONVERT(xml, [Col1]) is null THEN 'Xml Document Error'
        when CHARINDEX('<', [Col1]) < 1 AND CHARINDEX('>', [Col1]) < 1 THEN 'no xml'
        else [Col1]
    END
FROM @T ;

Solution 3

You could create an XML schema and use it to validate against the XML strings.

See here for additional info : http://msdn.microsoft.com/en-us/library/ms176009.aspx

Here's an example:

CREATE XML SCHEMA COLLECTION UserSchemaCollection AS 
N'<?xml version="1.0" encoding="UTF-16"?>
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name = "User" >
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name = "UserID" />
                <xsd:element name = "UserName" />
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
  </xsd:schema>';


DECLARE @x XML(UserSchemaCollection)
SELECT @x = '<User><UserID>1234</UserID><UserName>Sebastian</UserName></User>'

Examples:

DECLARE @y XML(UserSchemaCollection)
SELECT @y = '<User><UserName>Sebastian</UserName></User>'

Msg 6965, Level 16, State 1, Line 2 XML Validation: Invalid content. Expected element(s):UserID where element 'UserName' was specified. Location: /:User[1]/:UserName[1]

DECLARE @z XML(UserSchemaCollection)
SELECT @z = 'Some text'

Msg 6909, Level 16, State 1, Line 2 XML Validation: Text node is not allowed at this location, the type was defined with element only content or with simple content. Location: /

Solution 4

I don't know about a best way, but here's a way:

DECLARE @table TABLE (myXML XML)

INSERT INTO @table
SELECT  
'
    <Employee>
        <FirstName>Henry</FirstName>
        <LastName>Ford</LastName>
    </Employee>
'

SELECT myXML 
FROM @table 
FOR XML RAW

If the XML is invalid it will throw an error:

DECLARE @table TABLE (myXML XML)

INSERT INTO @table
SELECT  
'
    <Employee
        <FirstName>Henry</FirstName>
        <LastName>Ford</LastName>
    </Employee>
'

SELECT myXML 
FROM @table 
FOR XML RAW

Just for clarification, all you have to do is cast it:

BEGIN TRY
    DECLARE @myXML XML
    SET @myXML = CAST
    ('
        <Employee>
            <FirstName>Henry</FirstName>
            <LastName>Ford</LastName>
        </Employee>
    ' AS XML)
    SELECT 'VALID XML'
END TRY
BEGIN CATCH
    SELECT 'INVALID XML'
END CATCH;

vs

BEGIN TRY
    DECLARE @myXML XML
    SET @myXML = CAST
    ('
        <Employee
            <FirstName>Henry</FirstName>
            <LastName>Ford</LastName>
        </Employee>
    ' AS XML)
    SELECT 'VALID XML'
END TRY
BEGIN CATCH
    SELECT 'INVALID XML'
END CATCH;
Share:
16,643

Related videos on Youtube

Venkat
Author by

Venkat

Updated on May 03, 2020

Comments

  • Venkat
    Venkat about 4 years

    A third party component is filling up an nvarchar column in a table with some values. Most of the time it is a human-readable string, but occassionally it is XML (in case of some inner exceptions in the 3rd party comp).

    As a temporary solution (until they fix it and use string always), I would like to parse the XML data and extract the actual message.

    Environment: SQL Server 2005; strings are always less than 1K in size; there could be a few thousand rows in this table.


    I came across a couple of solutions, but I'm not sure if they are good enough:

    1. Invoke sp_xml_preparedocument stored proc and wrap it around TRY/CATCH block. Check for the return value/handle.
    2. Write managed code (in C#), again exception handling and see if it is a valid string.

    None of these methods seem efficient. I was looking for somethig similar to ISNUMERIC(): an ISXML() function. Is there any other better way of checking the string?

    • Martin Smith
      Martin Smith about 13 years
      If the string doesn't begin with < then presumably you know it is definitely not XML.
    • Venkat
      Venkat about 13 years
      @rene: I meant "efficient" :) The above 2 solutions does not sound scalable if the number of rows increases substantially.
  • Martin Smith
    Martin Smith about 13 years
    How would this apply to the OP's case where they have a load of strings that may or may not be XML and they only want to cast as XML the ones which are valid to cast? select cast('foo' as xml) is not xml but casts without error. select cast('foo < bar' as xml) causes an error.
  • SQLMason
    SQLMason about 13 years
    I modified it so make it more like a function. You can make your isXML function accept a varchar and return if it's valid XML or not. You can then do what you'd like to the valid XML.
  • Martin Smith
    Martin Smith about 13 years
    You can't use try..catch in functions.
  • SQLMason
    SQLMason about 13 years
    Procedure which returns a value, sorry.
  • SQLMason
    SQLMason about 13 years
    Doesn't look like there's a good way to do with without RBARing it.
  • SQLMason
    SQLMason about 13 years
    He was just checking for valid XML, not against a scheme. Else your answer is the same as mine.
  • Venkat
    Venkat about 13 years
    Txs! This might not be solution for all such scenarios, but looks OK for my case, as a simple temporary fix! Plus I will try use it as a simple inline SQL rather than wrap it in a function. Gonna give it a try :)
  • Venkat
    Venkat about 13 years
    txs; But using a function/SP AND error handling doesn't sound scalable! This table is used in a join with about a dozen tables. I prefer some simple inline XML. Gonna try Mikael Eriksson's solution.
  • Charles Byrne
    Charles Byrne almost 5 years
    Good Solution. Just a caveat that sometimes SQL will interpret strings with invalid xml as mixed content. So if the parent of item was <valid> but the starting bracket '<' was omitted along with entire end tag it would assume mixed content: valild><item>valid xml</item> = valid&gt;<item>xml</item> Just something to be mindful of.