XML parse VBA excel (function trip, & MSXML2.DOMDocument)

16,913

Solution 1

As close as possible to your OP

I 'd draw your attention to several errors or misunderstandings:

  • [1] Invalid .LoadXML Syntax

What is then the difference between .LoadXML ("C:\folder\folder\name.xml") and .Load ("C:\folder\folder\name.xml") ?

Load expects a file path and then loads the file content into the oXML object.

LoadXML doesn't expect a file parameter, but its actual XML text content that has to be a well formed string.

  • [2] XML distinguishes between lower and upper case, therefore nodes need to be addressed by their exact literal names: the <Query> node wouldn't be identified by "query", "ConceptModel" isn't the same as "conceptmodel".

As second issue I would like to ask if Dim oXml As MSXML2.DOMDocument would be the same as Dim oXml As MSXML2.DOMDocument60, since I checked in tools/references "Microsof XML, v6.0"?

No, it isn't. - Please note that the former declaration would load version 3.0 by default. However it's absolutely preferrable to get the version 6.0 (any other versions are obsolete nowadays!)

As you are using so called early binding (referencing "Microsoft XML, v6.0"), I'll do the same but am referring to the current version 6.0:

Dim oXml As MSXML2.DOMDocument60        ' declare the xml doc object
Set oXml = New MSXML2.DOMDocument60     ' set an instance of it to memory
  • [3] misunderstanding some XPath expressions

A starting slash "/" in the XPath expression always refers to the DocumentElement (<Concepts> here), you can add .DocumentElement to your document object instead. A starting double slash "//xyz" would find any "xyz" node if existant.

For instance

    oXml.SelectNodes("//Query").Length 

returns the same childNodes number (here: 3) as

    oXml.DocumentElement.SelectNodes("//Query").Length   ' or 
    oXml.SelectSingleNode("//Queries").ChildNodes.Length ' or even       
    oXml.SelectNodes("/*/*/*/Query").Length`.

Code example with reference to XML version 6.0

Of course you'd have to loop over several xml files, the example only uses one (starting in row 2).

Just for the case of not well formed xml files I added a detailled error Routine that enables you to identify the presumed error location. Load and LoadXML both return a boolean value (True if loaded correctly, False if not).

Sub xmlTest()

Dim ws   As Worksheet: Set ws = ThisWorkbook.Sheets(3)
Dim oXml As MSXML2.DOMDocument60
Set oXml = New MSXML2.DOMDocument60
With oXml
    .validateOnParse = True
    .setProperty "SelectionLanguage", "XPath"   ' necessary in version 3.0, possibly redundant here
    .async = False

    If Not .Load(ThisWorkbook.Path & "\xml\" & "name.xml") Then
        Dim xPE        As Object    ' Set xPE = CreateObject("MSXML2.IXMLDOMParseError")
        Dim strErrText As String
        Set xPE = .parseError
        With xPE
           strErrText = "Load error " & .ErrorCode & " xml file " & vbCrLf & _
           Replace(.URL, "file:///", "") & vbCrLf & vbCrLf & _
          xPE.reason & _
          "Source Text: " & .srcText & vbCrLf & vbCrLf & _
          "Line No.:    " & .Line & vbCrLf & _
          "Line Pos.: " & .linepos & vbCrLf & _
          "File Pos.:  " & .filepos & vbCrLf & vbCrLf
        End With
        MsgBox strErrText, vbExclamation
        Set xPE = Nothing
        Exit Sub
    End If

    ' Debug.Print "|" & oXml.XML & "|"

    Dim Queries  As IXMLDOMNodeList, Query As IXMLDOMNode
    Dim Searched As String
    Dim i&, ii&
    i = 2       ' start row
  ' start XPath  
    Searched = "ConceptModel/Queries/Query"                     ' search string
    Set Queries = oXml.DocumentElement.SelectNodes(Searched)    ' XPath
  ' 
    ws.Cells(i, 1) = IIf(Queries.Length = 0, "No items", Queries.Length & " items")
    ii = 1
    For Each Query In Queries
        ii = ii + 1
        ws.Cells(i, ii) = Query.Text
    Next

End With

End Sub

Additional hints

You also might be interested in an example how to list all child nodes via XMLDOM and to obtain attribute names from XML using VBA.

I include a further hint due to later comment (thanks to @barrowc )

"A further issue with using MSXML, v3.0 is that the default selection language is XSLPatterns instead of XPath. Details on some of the differences between MSXML versions are here and the differences between the two selection languages are discussed here."

In the current MSXML2 version 6.0 XPath 1.0 is fully supported. So it seems XSL Patterns have been implemented by Microsoft in earlier days, basically it can be regarded as a simplified subset of XPath expressions before W3C standardisation of XPath.

MSXML2 Version 3.0 allows the integration of XPath 1.0 at least by explicit selection language setting:

oXML.setProperty "SelectionLanguage", "XPath"   ' oXML being the DOMDocument object as used in original post  

Solution 2

It is the special characters (german alphabet) meaning you need to do something like a batch replace on the XML files so opening line is not this:

<?xml version="1.0" encoding="UTF-8"?>

but this:

<?xml version="1.0" encoding="iso-8859-1" ?>

Code to test with after:

Option Explicit
Public Sub test()
    Dim xmlDoc As Object
    Set xmlDoc = CreateObject("MSXML2.DOMDocument") 'New MSXML2.DOMDocument60
    With xmlDoc
        .validateOnParse = True
        .setProperty "SelectionLanguage", "XPath"
        .async = False
        If Not .Load("C:\Users\User\Desktop\Test.xml") Then
            Err.Raise .parseError.ErrorCode, , .parseError.reason
        End If
    End With
    Debug.Print xmlDoc.SelectNodes("//Query").Length
End Sub

This is the XML I am using:

<?xml version="1.0" encoding="iso-8859-1" ?>
  <Concepts>
      <ConceptModel name="food">
    <Filters>
      <Filter type="CC"/>
    </Filters>
    <Queries>
      <Query lang="EN">(cheese, bread, wine)</Query>
      <Query lang="DE">(Käse, Brot, Wein)</Query>
      <Query lang="FR">(fromaige, pain, vin)</Query>
   </Queries>
  </ConceptModel>
</Concepts>
Share:
16,913
JFerro
Author by

JFerro

NLP with Spacy Pandas Networkx ipywidgets, ipysheets, voila, ipyvuewtify in Jupyter Hub environement NLP for patent literature. Feature extraction. EPC &amp; PCT compliance based on NLP techniques. Interested in Machine learning for information extraction in EN, FR, DE, NL, SP, PT languages.

Updated on June 22, 2022

Comments

  • JFerro
    JFerro almost 2 years

    I need to parse hundreds of XML files having all the same structure as follows:

    <?xml version="1.0" encoding="UTF-8"?>
      <Concepts>
        <ConceptModel name="food">
          <Filters>
            <Filter type="CC"/>
          </Filters>
          <Queries>
            <Query lang="EN">(cheese, bread, wine)</Query>
            <Query lang="DE">(Käse, Brot, Wein)</Query>
            <Query lang="FR">(fromaige, pain, vin)</Query>
          </Queries>
        </ConceptModel>
      </Concepts>
    

    I have read several articles and posts in internet like below but I could not come up with a solution:

    So far I am doing:

    Dim oXml As MSXML2.DOMDocument
    Set oXml = New MSXML2.DOMDocument
    oXml.LoadXML ("C:\folder\folder\name.xml")
    
    Dim Queries As IXMLDOMNodeList
    Dim Query As IXMLDOMNode
    
    ThisWorkbook.Sheets(3).Cells(i, 1) = "before loop"
    
    Set Queries = oXml.SelectNodes("/concepts/Queries")
    
    MsgBox "how many Queries " &  Queries.Length
    
    For Each Query In Queries
        ThisWorkbook.Sheets(3).Cells(i, 1) = "Works"
        ThisWorkbook.Sheets(3).Cells(i, 2) = Query.SelectNodes("Query").iTem(0).Text
        i = i + 1
    Next
    

    This code seems to be understood by VBA but it does not read the contents. The loop does not get read, meaning (I guess) that Queries is not looped at all. Which is confirmed by the fact that the Msgbox "how many queries" gives 0 as result. But actually there are three queries. Could someone give me a hand?

    As second issue I would like to ask if

     Dim oXml As MSXML2.DOMDocument
    

    would be the same as

     Dim oXml As MSXML2.DOMDocument60
    

    Since I checked in tools/references "Microsof XML, v6.0"

    I thought that the queries having a tag might cause a problem. and I added the follwoing lines:

    Dim childs As IXMLDOMNodeList
    Set childs = oXml.SelectNodes("/concepts")
    
    MsgBox "childs " & childs.Length
    

    which also gives 0 as result. I would expect 3, since concepts has three children, namely ConceptModel, Filter and Queries. So, I am even more puzzled.

  • JFerro
    JFerro over 5 years
    but that would mean that if there is no "ü", "ä" or whatever of this stily it should work. And it doesnt. Unfortunately I can not influence the content of the files themselves, I can only read data.
  • QHarr
    QHarr over 5 years
    I'm not sure but I can assure you that it does work with the above change. There maybe other ways to achieve the same such as reading in using a different method first and parsing that to the xmldocument.
  • JFerro
    JFerro over 5 years
    Just for me to understand. I did the following. I opened the xml file in a text editor and changed (as you said) the head into: <?xml version="1.0" encoding="iso-8859-1" ?> save it and run my code again. same result. Is that what you mean?
  • JFerro
    JFerro over 5 years
    Another try: I save an XML file from a text editor without any head specifing encoding and save it with XML extension as text file. (and without german special characteres). Did not work neither
  • JFerro
    JFerro over 5 years
    Now I did. Yours work. Not only that, I change the initial like by the one containing UFT and then gave error.
  • JFerro
    JFerro over 5 years
    So thanks. I think I have to try other methods then, like reading pure text and parsing. I can not change the files since they are read by other programs as well.
  • JFerro
    JFerro over 5 years
    What is then the difference between .LoadXML ("C:\folder\folder\name.xml") and .Load ("C:\folder\folder\name.xml") ??? I pasted my file in "XMLvalidator" and there was no errors, so, what do you mean exactly by: "<< CANNOT load this - NO valid XML string input" thx
  • T.M.
    T.M. over 5 years
    Give me some time, I'll edit this and your versioning question in my answer. Hadn't the time to study your post completely as you already accepted @QHarris ' valid approach.
  • JFerro
    JFerro over 5 years
    of course, all the time of the wrold,... Harris approach is "a" solution, but of course only if I could change the files themselves. I can not touch the files. I pasted my file in "XMLvalidator" and there was no errors, so, what do you mean exactly by: "<< CANNOT load this - NO valid XML string input" Carefully looking at your code it looks like changing in my code .loadXML by .load("... should work, but it doesnt. Since pasting your condition if not .load("... does not cause Fehler, but still the rest does not work
  • T.M.
    T.M. over 5 years
    BTW a) Do you want to get all <Query> items or just the English one, i.e. .Item(0)? In this case you needn't loop through the whole nodelist, you can pick the wanted item via index if you get a nodelist length greater than 0. b) Are there other <Queries> subnodes or does each XML file consist only of one such node and e.g. three language subnodes?
  • JFerro
    JFerro over 5 years
    all of them. ALL the XML are exactly the same, only changes the name of the file, and hence also <ConceptModel name="XXXXXX">, and the content of every of the Query. Which are always EN, DE, FR
  • T.M.
    T.M. over 5 years
    Tried to answer your questions. Of course you'll have to change to .Load(filepath) if you refer to a file path otherwise you don't get xml content - (cf. `Debug.Print oXML.xml). Furthermore your XPath settings searched for nodes that couldn't be identified, therefore you got no results :-)
  • barrowc
    barrowc over 5 years
    A further issue with using MSXML, v3.0 is that the default selection language is XSLPatterns instead of XPath. Details on some of the differences between MSXML versions are here and the differences between the two selection languages are discussed here
  • T.M.
    T.M. over 5 years
    Thank you for the additional links as well as stating the differences more precisely; I only commented in code that an explicit .setProperty "SelectionLanguage", "XPath" is necessary in Version 3 - may I include that in my answer? - @barrowc
  • barrowc
    barrowc over 5 years
    You are more than welcome to include any part of my comment in the answer