What is the best open XML parser for C++?

333,416

Solution 1

How about RapidXML? RapidXML is a very fast and small XML DOM parser written in C++. It is aimed primarily at embedded environments, computer games, or any other applications where available memory or CPU processing power comes at a premium. RapidXML is licensed under Boost Software License and its source code is freely available.

Features

  • Parsing speed (including DOM tree building) approaching speed of strlen function executed on the same data.
  • On a modern CPU (as of 2008) the parser throughput is about 1 billion characters per second. See Performance section in the Online Manual.
  • Small memory footprint of the code and created DOM trees.
  • A headers-only implementation, simplifying the integration process.
  • Simple license that allows use for almost any purpose, both commercial and non-commercial, without any obligations.
  • Supports UTF-8 and partially UTF-16, UTF-32 encodings.
  • Portable source code with no dependencies other than a very small subset of C++ Standard Library.
  • This subset is so small that it can be easily emulated manually if use of standard library is undesired.

Limitations

  • The parser ignores DOCTYPE declarations.
  • There is no support for XML namespaces.
  • The parser does not check for character validity.
  • The interface of the parser does not conform to DOM specification.
  • The parser does not check for attribute uniqueness.

Source: Wikipedia, "RapidXml"
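
To make the "attribute uniqueness" limitation concrete: a parser that skips this check will accept input such as `<item id="1" id="2"/>`, which is not well-formed XML. If that matters for your input, the caller can run the check itself. The following is only a minimal sketch using the standard library; `Attribute`, `first_duplicate_attribute` and `example_duplicate_check` are hypothetical names and not part of RapidXML, and collecting the attribute names from the parsed tree is left out.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical representation of one parsed attribute: (name, value).
using Attribute = std::pair<std::string, std::string>;

// Returns the first attribute name that occurs more than once on a single
// element, or an empty string if all names are unique.
std::string first_duplicate_attribute(const std::vector<Attribute>& attrs) {
    std::unordered_set<std::string> seen;
    for (const auto& attr : attrs) {
        // insert().second is false when the name was already present.
        if (!seen.insert(attr.first).second) {
            return attr.first;
        }
    }
    return std::string();
}

// Usage sketch: the second call models <item id="1" id="2"/>.
void example_duplicate_check() {
    assert(first_duplicate_attribute({{"id", "1"}, {"name", "a"}}).empty());
    assert(first_duplicate_attribute({{"id", "1"}, {"id", "2"}}) == "id");
}
```

The check is linear in the number of attributes per element, so it adds little on top of a fast parser.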


Depending on your use case, you might also consider XML data binding. CodeSynthesis XSD is an XML Data Binding compiler for C++ developed by Code Synthesis and dual-licensed under the GNU GPL and a proprietary license. Given an XML instance specification (XML Schema), it generates C++ classes that represent the given vocabulary as well as parsing and serialization code.

One of the unique features of CodeSynthesis XSD is its support for two different XML Schema to C++ mappings: in-memory C++/Tree and stream-oriented C++/Parser. The C++/Tree mapping is a traditional mapping with a tree-like, in-memory data structure. C++/Parser is a new, SAX-like mapping which represents the information stored in XML instance documents as a hierarchy of vocabulary-specific parsing events. In comparison to C++/Tree, the C++/Parser mapping allows one to handle large XML documents that would not fit in memory, perform stream-oriented processing, or use an existing in-memory representation.

Source: wikipedia.org://CodeSynthesis XSD
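
The difference between a tree mapping and a stream-oriented, SAX-like mapping can be illustrated with a toy example. This is not code generated by CodeSynthesis XSD; it is a sketch of the general event-driven idea, and every name in it (`EventHandler`, `scan`, `PriceSum`, `sum_prices`, `example_price_sum`) is hypothetical. The scanner only understands `<name>`, `</name>` and character data, and it takes the whole input as one string for brevity, whereas a real stream parser would read the input incrementally.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical event interface, in the spirit of SAX-style parsing.
struct EventHandler {
    virtual ~EventHandler() = default;
    virtual void start_element(const std::string& name) = 0;
    virtual void end_element(const std::string& name) = 0;
    virtual void text(const std::string& chars) = 0;
};

// Toy scanner: no attributes, comments, entities, or error reporting.
void scan(const std::string& xml, EventHandler& handler) {
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            std::size_t close = xml.find('>', pos);
            if (close == std::string::npos) return;  // malformed input: stop
            if (pos + 1 < xml.size() && xml[pos + 1] == '/') {
                handler.end_element(xml.substr(pos + 2, close - pos - 2));
            } else {
                handler.start_element(xml.substr(pos + 1, close - pos - 1));
            }
            pos = close + 1;
        } else {
            std::size_t next = xml.find('<', pos);
            if (next == std::string::npos) next = xml.size();
            handler.text(xml.substr(pos, next - pos));
            pos = next;
        }
    }
}

// Vocabulary-specific handler: sums the contents of <price> elements
// without building a tree of the document.
struct PriceSum : EventHandler {
    bool in_price = false;
    double total = 0.0;
    void start_element(const std::string& name) override {
        in_price = (name == "price");
    }
    void end_element(const std::string&) override { in_price = false; }
    void text(const std::string& chars) override {
        if (in_price) total += std::stod(chars);
    }
};

double sum_prices(const std::string& xml) {
    PriceSum handler;
    scan(xml, handler);
    return handler.total;
}

// Usage sketch.
void example_price_sum() {
    assert(sum_prices("<order><price>2.5</price><price>4</price></order>") == 6.5);
}
```

The handler keeps only the state it needs (a flag and a running total), which is why this style suits documents too large to hold in memory as a tree.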

Solution 2

pugixml is a light-weight, simple and fast XML parser for C++. It is very small (comparable to RapidXML), very fast (comparable to RapidXML), and very easy to use (better than RapidXML).

Solution 3

Try TinyXML.

http://sourceforge.net/projects/tinyxml

Solution 4

TiCPP is a "more c++" version of TinyXML.

'TiCPP' is short for the official name TinyXML++. It is a completely new interface to TinyXML (http://www.grinninglizard.com/tinyxml/) that uses many of C++'s strengths: templates, exceptions, and much better error handling. It is also fully documented in doxygen. This version lets you interface with TinyXML exactly the same way as before, or you can choose to use the new 'ticpp' classes. All you need to do is define TIXML_USE_TICPP. It has been tested in VC 6.0, VC 7.0, VC 7.1, VC 8.0, MinGW gcc 3.4.5, and on Linux with GNU gcc 3+.

Solution 5

Try this one: http://www.applied-mathematics.net/tools/xmlParser.html
It's easier and faster than RapidXML or pugixml.
TinyXML is the worst of the "simple parsers".

Author: whaledawg

Updated on July 25, 2020

Comments

  • whaledawg
    whaledawg almost 4 years

    I am looking for a simple, clean, correct XML parser to use in my C++ project. Should I write my own?

    • Nicol Bolas
      Nicol Bolas almost 12 years
      Note: there is a question about how to pick an XML parser for C++.
    • Dan Nissenbaum
      Dan Nissenbaum about 10 years
      As @NicolBolas points out, there is now a much more recent StackOverflow posting that asks the same question: stackoverflow.com/questions/9387610/…
    • Dan Nissenbaum
      Dan Nissenbaum over 9 years
      Note that the much newer StackOverflow posting I reference above has nearly as many upvotes as the current question (as of Dec 2014), and the answer has many more upvotes than the answers here and has a fantastic, easy-to-read flow chart.
  • whaledawg
    whaledawg over 15 years
    OK, more to the point which of those features doesn't TinyXML have?
  • Lev
    Lev over 15 years
    It implements the whole DOM. TinyXML is simpler, but enough for keeping data in XML.
  • JohnIdol
    JohnIdol over 15 years
    Used tinyXML several times on VC++ and eVC++ - always worked fine
  • Frank
    Frank over 15 years
    I like the headers-only approach (I think you really need one header file). Just throw it in and don't worry about changing anything in your build process.
  • StaxMan
    StaxMan about 15 years
    Hmmh. If "The parser does not check for character validity" and "The parser does not check for attribute uniqueness", it is, strictly speaking, NOT an XML parser: these are not optional checks, they are mandated by the XML spec itself. I would not waste my time on such a thing, as there are actual good, decent parsers too (libxml2, for example).
  • Martin Beckett
    Martin Beckett over 14 years
    It's the reason I use Rapidxml. One system I work with insists on putting illegal trailing spaces on the element names - rapidXML is the only one that can cope with this (admittedly by not noticing!)
  • bobobobo
    bobobobo over 14 years
    this is just beautiful.. compared with xerces..
  • deft_code
    deft_code over 14 years
    Xerces implements the ENTIRE XML standard. TinyXML implements just enough to be useful. It turns out that 99% of users will only ever use 1% of the XML standard, so TinyXML is usually more than sufficient.
  • Rajakumar
    Rajakumar about 14 years
    RapidXML has a lot of functionality for building XML, like MSXML. But node traversal is more difficult than in other parsers, and so is reading and writing files.
  • Petrus Theron
    Petrus Theron almost 14 years
    +1 for RapidXML. Great for XML message building in my application.
  • Nav
    Nav almost 13 years
    When choosing an XML parser for commercial use (in a certain kind of domain), we need to see if the parser will be maintained for at least 2 or 3 decades. Something like Xerces seems more likely to remain supported and maintained, than RapidXML. So would RapidXML be a wise choice to use?
  • Kissaki
    Kissaki almost 13 years
    Wow, that’s a lot of claims. Can you back those up? What makes it better in those areas? Any reference articles?
  • Kissaki
    Kissaki almost 13 years
    Reading a bit on the RapidXML as well as pugixml websites I understand what you (probably) mean. RapidXML is based on / inspired by pugixml. It has minimal documentation on parsing. pugixml has good documentation on parsing and nice API. (Only read about parsing so far.)
  • aurel
    aurel about 12 years
    Pugixml is a lot easier to use, let's take reading xml from file - it's just load_file("file.xml")! I find it a lot more intuitive than rapid_xml. Selecting nodes by xpath also works pretty nice.
  • Olical
    Olical over 11 years
    Boost.PropertyTree was perfect for my kind of simple data storage. This is the page that made it clear how to use it. Wow, I love boost.
  • dlchambers
    dlchambers over 11 years
    I've been using pugixml for a few years. Works well, easy to integrate into projects, decent docs. BUT, no matter what package you use, XML composing/parsing in C++ is always a messy affair.
  • Nayana Adassuriya
    Nayana Adassuriya about 11 years
    For commercial use you have to pay one time fee for gSoap
  • Shep
    Shep over 10 years
    The beginning of this post is a direct copy from Wikipedia.
  • arayq2
    arayq2 over 10 years
    pugixml is an excellent package. The document composition API is a bit clunky (though in fairness this criticism applies to just about every package I've seen!), but the Xpath support is a huge plus.
  • eonil
    eonil about 10 years
    The RapidXML manual explicitly states that the parser doesn't perform string decoding. It seems to be something about XML special character escapes, but I really don't get what it means.
  • user14471901
    user14471901 about 10 years
    Boost PropertyTree is not that useful except in trivial XML files. The structure doesn't have backward linking so getting to parents of nodes means you really need to roll your own data structure to store the XML after Property Tree reads it. And it has no query support of the xpath nature. All you can do easily is read in an XML file into a tree structure and directly pull out a value if you know the exact path.
  • afterxleep
    afterxleep almost 10 years
    I like the boost::property_tree too. There are some practical Visual Studio implementations of how to parse XML and JSON
  • backend_dev_123
    backend_dev_123 over 9 years
    I am trying this out, and for some reason the classes I call from tinyxml2 get a not resolved error. Any idea why? I found the classes in the header file which I included, so they should be available.
  • Andreas Haferburg
    Andreas Haferburg over 9 years
    boost::property_tree is very bloated (increases compile time and executable size) and doesn't seem to be maintained anymore. Not recommended.
  • Andrew
    Andrew almost 9 years
    Just a warning though, to those who are checking it out as I am: the newer version has a really odd license and you can't even download it without first sending him an email. I think I'll go with pugixml.
  • Moshe Rubin
    Moshe Rubin over 8 years
    I'm looking for a light XML DOM parser I could compile for multiple platforms. In the past I used expat (SAX) for Windows, staying away from the bloated Apache Xerces DOM parser, but I followed @Zbyl's recommendation, and I'm happy I did. Consisting of only one CPP and two H files, with a CMakeList.txt file all ready, has made my life as simple as possible. pugixml's simple API is exactly what I need. Zbyl gets my upvote!
  • sg7
    sg7 about 8 years
    @Kissaki I have tested a few XML parsers including a few commercial ones before using [pugixml] (pugixml.org) in a commercial product.
  • TarmoPikaro
    TarmoPikaro almost 8 years
    I have rejected this library from use because - it did not provide API function for loading .xml from file. Also unicode support is questionable - end-user needs to explicitly define which encoding you use - meanwhile it should be probed from xml. It's fast probably as metrics shows, but currently I prefer usability and complete implementation over performance.
  • TarmoPikaro
    TarmoPikaro almost 8 years
    I have rejected this library (Also checked TinyXML2) from use because - library did not provide loading from unicode path names. Also currently I prefer usability and complete implementation over performance.
  • TarmoPikaro
    TarmoPikaro almost 8 years
    Simple to integrate (2 headers + source). Unicode is supported. Small footprint. I chose this library over rapidxml and tinyxml.
  • trampster
    trampster over 5 years
    Hasn't had a release since 2009, doesn't compile with GCC if you use the printing... does not inspire confidence at all.
  • Dan
    Dan about 4 years
    Do you happen to be using CodeBlocks? I'm trying to get the C++ wrapper for this up and running and it's giving me fits.