SAX vs XmlTextReader - SAX in C#

Solution 1

If you're talking about SAX for .NET, the project doesn't appear to be maintained. The last release was more than 2 years ago. Maybe they got it perfect on the last release, but I wouldn't bet on it. The author, Karl Waclawek, seems to have disappeared off the net.

As for SAX under Java? You bet, it's great. Unfortunately, SAX was never developed as a standard, so all of the non-Java ports have been adapting a Java API for their own needs. While DOM is a pretty lousy API, it has the advantage of having been designed for multiple languages and environments, so it's easy to implement in Java, C#, JavaScript, C, et al.

Solution 2

If you just want to get the job done quickly, the XmlTextReader exists for that purpose (in .NET).

If you want to learn a de facto standard (one that is available in many other programming languages) that is stable, that will force you to code very efficiently and elegantly, and that is also extremely flexible, then look into SAX. However, don't waste your time unless you're going to be creating highly esoteric XML parsers. Instead, look for next-generation parsers (like XmlTextReader) for your particular platform.

SAX Resources
SAX was originally written for Java, and you can find the original open source project, which has been stable for several years, here: http://sax.sourceforge.net/

There is a C# port of the same project here (with HTML docs as part of the source download); it is also stable: http://saxdotnet.sourceforge.net/

If you do not like the C# implementation, you could always resort to referencing the COM DLLs via COM Interop, using MSXML3 or later: http://msdn.microsoft.com/en-us/library/ms994343.aspx

Articles that come from the Java world but which probably illustrate the concepts you need to be successful with this approach (there may also be downloadable Java source code that could prove useful and may be easy enough to convert to C#):

It will be a cumbersome implementation. I have only used SAX back in my pre-.NET days, but it requires some pretty advanced coding techniques. At this point, it's just not worth the trouble.

Interesting Concept for a Hybrid Parser
This thread describes a hybrid parser that uses the .NET XmlTextReader to implement a parser that provides a combination of DOM and SAX benefits...
http://bytes.com/groups/net-xml/178403-xmltextreader-versus-dom

Solution 3

I believe there are no benefits to using SAX, for at least two reasons:

  1. SAX is a "push" model while XmlReader is a pull parser that has a number of benefits.
  2. Being dependent on a 3rd-party library rather than using a standard .NET API.
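
To make the push/pull distinction in point 1 concrete, here is a minimal sketch in Java, where both models ship with the standard library. It assumes StAX's `XMLStreamReader` as a stand-in for .NET's `XmlReader` (both are pull parsers), and the class and method names (`PushVsPull`, `namesViaSax`, `namesViaStax`) are made up for illustration. With SAX the parser drives and calls your handler; with a pull parser your loop drives and asks for the next event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: collects element names with both parsing models.
public class PushVsPull {

    // Push (SAX): the parser owns the loop and calls back into the handler.
    static List<String> namesViaSax(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // The default factory is not namespace-aware, so use qName.
                names.add(qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    // Pull (StAX): the caller owns the loop and requests each event.
    static List<String> namesViaStax(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                names.add(reader.getLocalName());
            }
        }
        reader.close();
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<orders><order><id>1</id></order>"
                   + "<order><id>2</id></order></orders>";
        System.out.println(namesViaSax(xml));
        System.out.println(namesViaStax(xml));
    }
}
```

Both methods return the same list of element names. The difference is who owns the loop: in the pull version you can stop early or hand the reader to a helper method, which is the kind of benefit usually claimed for pull parsers.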

Solution 4

Personally, I much prefer the SAX model, as XmlReader has some really annoying traps that can make your code skip elements. Most code is structured around a while(rdr.Read()) loop, but if you call "ReadString" or "ReadInnerXml()" within that loop, those calls advance the reader themselves, and you will find yourself skipping elements on the next iteration.

As SAX is event-based, this will never happen, because you cannot perform any operation that would cause the parser to seek ahead.

My personal feeling is that Microsoft invented the notion that XmlReader is better, using the push/pull model as the explanation, but I don't really buy it. Microsoft seems to think that you don't need to create a state machine with XmlReader; that doesn't make sense to me, but it's just my opinion.
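
As a rough illustration of the event-based style described in this answer, here is a minimal Java SAX sketch (the class name `ItemTextHandler` and the `item` element are hypothetical). The handler cannot advance the parser, so every element is delivered exactly once, even when elements are adjacent. The price is that the handler has to track its own state, here a single flag and a text buffer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <item> element.
public class ItemTextHandler extends DefaultHandler {
    private final List<String> items = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean insideItem = false; // the only state this sketch needs

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attrs) {
        if ("item".equals(qName)) {
            insideItem = true;
            text.setLength(0); // start a fresh buffer for this item
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so append.
        if (insideItem) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("item".equals(qName)) {
            items.add(text.toString());
            insideItem = false;
        }
    }

    public List<String> getItems() {
        return items;
    }

    // Convenience wrapper: parse a string and return the collected item texts.
    public static List<String> readItems(String xml) throws Exception {
        ItemTextHandler handler = new ItemTextHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getItems();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<list><item>a</item><item>b</item>"
                   + "<other>x</other><item>c</item></list>";
        System.out.println(readItems(xml)); // prints [a, b, c]
    }
}
```

With deeper or more varied documents, the single flag typically grows into a stack or an explicit state machine, which is the trade-off the push/pull argument is about.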

Author: sbilstein

Updated on June 06, 2022

Comments

  • sbilstein, about 2 years

    I am attempting to read a large XML document and I wanted to do it in chunks vs XmlDocument's way of reading the entire file into memory. I know I can use XmlTextReader to do this but I was wondering if anyone has used SAX for .NET? I know Java developers swear by it and I was wondering if it is worth giving it a try and if so what are the benefits in using it. I am looking for specifics.

  • EnocNRoll - AnandaGopal Pardue, over 15 years
    Hm, according to this page, SAX is a de facto standard in the industry (just not in the Microsoft world): xml.org/xml-dev
  • EnocNRoll - AnandaGopal Pardue, over 15 years
    Oh, it might be worth noting that the official SAX implementation from Java is stable and has been unmodified for even longer than SAX for .NET. Improvements to either codebase will basically only be needed if the XML standard evolves further.
  • John Saunders, almost 15 years
    Your opinion seems to be based on the fact that you learned a few things about XmlReader the hard way. Is that the best way to form an opinion on technical matters?
  • Brett Ryan, almost 15 years
    John, I suppose you're right, and I apologise. Though I do find XmlReader to be the cause of a lot of strange bugs in software that could be avoided by a simple SAX-based approach.
  • user430788, about 12 years
    I agree with Brett. XmlTextReader is arcane and overloaded with too many ways to do almost the same thing. Additionally, its model encourages a very loose definition of your accepted XML structure. While this is handy for some applications, in most of mine I want to reject code that doesn't meet my intended structure. What I really want is an RDP XML library, and I'm rather surprised no one has written one. Without that, though, I much prefer SAX.
  • stephanmg, over 4 years
    So XmlReader is basically StAX-alike?