C# whitespace issue with XmlReader

Solution 1

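What happens is that reader.Read() consumes the whitespace node that follows each element. Without that whitespace (removed or ignored), ReadElementContentAsString() already leaves the reader on the start tag of node2, so the same reader.Read() call eats that XML token instead, moving the pointer into the node2 element so that it is never read as an element.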

Inspect the reader's properties in a debugger before and after each method call in your example, in particular NodeType and Value. Also look at the MoveToContent method; it is very useful.
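As a minimal sketch of that kind of inspection (the class and method names are hypothetical, and the XML is passed as a string rather than read from a file), the following reports which node the reader sits on right after ReadElementContentAsString() returns for the first child element:

```csharp
using System;
using System.IO;
using System.Xml;

public static class ReaderPositionDemo
{
    // Hypothetical helper: reads the first child of the root element and
    // returns the type of the node the reader is positioned on afterwards.
    public static XmlNodeType NodeTypeAfterFirstChild(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();   // positions the reader on the root element
            reader.Read();            // first node inside the root
            reader.MoveToContent();   // skips whitespace, lands on the first child
            reader.ReadElementContentAsString();
            return reader.NodeType;   // where did the call leave the reader?
        }
    }

    public static void Main()
    {
        string indented = "<data>\n    <node1>value1</node1>\n    <node2>value2</node2>\n</data>";
        string compact = "<data><node1>value1</node1><node2>value2</node2></data>";

        Console.WriteLine("indented: " + NodeTypeAfterFirstChild(indented));
        Console.WriteLine("compact:  " + NodeTypeAfterFirstChild(compact));
    }
}
```

With the indented document the reader is left on a Whitespace node, which is what the loop's next Read() consumes. With the compact document it is already on the node2 start tag, so the next Read() steps past it.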

Read the documentation for all of those methods and properties, and you will learn how the XmlReader class works and how to use it for your purposes. Here is the first Google result: it contains a very explicit example.

I ended up with the following (incomplete) pattern:

private static void ReadXmlExt(XmlReader xmlReader, IXmlSerializableExt xmlSerializable, ReadElementDelegate readElementCallback)
{
    bool isEmpty;

    if (xmlReader == null)
        throw new ArgumentNullException("xmlReader");
    if (readElementCallback == null)
        throw new ArgumentNullException("readElementCallback");

    // Empty element?
    isEmpty = xmlReader.IsEmptyElement;
    // Decode attributes
    if ((xmlReader.HasAttributes == true) && (xmlSerializable != null))
        xmlSerializable.ReadAttributes(xmlReader);

    // Read the root start element
    xmlReader.ReadStartElement();

    // Decode elements
    if (isEmpty == false) {
        do {
            // Read document till next element
            xmlReader.MoveToContent();

            if (xmlReader.NodeType == XmlNodeType.Element) {
                string elementName = xmlReader.LocalName;

                // Empty element?
                isEmpty = xmlReader.IsEmptyElement;

                // Decode child element
                readElementCallback(xmlReader);
                xmlReader.MoveToContent();

                // Read the child end element (not empty)
                if (isEmpty == false) {
                    // Delegate check: it must have reached an end element
                    if (xmlReader.NodeType != XmlNodeType.EndElement)
                        throw new InvalidOperationException("delegate did not reach an end element");
                    // Delegate check: the end element must match the start element seen before the delegate ran
                    if (xmlReader.LocalName != elementName)
                        throw new InvalidOperationException(String.Format("delegate did not reach the end element of {0}", elementName));

                    // Child end element
                    xmlReader.ReadEndElement();
                }
            } else if (xmlReader.NodeType == XmlNodeType.Text) {
                if (xmlSerializable != null) {
                    // Interface
                    xmlSerializable.ReadText(xmlReader);
                    Debug.Assert(xmlReader.NodeType != XmlNodeType.Text, "IXmlSerializableExt.ReadText shall read the text");
                } else
                    xmlReader.Skip();   // Skip text
            }
        } while (xmlReader.NodeType != XmlNodeType.EndElement);
    }
}

Solution 2

Per the documentation on IgnoreWhitespace, a new line is not considered insignificant.

White space that is not considered to be significant includes spaces, tabs, and blank lines used to set apart the markup for greater readability. An example of this is white space in element content.

XmlReaderSettings.IgnoreWhitespace
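One way to check how a particular document is treated under each setting is to count the Whitespace nodes the reader actually reports. This is only a sketch: the class and method names are hypothetical, and the sample string mirrors the indented XML from the question.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WhitespaceCountDemo
{
    // Hypothetical helper: counts the nodes reported as XmlNodeType.Whitespace
    // when the document is read with the given IgnoreWhitespace setting.
    public static int CountWhitespaceNodes(string xml, bool ignoreWhitespace)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = ignoreWhitespace };
        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Whitespace)
                    count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        string xml = "<data>\n    <node1>value1</node1>\n    <node2>value2</node2>\n</data>";
        Console.WriteLine("IgnoreWhitespace = false: " + CountWhitespaceNodes(xml, false));
        Console.WriteLine("IgnoreWhitespace = true:  " + CountWhitespaceNodes(xml, true));
    }
}
```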

Solution 3

This is not nearly as robust as Luca's answer, but I've found the following pattern useful with reasonably 'predictable' XML (variations in whitespace and values only). Consider:

string homedir = Path.GetDirectoryName(Application.ExecutablePath);
string xml = Path.Combine( homedir, "settings.xml" );

FileStream stream = new FileStream( xml, FileMode.Open );

XmlReaderSettings readerSettings = new XmlReaderSettings();
readerSettings.IgnoreWhitespace = false;
XmlReader reader = XmlReader.Create( stream, readerSettings );

while( reader.Read() ){

    if ( reader.MoveToContent() == XmlNodeType.Element && reader.Name != "data" ){
        XmlNodeType nodeType = reader.NodeType; // captured before the content is consumed
        string name = reader.Name;
        string value = null;
        if (!reader.IsEmptyElement)
        {
          reader.Read(); // advances reader to the element's content
          value = reader.ReadContentAsString(); // advances reader to the end element
        }
        // No extra Read() here: the loop's Read() steps past the end element,
        // so a start tag that follows immediately is not skipped.
        System.Diagnostics.Trace.WriteLine(
            nodeType
            + " "
            + name
            + " "
            + value
        );
    }
}

stream.Close(); 

More generally, in lieu of reader.ReadElementContent*(), use reader.Read() followed by reader.ReadContent*().

Solution 4

If you do not want the XmlReader to report whitespace, initialize it with the following settings:

XmlReaderSettings settings = new XmlReaderSettings();
settings.IgnoreWhitespace = true;
XmlReader xrd = XmlReader.Create(@"file.xml", settings);

It works for me with an XML file of the structure you posted:

<data>
    <node1>value1</node1>
    <node2>value2</node2>
</data>
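
With whitespace ignored, the reading loop must not call Read() again after ReadElementContentAsString(), since that call already moves past the end tag. A minimal sketch of such a loop follows. The class and method names are hypothetical, the XML is passed as a string, and every child of the root is assumed to contain text only:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ChildValueReader
{
    // Hypothetical helper: returns the text of every child element of the root.
    public static List<string> ReadValues(string xml, bool ignoreWhitespace)
    {
        var values = new List<string>();
        var settings = new XmlReaderSettings { IgnoreWhitespace = ignoreWhitespace };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.MoveToContent();   // position on the root element
            reader.Read();            // step inside it
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    // Moves past the end tag on its own, so no Read() afterwards.
                    values.Add(reader.ReadElementContentAsString());
                }
                else
                {
                    // Whitespace, end tags, comments: just step over them.
                    reader.Read();
                }
            }
        }
        return values;
    }

    public static void Main()
    {
        string xml = "<data><node1>value1</node1><node2>value2</node2></data>";
        foreach (string value in ReadValues(xml, true))
            Console.WriteLine(value);
    }
}
```

Because the loop advances either through ReadElementContentAsString() or through Read(), never both, it behaves the same whether or not whitespace separates the elements.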
Author: scibuff

Updated on June 15, 2022

Comments

  • scibuff
    scibuff almost 2 years

    I have a simple xml

    <data>
        <node1>value1</node1>
        <node2>value2</node2>
    </data>
    

    I'm using IXmlSerializable to read and write such xml with DTOs. The following code works just fine

    XmlReader reader;
    ...
    while( reader.Read() ){
        Console.Write( reader.ReadElementContentAsString() );
    }
    // outputs value1value2
    

    However, if the whitespace in the XML is removed, i.e.

    <data>
        <node1>value1</node1><node2>value2</node2>
    </data>
    

    or if I set XmlReaderSettings.IgnoreWhitespace = true, the code outputs only "value1", ignoring the second node. When I print the nodes that the parser traverses, I can see that ReadElementContentAsString moves the pointer to the EndElement of node2, but I don't understand why that happens or how to fix it.

    Is it a possible XML parser implementation bug?

    ===============================================

    Here is sample code and two XML samples that produce different results:

    string homedir = Path.GetDirectoryName(Application.ExecutablePath);
    string xml = Path.Combine( homedir, "settings.xml" );
    
    FileStream stream = new FileStream( xml, FileMode.Open );
    
    XmlReaderSettings readerSettings = new XmlReaderSettings();
    readerSettings.IgnoreWhitespace = false;
    XmlReader reader = XmlTextReader.Create( stream, readerSettings );
    
    while( reader.Read() ){
    
        if ( reader.MoveToContent() == XmlNodeType.Element && reader.Name != "data" ){
    
            System.Diagnostics.Trace.WriteLine(
                reader.NodeType 
                + " "
                + reader.Name
                + " " 
                + reader.ReadElementContentAsString()
            );
        }
    }
    
    stream.Close(); 
    

    1.) settings.xml

    <?xml version="1.0"?>
    <data>
        <node-1>value1</node-1>
        <node-2>value2</node-2>
    </data>
    

    2.) settings.xml

    <?xml version="1.0"?>
    <data>
        <node-1>value1</node-1><node-2>value2</node-2>
    </data>
    

    using (1) prints

    Element node-1 value1
    Element node-2 value2
    

    using (2) prints

    Element node-1 value1
    
    • Henk Holterman
      Henk Holterman about 12 years
      Can you post a small but complete sample that reproduces the problem? This seems to be non-working code, hard to tell what goes wrong.
    • scibuff
      scibuff about 12 years
      Ok, this explains the problem: stackoverflow.com/questions/2299632/…
  • scibuff
    scibuff about 12 years
    Makes sense, but even the first XML fails to parse properly if I set XmlReaderSettings.IgnoreWhitespace = true.