Case-insensitive XML parser in C#

Solution 1

An XML document can have two different elements, named MyName and myName respectively, that are intended to be different. Treating them as the same name is an error that can have serious consequences.

If that is not a concern for your documents, here is a more precise solution: use XSLT to transform the document into one that has only lowercase element names and lowercase attribute names:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:variable name="vUpper" select=
 "'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

 <xsl:variable name="vLower" select=
 "'abcdefghijklmnopqrstuvwxyz'"/>

 <xsl:template match="node()|@*">
     <xsl:copy>
       <xsl:apply-templates select="node()|@*"/>
     </xsl:copy>
 </xsl:template>

 <xsl:template match="*[name()=local-name()]" priority="2">
  <xsl:element name="{translate(name(), $vUpper, $vLower)}"
   namespace="{namespace-uri()}">
       <xsl:apply-templates select="node()|@*"/>
  </xsl:element>
 </xsl:template>

 <xsl:template match="*" priority="1">
  <xsl:element name=
   "{substring-before(name(), ':')}:{translate(local-name(), $vUpper, $vLower)}"
   namespace="{namespace-uri()}">
       <xsl:apply-templates select="node()|@*"/>
  </xsl:element>
 </xsl:template>

 <xsl:template match="@*[name()=local-name()]" priority="2">
  <xsl:attribute name="{translate(name(), $vUpper, $vLower)}"
   namespace="{namespace-uri()}">
       <xsl:value-of select="."/>
  </xsl:attribute>
 </xsl:template>

 <xsl:template match="@*" priority="1">
  <xsl:attribute name=
   "{substring-before(name(), ':')}:{translate(local-name(), $vUpper, $vLower)}"
   namespace="{namespace-uri()}">
     <xsl:value-of select="."/>
  </xsl:attribute>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to an XML document, for example this one:

<authors xmlns:user="myNamespace">
  <?ttt This is a PI ?>
  <Author xmlns:user2="myNamespace2">
    <Name idd="VH">Victor Hugo</Name>
    <user2:Name idd="VH">Victor Hugo</user2:Name>
    <Nationality xmlns:user3="myNamespace3">French</Nationality>
  </Author>
  <!-- This is a very long comment the purpose is
       to test the default stylesheet for long comments-->
  <Author Period="classical">
    <Name>Sophocles</Name>
    <Nationality>Greek</Nationality>
  </Author>
  <author>
    <Name>Leo Tolstoy</Name>
    <Nationality>Russian</Nationality>
  </author>
  <Author>
    <Name>Alexander Pushkin</Name>
    <Nationality>Russian</Nationality>
  </Author>
  <Author Period="classical">
    <Name>Plato</Name>
    <Nationality>Greek</Nationality>
  </Author>
</authors>

the wanted result (element and attribute names converted to lowercase, values left untouched) is produced:

<authors><?ttt This is a PI ?>
   <author>
      <name idd="VH">Victor Hugo</name>
      <user2:name xmlns:user2="myNamespace2" idd="VH">Victor Hugo</user2:name>
      <nationality>French</nationality>
   </author><!-- This is a very long comment the purpose is
       to test the default stylesheet for long comments-->
   <author period="classical">
      <name>Sophocles</name>
      <nationality>Greek</nationality>
   </author>
   <author>
      <name>Leo Tolstoy</name>
      <nationality>Russian</nationality>
   </author>
   <author>
      <name>Alexander Pushkin</name>
      <nationality>Russian</nationality>
   </author>
   <author period="classical">
      <name>Plato</name>
      <nationality>Greek</nationality>
   </author>
</authors>

Once the document has been converted to this form, you can perform any further processing on the converted document.
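The answer does not show how to run the stylesheet from C#. A minimal sketch, assuming the stylesheet and the input are available as strings, could use XslCompiledTransform from System.Xml.Xsl. The class name XsltRunner and the method name Transform are hypothetical, chosen only for illustration.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper, not part of the original answer.
public static class XsltRunner
{
    // Applies an XSLT 1.0 stylesheet (given as text) to an XML string
    // and returns the transformed document as a string.
    public static string Transform(string stylesheetText, string inputXml)
    {
        var xslt = new XslCompiledTransform();
        using (var styleReader = XmlReader.Create(new StringReader(stylesheetText)))
        {
            xslt.Load(styleReader);
        }

        var output = new StringBuilder();
        using (var inputReader = XmlReader.Create(new StringReader(inputXml)))
        // Reuse the stylesheet's own xsl:output settings
        // (for example omit-xml-declaration and indent).
        using (var writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(inputReader, writer);
        }
        return output.ToString();
    }
}
```

The returned string can then be loaded with XDocument.Parse or XmlDocument.LoadXml and queried with lowercase names only.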

Solution 2

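You can create case-insensitive lookup methods (written as extension methods, for usability), e.g.:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XDocumentExtensions
{
    // Like XContainer.Elements(name), but compares the local name ignoring case.
    // The namespace must still match exactly.
    public static IEnumerable<XElement> ElementsCaseInsensitive(this XContainer source,
        XName name)
    {
        return source.Elements()
            .Where(e => e.Name.Namespace == name.Namespace
                && e.Name.LocalName.Equals(name.LocalName, StringComparison.OrdinalIgnoreCase));
    }
}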
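The same idea can be extended to attributes. The following is a sketch only: AttributeCaseInsensitive is a hypothetical name, and the helper simply returns the first attribute whose local name matches when case is ignored.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical companion to the element helper above.
public static class XElementAttributeExtensions
{
    // Returns the first attribute whose local name equals name.LocalName
    // ignoring case (the namespace must match exactly), or null if none does.
    public static XAttribute AttributeCaseInsensitive(this XElement source, XName name)
    {
        return source.Attributes()
            .FirstOrDefault(a => a.Name.Namespace == name.Namespace
                && a.Name.LocalName.Equals(name.LocalName, StringComparison.OrdinalIgnoreCase));
    }
}
```

For an element parsed from <Author Period="classical" />, calling AttributeCaseInsensitive("period") finds the Period attribute, and its value keeps its original case. If an element carries both Period and period, only the first one in document order is returned, which is exactly the ambiguity Solution 1 warns about.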

Solution 3

XML is text. Just call ToLower on it before loading it into whatever parser you are using.

So long as you don't have to validate against a schema, don't mind the values being all lower case, and don't depend on namespace URIs that contain uppercase letters (they are lowercased too), this should work just fine.


The fact is that any XML parser will be case sensitive. If it were not, it wouldn't be an XML parser.
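As a rough illustration of that trade-off, the sketch below lowercases the whole text before parsing it with LINQ to XML. It uses ToLowerInvariant rather than ToLower to avoid culture-specific casing, which is a substitution made here and not part of the answer; the class and method names are hypothetical.

```csharp
using System.Xml.Linq;

// Hypothetical helper illustrating the "lowercase everything" approach.
public static class LowercasedXml
{
    // Lowercases the entire XML text, then parses it.
    // Names, attribute values, text content, and namespace URIs are ALL lowercased.
    public static XElement Parse(string xmlText)
    {
        return XElement.Parse(xmlText.ToLowerInvariant());
    }
}
```

Parsing <Author Period="Classical"><Name>Victor Hugo</Name></Author> this way gives a root element named author, but the Name text also becomes "victor hugo" and the attribute value becomes "classical".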

Solution 4

I use another solution. People usually want this because they don't want to repeat each property name from the class file in an attribute as well. So what I do is add a custom attribute to all properties:

using System;
using System.Runtime.CompilerServices;
using System.Xml.Serialization;

[AttributeUsage(AttributeTargets.Property)]
public class UsePropertyNameToLowerAsXmlElementAttribute: XmlElementAttribute
{
    // [CallerMemberName] supplies the name of the property the attribute is applied to.
    public UsePropertyNameToLowerAsXmlElementAttribute([CallerMemberName] string propertyName = null)
    : base(propertyName?.ToLower())
    {
    }
}

This way the XML serializer can map lowercase element names to PascalCased properties.

The properties on the classes still carry a decorator indicating that the XML name differs, but you avoid the overhead of spelling out a name on every property:

public class Settings
{
    [UsePropertyNameToLowerAsXmlElement]
    public string VersionId { get; set; }

    [UsePropertyNameToLowerAsXmlElement]
    public int? ApplicationId { get; set; }
}
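To show how the pieces fit together, here is a self-contained sketch that repeats the attribute and the Settings class and adds a small loader. SettingsLoader and its Load method are hypothetical names. Note that this approach is not truly case insensitive: the elements in the input must be exactly lowercase, and the root element name is still matched as written.

```csharp
using System;
using System.IO;
using System.Runtime.CompilerServices;
using System.Xml.Serialization;

[AttributeUsage(AttributeTargets.Property)]
public class UsePropertyNameToLowerAsXmlElementAttribute : XmlElementAttribute
{
    public UsePropertyNameToLowerAsXmlElementAttribute([CallerMemberName] string propertyName = null)
        : base(propertyName?.ToLower())
    {
    }
}

public class Settings
{
    [UsePropertyNameToLowerAsXmlElement]
    public string VersionId { get; set; }

    [UsePropertyNameToLowerAsXmlElement]
    public int? ApplicationId { get; set; }
}

// Hypothetical helper, added only for illustration.
public static class SettingsLoader
{
    // Deserializes a Settings instance from XML text whose child
    // elements use the lowercased property names.
    public static Settings Load(string xml)
    {
        var serializer = new XmlSerializer(typeof(Settings));
        using (var reader = new StringReader(xml))
        {
            return (Settings)serializer.Deserialize(reader);
        }
    }
}
```

An input such as <Settings><versionid>1.2</versionid><applicationid>7</applicationid></Settings> is the kind of document this is meant to read.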

Solution 5

I would start by converting all tag and attribute names to lowercase, leaving values untouched, using a streaming (SAX-style) reader, i.e. XmlTextReader.
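A possible shape for that streaming pass is sketched below. It uses XmlReader.Create instead of constructing XmlTextReader directly (a substitution made here), copies each node to an XmlWriter, and lowercases only element and attribute local names. NameLowercaser is a hypothetical name, and the sketch ignores DTDs and entity references.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, not part of the original answer.
public static class NameLowercaser
{
    private const string XmlnsNamespace = "http://www.w3.org/2000/xmlns/";

    // Copies the document node by node, lowercasing element and attribute
    // local names. Values, prefixes, and namespace URIs are left untouched.
    public static string LowercaseNames(string inputXml)
    {
        var output = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var reader = XmlReader.Create(new StringReader(inputXml)))
        using (var writer = XmlWriter.Create(output, settings))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                    {
                        // Must be read before moving onto the attributes.
                        bool isEmpty = reader.IsEmptyElement;
                        writer.WriteStartElement(reader.Prefix,
                            reader.LocalName.ToLowerInvariant(), reader.NamespaceURI);
                        while (reader.MoveToNextAttribute())
                        {
                            // Namespace declarations are copied as written.
                            bool isNamespaceDeclaration = reader.NamespaceURI == XmlnsNamespace;
                            string localName = isNamespaceDeclaration
                                ? reader.LocalName
                                : reader.LocalName.ToLowerInvariant();
                            writer.WriteAttributeString(reader.Prefix, localName,
                                reader.NamespaceURI, reader.Value);
                        }
                        reader.MoveToElement();
                        if (isEmpty) writer.WriteEndElement();
                        break;
                    }
                    case XmlNodeType.EndElement:
                        writer.WriteFullEndElement();
                        break;
                    case XmlNodeType.Text:
                        writer.WriteString(reader.Value);
                        break;
                    case XmlNodeType.CDATA:
                        writer.WriteCData(reader.Value);
                        break;
                    case XmlNodeType.Comment:
                        writer.WriteComment(reader.Value);
                        break;
                    case XmlNodeType.ProcessingInstruction:
                        writer.WriteProcessingInstruction(reader.Name, reader.Value);
                        break;
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        writer.WriteWhitespace(reader.Value);
                        break;
                }
            }
        }
        return output.ToString();
    }
}
```

Because only names are rewritten, values such as passwords or hashes keep their case. As with the XSLT approach, two names that differ only by case end up identical after the pass.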

Author: Arsen Zahray

Updated on April 02, 2020

Comments

  • Arsen Zahray, about 4 years ago

    Everything you do with XML is case sensitive, I know that.

    However, right now I find myself in a situation where the software I'm writing would produce far fewer errors if I could somehow make XML name/attribute recognition case insensitive. Case-insensitive XPath would be a godsend.

    Is there an easy way or library to do that in C#?

    • Henk Holterman, about 12 years ago
      Not likely. But you could do XElement.Parse(xmlText.ToLower())
    • Dimitre Novatchev, about 12 years ago
      An XML document can have two different elements, named MyName and myName respectively, that are intended to be different. Treating them as the same name is an error that can have serious consequences.
  • samiretas, about 12 years ago
    But you probably don't want to ToLower your values.
  • Oded
    Oded about 12 years
    @Chad - Probably. I did put that caveat in my answer.
  • Arsen Zahray, about 12 years ago
    I have thought of that. In most cases this would work, except that some fields contain information whose case I want preserved, for example passwords, hashes, and other data meant for the outside world. On the other hand, I do not really need to differentiate between Name and name attributes in XHTML.
  • John Saunders
    John Saunders about 12 years
    Is there a Name attribute in XHTML, or is it name?
  • Arsen Zahray, about 12 years ago
    Sometimes the one, sometimes the other. That's the problem I'm having.
  • Dimitre Novatchev, about 12 years ago
    Such total lowering "kills" certain elements. For example, aName, AName, and anamE all become aname. The second big problem with this idea is that it alters not only names, but also the content of text nodes and attributes. It also changes the values of namespaces, which makes the XML document totally unusable. A quick example: convert xmlns:xsl="http://www.w3.org/1999/XSL/Transform" to xmlns:xsl="http://www.w3.org/1999/xsl/transform" and the XML document (a syntactically valid XSLT stylesheet) is now rejected by the XSLT processor.
  • David Alex, over 4 years ago
    Is there any C++ code to convert XML attributes and nodes to uppercase or lowercase letters?
  • Dimitre Novatchev
    Dimitre Novatchev over 4 years
    @DavidAlex: You will need to call the functions of an XSLT processor that can be called in C++. You need to determine which one is best for you -- MSXML/MSXSL, Saxon/C, or another product. Then read the documentation of the chosen product and understand the code examples.
  • David Alex, over 4 years ago
    @DimitreNovatchev, I want to use MSXML/MSXSL. Do you have any sample code for reading XML and running the XSL transformation to convert attributes and nodes to uppercase or lowercase letters? I am quite new to XSL transformations and need help. Thanks.
  • Dimitre Novatchev
    Dimitre Novatchev over 4 years
    @DavidAlex -- you can use Microsoft's MSXML6 SDK -- it can be downloaded from here: microsoft.com/en-us/download/details.aspx?id=3988 . This should contain extensive documentation of the major types/classes and examples how to call their methods from different programming languages. There is an older one -- for MSXML4 and it can be downloaded here: microsoft.com/en-us/download/details.aspx?id=19662 Either of these should fit your needs and requirements
  • tig, about 4 years ago
    @DimitreNovatchev, the XSLT you provided causes all attributes on the root element to be lost. This is shown even in your example above. What would I change in the XSLT so that they are not lost?
  • Dimitre Novatchev, about 4 years ago
    @tig, No attribute is "lost". What you see missing are some namespaces that are not used in the descendants of the element. And there is nothing bad with this :) But if you must have these namespaces retained in the result of the transformation, you can add this: <xsl:copy-of select="namespace::*"/> before any <xsl:apply-templates select="node()|@*"/>