Apostrophe (') in XPath query

Solution 1

This is surprisingly difficult to do.

Take a look at the XPath Recommendation, and you'll see that it defines a literal as:

Literal ::=   '"' [^"]* '"' 
            | "'" [^']* "'"

Which is to say, string literals in XPath expressions can contain apostrophes or double quotes but not both.

You can't use escaping to get around this. A literal like this:

'Some&apos;Value'

will match this XML text:

Some&amp;apos;Value

This does mean that it's possible for there to be a piece of XML text that you can't generate an XPath literal to match, e.g.:

<elm att="&quot;&apos;"/>

But that doesn't mean it's impossible to match that text with XPath; it's just tricky. In any case where the value you're trying to match contains both single and double quotes, you can construct an expression that uses concat to produce the text that it's going to match:

elm[@att=concat('"', "'")]
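To see the concat expression at work, here is a minimal Java sketch using the JDK's built-in javax.xml.xpath. The class name ConcatQuoteDemo, the helper matches, and the surrounding <root> element are made up for illustration; only the elm element and the concat expression come from the text above.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of any library.
class ConcatQuoteDemo {
    // Returns true if the expression selects at least one node in the XML.
    static boolean matches(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // Asking for BOOLEAN converts the selected node-set to
            // true (non-empty) or false (empty).
            return (Boolean) xpath.evaluate(
                    expression,
                    new InputSource(new StringReader(xml)),
                    XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The attribute value is the two characters " and '.
        String xml = "<root><elm att=\"&quot;&apos;\"/></root>";
        // concat() joins a single-quoted " with a double-quoted '.
        String expression = "/root/elm[@att=concat('\"', \"'\")]";
        System.out.println(matches(xml, expression));
    }
}
```

Neither argument to concat is escaped; each one simply uses the delimiter that its content does not contain.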

So that leads us to this, which is a lot more complicated than I'd like it to be:

/// <summary>
/// Produce an XPath literal equal to the value if possible; if not, produce
/// an XPath expression that will match the value.
/// 
/// Note that this function will produce very long XPath expressions if a value
/// contains a long run of double quotes.
/// </summary>
/// <param name="value">The value to match.</param>
/// <returns>If the value contains at most one kind of quote (single or
/// double), an XPath literal equal to the value.  If it contains both, an
/// XPath expression, using concat(), that evaluates to the value.</returns>
static string XPathLiteral(string value)
{
    // if the value contains only single or double quotes, construct
    // an XPath literal
    if (!value.Contains("\""))
    {
        return "\"" + value + "\"";
    }
    if (!value.Contains("'"))
    {
        return "'" + value + "'";
    }

    // if the value contains both single and double quotes, construct an
    // expression that concatenates all non-double-quote substrings with
    // the quotes, e.g.:
    //
    //    concat("foo", '"', "bar")
    StringBuilder sb = new StringBuilder();
    sb.Append("concat(");
    string[] substrings = value.Split('\"');
    for (int i = 0; i < substrings.Length; i++ )
    {
        bool needComma = (i>0);
        if (substrings[i] != "")
        {
            if (i > 0)
            {
                sb.Append(", ");
            }
            sb.Append("\"");
            sb.Append(substrings[i]);
            sb.Append("\"");
            needComma = true;
        }
        if (i < substrings.Length - 1)
        {
            if (needComma)
            {
                sb.Append(", ");                    
            }
            sb.Append("'\"'");
        }

    }
    sb.Append(")");
    return sb.ToString();
}

And yes, I tested it with all the edge cases. That's why the logic is so stupidly complex:

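static void Main(string[] args)
{
    // d is the document that the test elements are appended to
    XmlDocument d = new XmlDocument();
    d.LoadXml("<root/>");

    foreach (string s in new[]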
    {
        "foo",              // no quotes
        "\"foo",            // double quotes only
        "'foo",             // single quotes only
        "'foo\"bar",        // both; double quotes in mid-string
        "'foo\"bar\"baz",   // multiple double quotes in mid-string
        "'foo\"",           // string ends with double quotes
        "'foo\"\"",         // string ends with run of double quotes
        "\"'foo",           // string begins with double quotes
        "\"\"'foo",         // string begins with run of double quotes
        "'foo\"\"bar"       // run of double quotes in mid-string
    })
    {
        Console.Write(s);
        Console.Write(" = ");
        Console.WriteLine(XPathLiteral(s));
        XmlElement elm = d.CreateElement("test");
        d.DocumentElement.AppendChild(elm);
        elm.SetAttribute("value", s);

        string xpath = "/root/test[@value = " + XPathLiteral(s) + "]";
        if (d.SelectSingleNode(xpath) == elm)
        {
            Console.WriteLine("OK");
        }
        else
        {
            Console.WriteLine("Should have found a match for {0}, and didn't.", s);
        }
    }
    Console.ReadKey();
}

Solution 2

I ported Robert's answer to Java (tested in 1.6):

/**
 * Produce an XPath literal equal to the value if possible; if not, produce
 * an XPath expression that will match the value.
 *
 * Note that this function will produce very long XPath expressions if a value
 * contains a long run of double quotes.
 *
 * @param value the value to match
 * @return if the value contains at most one kind of quote, an XPath literal
 *         equal to the value; if it contains both, an XPath expression,
 *         using concat(), that evaluates to the value
 */
public static String XPathLiteral(String value) {
    // if the value contains at most one kind of quote, construct
    // an XPath literal delimited by the other kind
    if (!value.contains("\"")) {
        return "\"" + value + "\"";
    }
    if (!value.contains("'")) {
        return "'" + value + "'";
    }

    // if the value contains both single and double quotes, construct an
    // expression that concatenates all non-double-quote substrings with
    // the quotes, e.g.:
    //
    //    concat("foo", '"', "bar")
    StringBuilder sb = new StringBuilder();
    sb.append("concat(");
    // A limit of -1 makes split() keep trailing empty strings; without it,
    // double quotes at the end of the value would be silently dropped.
    String[] substrings = value.split("\"", -1);
    for (int i = 0; i < substrings.length; i++) {
        boolean needComma = (i > 0);
        if (!substrings[i].isEmpty()) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append("\"");
            sb.append(substrings[i]);
            sb.append("\"");
            needComma = true;
        }
        if (i < substrings.length - 1) {
            if (needComma) {
                sb.append(", ");
            }
            sb.append("'\"'");
        }
    }
    sb.append(")");
    return sb.toString();
}

Hope this helps somebody!
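One way to check a port like this against the edge cases listed in Solution 1 is to round-trip each value through javax.xml.xpath. The sketch below is not the code above: it uses the shorter parts-list loop suggested in the comments, and it assumes Java 8 or later for String.join. The names XPathLiteralCheck, toXPathLiteral and roundTrips are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class; not part of any library.
class XPathLiteralCheck {
    // Builds an XPath literal, or a concat() expression when the value
    // contains both kinds of quote.
    static String toXPathLiteral(String value) {
        if (!value.contains("\"")) {
            return "\"" + value + "\"";
        }
        if (!value.contains("'")) {
            return "'" + value + "'";
        }
        List<String> parts = new ArrayList<>();
        // Limit -1 keeps trailing empty strings, so quotes at the end survive.
        String[] pieces = value.split("\"", -1);
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) {
                parts.add("'\"'");               // the double quote that split() consumed
            }
            if (!pieces[i].isEmpty()) {
                parts.add("\"" + pieces[i] + "\""); // a run without double quotes
            }
        }
        return "concat(" + String.join(", ", parts) + ")";
    }

    // Stores the value in an attribute, then checks that the generated
    // expression selects that attribute's element.
    static boolean roundTrips(String value) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            Element test = doc.createElement("test");
            test.setAttribute("value", value);
            root.appendChild(test);

            String expression = "/root/test[@value = " + toXPathLiteral(value) + "]";
            return (Boolean) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] cases = {
            "foo", "\"foo", "'foo", "'foo\"bar", "'foo\"bar\"baz",
            "'foo\"", "'foo\"\"", "\"'foo", "\"\"'foo", "'foo\"\"bar"
        };
        for (String s : cases) {
            System.out.println(s + " = " + toXPathLiteral(s)
                    + (roundTrips(s) ? "  OK" : "  NO MATCH"));
        }
    }
}
```

The cases that end in double quotes are the ones worth watching in Java, because split() without a limit discards trailing empty strings.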

Solution 3

EDIT: After a heavy unit testing session, and checking the XPath Standards, I have revised my function as follows:

public static string ToXPath(string value) {

    const string apostrophe = "'";
    const string quote = "\"";

    if(value.Contains(quote)) {
        if(value.Contains(apostrophe)) {
            throw new XPathException("Illegal XPath string literal.");
        } else {
            return apostrophe + value + apostrophe;
        }
    } else {
        return quote + value + quote;
    }
}

It appears that XPath doesn't have a character escaping system at all; it's quite primitive really. Evidently my original code only worked by coincidence. My apologies for misleading anyone!

Original answer below for reference only - please ignore

For safety, make sure that every occurrence of the five predefined XML entities in your XPath string is escaped, e.g.

public static string ToXPath(string value) {
    return "'" + XmlEncode(value) + "'";
}

public static string XmlEncode(string value) {
    StringBuilder text = new StringBuilder(value);
    text.Replace("&", "&amp;");
    text.Replace("'", "&apos;");
    text.Replace(@"""", "&quot;");
    text.Replace("<", "&lt;");
    text.Replace(">", "&gt;");
    return text.ToString();
}

I have done this before and it works fine. If it doesn't work for you, maybe there is some additional context to the problem that you need to make us aware of.

Solution 4

By far the best approach to this problem is to use the facilities provided by your XPath library to declare an XPath-level variable that you can reference in the expression. The variable value can then be any string in the host programming language, and isn't subject to the restrictions of XPath string literals. For example, in Java with javax.xml.xpath:

XPathFactory xpf = XPathFactory.newInstance();
final Map<String, Object> variables = new HashMap<>();
xpf.setXPathVariableResolver(new XPathVariableResolver() {
  public Object resolveVariable(QName name) {
    return variables.get(name.getLocalPart());
  }
});

XPath xpath = xpf.newXPath();
XPathExpression expr = xpath.compile("ListObject[@Title=$val]");
variables.put("val", someValue);
NodeList nodes = (NodeList)expr.evaluate(someNode, XPathConstants.NODESET);
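For a version that can be run as-is, here is a self-contained sketch of the same idea. The sample XML, the Site root element, and the names XPathVariableDemo and countMatches are assumptions made up for illustration; the resolver is set on the XPath object rather than the factory, which the javax.xml.xpath API also allows.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathVariableResolver;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of any library.
class XPathVariableDemo {
    // Counts ListObject elements whose Title attribute equals the given
    // string, passing the string in as the XPath variable $val.
    static int countMatches(String xml, String title) {
        final Map<String, Object> variables = new HashMap<>();
        variables.put("val", title);

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setXPathVariableResolver(new XPathVariableResolver() {
            @Override
            public Object resolveVariable(QName name) {
                return variables.get(name.getLocalPart());
            }
        });

        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/Site/ListObject[@Title=$val]",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // One title contains both an apostrophe and double quotes.
        String xml = "<Site>"
                + "<ListObject Title=\"It's a &quot;test&quot;\"/>"
                + "<ListObject Title=\"Other\"/>"
                + "</Site>";
        System.out.println(countMatches(xml, "It's a \"test\""));
    }
}
```

The value never passes through the XPath parser as text, so no quoting or concat() is needed however many quotes it contains.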

For C# XPathNavigator you would define a custom XsltContext as described in this MSDN article (you'd only need the variable-related parts of this example, not the extension functions).

Solution 5

Most of the answers here focus on how to use string manipulation to cobble together an XPath that uses string delimiters in a valid way.

I would say the best practice is not to rely on such complicated and potentially fragile methods.

The following applies to .NET since this question is tagged with C#. Ian Roberts has provided what I think is the best solution for when you're using XPath in Java.

Nowadays, you can use Linq-to-Xml to query XML documents in a way that allows you to use your variables in the query directly. This is not XPath, but the purpose is the same.

For the example given in OP, you could query the nodes you want like this:

var value = "Some value with 'apostrophes' and \"quotes\"";

// doc is an instance of XElement or XDocument
IEnumerable<XElement> nodes = 
                      doc.Descendants("ListObject")
                         .Where(lo => (string)lo.Attribute("Title") == value);

or to use the query comprehension syntax:

IEnumerable<XElement> nodes = from lo in doc.Descendants("ListObject")
                              where (string)lo.Attribute("Title") == value
                              select lo;

.NET also provides a way to use XPath variables in your XPath queries. Sadly, it's not easy to do this out of the box, but with a simple helper class that I provide in this other SO answer, it's quite easy.

You can use it like this:

var value = "Some value with 'apostrophes' and \"quotes\"";

var variableContext = new VariableContext { { "matchValue", value } };
// ixn is an instance of IXPathNavigable
XPathNodeIterator nodes = ixn.CreateNavigator()
                             .SelectNodes("ListObject[@Title = $matchValue]", 
                                          variableContext);
Author: Prabhu

Updated on July 09, 2022

Comments

  • Prabhu almost 2 years

    I use the following XPath query to list the objects under a site: ListObject[@Title='SomeValue']. SomeValue is dynamic. This query works as long as SomeValue does not have an apostrophe ('). I tried using an escape sequence as well, but it didn't work.

    What am I doing wrong?

  • Welbog almost 15 years
    You shouldn't even have to treat XML as a plain string. Things like escaping and unescaping are abstracted away for you by the built-in XML libraries. You're reinventing the wheel here.
  • Welbog almost 15 years
    That's not how you escape characters in XML.
  • Christian Hayter almost 15 years
    If you could point me to a BCL class that abstracts away the process of building an XPath query string, I would gladly ditch these functions.
  • Robert Rossney almost 15 years
    That's true. But an XPath query isn't XML text, and at any rate he's not escaping the quotation marks for XPath anyway, he's escaping them for C#. The actual, literal XPath is ListObject[@Title="SomeValue"]
  • Gyuri over 14 years
    BTW, this solution might solve your problem too; it states pretty much the same thing as you do: stackoverflow.com/questions/642125/…
  • kan over 13 years
    What about "\n"? I suspect newlines could cause problems too.
  • Daniel A. White about 11 years
    FYI, yours does not pass all of his tests.
  • Daniel A. White about 11 years
    Change your add to the parts to quote the string.
  • Jonathan Gilbert about 11 years
    Thanks, not sure how I missed that. Fixed. :-)
  • Elmue over 10 years
    Hello. Your code is 1000 times better than the original but still more clumsy than required. Instead of first adding a string which you later remove, it would be easier: String[] split = value.Split('"'); for (int i=0; i<split.length; i++) { if (i>0) parts.Add("'\"'"); if (split[i].Length > 0) parts.Add('"' + split[i] + '"'); }
  • Elmue over 10 years
    You did not understand the question.
  • Elmue over 10 years
    You did not understand the question. The XPath syntax does NOT allow the backslash character for escaping.
  • Jonathan Gilbert over 10 years
    @Elmue That's a matter of personal taste, I suppose. I find that clunkier than removing the final string. There's no significant difference in performance, of course. Another way to implement it might be to add a '"' before each entry, and then use an inline LINQ expression instead of having a separate .Remove statement: foreach (var str in value.Split('"')) { parts.Add("'\"'"); if (!string.IsNullOrEmpty(str)) parts.Add('"' + str + '"'); } return "concat(" + string.Join(",", parts.Skip(1)) + ")";
  • Jonathan Gilbert over 10 years
    @Elmue Thanks for the compliment, by the way :-)
  • Ian Roberts about 10 years
    @kan no, it's perfectly fine for a string literal in XPath to contain a newline character. The only restriction is that single quoted literals can't contain single quotes and double quoted literals can't contain double quotes.
  • Flynn1179 over 9 years
    Such as System.Security.SecurityElement.Escape(value)? (in C#)
  • JLRishe over 9 years
    @ChristianHayter I'm very late to the party here, but the point you've missed (and that I think Welbog was trying to make) is that XPath has the concept of variables, which are immune to these string delimiter problems. So the best practice is to make use of them. .NET does provide a mechanism for using variables in XPath and I've provided an example of how to do so here.
  • JLRishe over 9 years
    This is by far the best approach. +1
  • JLRishe over 9 years
    "This is surprisingly difficult to do." It's only surprisingly difficult to do if you go about it the wrong way (by trying to cobble strings together). If you use one of the right approaches, it's pretty simple.
  • Christian Hayter over 9 years
    @JLRishe: I hadn't looked at this question for years; I have not written any XPath queries at all since LINQ to XML came out. :-) Parameterising the data values is always the best solution to any string injection problem, so I have upvoted both your answers. Thanks very much.
  • Dadapeer Pvg almost 9 years
    Hi @RobertRossney, I'm facing the same problem: my search strings are "???" and "+". I'm able to find everything except these two. Can you please suggest something?
  • hanshenrik almost 7 years
    Awesome work @RobertRossney! BTW, I did a PHP port of this code, if anyone is interested: gist.github.com/divinity76/64b0c12bcafc2150efa8ca87d2ccee52
  • hanshenrik over 6 years
    The exception is not necessary; you can work around it by using concat(), see Robert Rossney's answer.
  • Jtbs about 2 years
    I think you have an excellent point, and this is an excellent alternative.