Case insensitive XPath contains() possible?

94,465

Solution 1

This is for XPath 1.0. If your environment supports XPath 2.0, see Solution 3 below.


Yes. Possible, but not beautiful.

/html/body//text()[
  contains(
    translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),
    'test'
  )
]

This would work for search strings where the alphabet is known beforehand. Add any accented characters you expect to see.
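As a minimal sketch of adding accented characters, the JavaScript below derives the two translate() arguments from a list of extra lower-case letters. The helper name buildAlphabets is hypothetical (it is not part of the original answer), and it assumes each extra character has a single-character upper-case form. Characters such as ß, whose upper-case form is longer, are skipped so that the two strings stay aligned position by position.

```javascript
// Hypothetical helper: extend the basic Latin alphabets with extra letters.
function buildAlphabets(extraLower) {
  let lower = "abcdefghijklmnopqrstuvwxyz";
  let upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
  for (const ch of extraLower) {
    const up = ch.toUpperCase();
    // Skip characters without a one-to-one upper-case form, characters
    // that have no distinct upper case, and characters already present.
    if (up.length !== ch.length || up === ch || lower.includes(ch)) continue;
    lower += ch;
    upper += up;
  }
  return { upper, lower };
}

const { upper, lower } = buildAlphabets("äöüéß");
const xpath =
  "/html/body//text()[contains(translate(., '" + upper + "', '" + lower + "'), 'test')]";
```

The resulting expression has the same shape as the one above, only with longer alphabets.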


If you can, mark the text that interests you with some other means, like enclosing it in a <span> that has a certain class while building the HTML. Such things are much easier to locate with XPath than substrings in the element text.

If that's not an option, you can let JavaScript (or any other host language that you are using to execute XPath) help you with building a dynamic XPath expression:

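// Fill the $u placeholder with the upper-case search string and the
// $l and $s placeholders with the lower-case search string.
function xpathPrepare(xpath, searchString) {
  // Replacer functions keep "$" sequences inside searchString from being
  // interpreted as special replacement patterns by String.prototype.replace.
  return xpath.replace("$u", () => searchString.toUpperCase())
              .replace("$l", () => searchString.toLowerCase())
              .replace("$s", () => searchString.toLowerCase());
}

const xp = xpathPrepare("//text()[contains(translate(., '$u', '$l'), '$s')]", "Test");
// -> "//text()[contains(translate(., 'TEST', 'test'), 'test')]"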

(Hat tip to @KirillPolishchuk's answer - of course you only need to translate those characters you're actually searching for.)

This approach would work for any search string whatsoever, without requiring prior knowledge of the alphabet, which is a big plus.

Both of the methods above fail when search strings can contain single quotes, in which case things get more complicated.
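One common way around the quoting problem in XPath 1.0, which has no escape character inside string literals, is to choose the other quote style or, when both kinds of quote occur, to assemble the string with concat(). The sketch below illustrates that idea; the names xpathLiteral and buildContainsXPath are hypothetical and do not come from the original answer.

```javascript
// Hypothetical helper: turn an arbitrary string into an XPath 1.0 string literal.
function xpathLiteral(s) {
  if (!s.includes("'")) return "'" + s + "'"; // no single quote: use '...'
  if (!s.includes('"')) return '"' + s + '"'; // no double quote: use "..."
  // Both kinds of quote present: split on single quotes and join the pieces
  // with concat(), writing each single quote as the literal "'".
  const parts = s.split("'").map(part => "'" + part + "'");
  return "concat(" + parts.join(", \"'\", ") + ")";
}

// Hypothetical helper: case-insensitive contains() for any search string.
function buildContainsXPath(searchString) {
  const upper = xpathLiteral(searchString.toUpperCase());
  const lower = xpathLiteral(searchString.toLowerCase());
  return "//text()[contains(translate(., " + upper + ", " + lower + "), " + lower + ")]";
}

const quoted = buildContainsXPath("It's");
// -> //text()[contains(translate(., "IT'S", "it's"), "it's")]
```

Because the literals are generated rather than pasted between fixed quotes, the surrounding expression no longer needs to know which quote characters the search string contains.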

Solution 2

Case-insensitive contains

/html/body//text()[contains(translate(., 'EST', 'est'), 'test')]
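To see why mapping only E, S and T is enough, it can help to emulate translate() outside of XPath. The function below is a small JavaScript sketch of the XPath 1.0 translate() rules as the specification describes them (character-by-character mapping, first occurrence wins, unmatched positions are dropped); the name xpathTranslate is hypothetical and the function is for illustration only, not a replacement for a real XPath engine.

```javascript
// Hypothetical emulation of XPath 1.0 translate(str, from, to).
function xpathTranslate(str, from, to) {
  const fromChars = Array.from(from);
  const toChars = Array.from(to);
  let out = "";
  for (const ch of str) {
    const i = fromChars.indexOf(ch); // the first occurrence in `from` decides
    if (i === -1) {
      out += ch;                     // not listed in `from`: keep unchanged
    } else if (i < toChars.length) {
      out += toChars[i];             // listed: replace with the counterpart
    }
    // listed without a counterpart in `to`: the character is removed
  }
  return out;
}

xpathTranslate("TesT", "EST", "est");           // -> "test"
xpathTranslate("This is a TEST", "EST", "est"); // -> "this is a test"
xpathTranslate("Test", "TEST", "test");         // -> "test" (the repeated T is harmless)
```

Characters that are not part of the search string are left alone, which does not matter because contains() only has to find the lower-cased needle.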

Solution 3

XPath 2.0 Solutions

  1. Use lower-case():

    /html/body//text()[contains(lower-case(.),'test')]

  2. Use matches() regex matching with its case-insensitive flag:

    /html/body//text()[matches(.,'test', 'i')]

Solution 4

Yes. You can use translate to convert the text you want to match to lower case as follows:

/html/body//text()[contains(translate(., 
                                      'ABCDEFGHIJKLMNOPQRSTUVWXYZ',
                                      'abcdefghijklmnopqrstuvwxyz'),
                   'test')]
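Since the question asks about XPath in JavaScript, here is a hedged sketch of running this expression through the standard DOM method document.evaluate(). The helper name findTextNodesIgnoreCase is hypothetical, and the sketch assumes the search string contains no single quote and only letters covered by the two alphabets.

```javascript
const UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
const LOWER = "abcdefghijklmnopqrstuvwxyz";

// Use the named constant in browsers; 7 is its numeric value elsewhere.
const SNAPSHOT_TYPE = typeof XPathResult !== "undefined"
  ? XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
  : 7;

// Hypothetical helper: return all text nodes under <body> whose text
// contains `needle`, ignoring case for the letters A-Z.
function findTextNodesIgnoreCase(doc, needle) {
  const expr =
    "/html/body//text()[contains(translate(., '" + UPPER + "', '" + LOWER + "'), '" +
    needle.toLowerCase() + "')]";
  const result = doc.evaluate(expr, doc, null, SNAPSHOT_TYPE, null);
  const nodes = [];
  for (let i = 0; i < result.snapshotLength; i++) {
    nodes.push(result.snapshotItem(i));
  }
  return nodes;
}
```

In a browser this would be called as findTextNodesIgnoreCase(document, "TesT"); the needle is lower-cased in JavaScript so that it matches the lower-cased node text.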

Solution 5

The way I always did this was by using the "translate" function in XPath. I won't say it's very pretty, but it works correctly.

/html/body//text()[contains(translate(.,'abcdefghijklmnopqrstuvwxyz',
                                        'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),'TEST')]

hope this helps,


Author: Aron Woost

Updated on July 08, 2022

Comments

  • Aron Woost
    Aron Woost almost 2 years

    I'm iterating over all text nodes of my DOM and checking whether the nodeValue contains a certain string.

    /html/body//text()[contains(.,'test')]
    

    This is case sensitive. However, I also want to catch Test, TEST or TesT. Is that possible with XPath (in JavaScript)?

  • Tomalak
    Tomalak over 12 years
    +1 Absolutely. That's something I did not think of. (I'll use that in my answer, this is much better than the original JavaScript routine I wrote)
  • Aron Woost
    Aron Woost over 12 years
    Thanks! Also the addition is nice, translating only the needed chars. I'd be curious what the performance win is. Note that xpathPrepare() could handle more-than-once appearing chars differently (e.g. you get TEEEEEST and teeeeest).
  • Tomalak
    Tomalak over 12 years
    @AronWoost: Well, there might be some gain, just benchmark it if you are eager to find out. translate() itself does not care how often you repeat each character - translate(., 'EE', 'ee') is absolutely equivalent to translate(., 'E', 'e'). P.S.: Don't forget to up-vote @KirillPolishchuk, the idea was his.
  • Muhammad Adeel Zahid
    Muhammad Adeel Zahid about 11 years
    wouldn't it just convert TEST to test and leave Test as it is?
  • Daniel Haley
    Daniel Haley about 11 years
    @MuhammadAdeelZahid - No, it's replacing "T" with "t", "E" with "e", etc. It's a 1-to-1 match.
  • Stefan Steiger
    Stefan Steiger over 10 years
    System.Xml.XmlNodeList x = mydoc.SelectNodes("//*[contains(translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜÉÈÊÀÁÂÒÓÔÙÚÛÇÅÏÕÑŒ', 'abcdefghijklmnopqrstuvwxyzäöüéèêàáâòóôùúûçåïõñœ'),'foo')]");
  • Tomalak
    Tomalak over 10 years
    No. See the "of course you only need to translate those characters you're actually searching for" part.
  • mlissner
    mlissner almost 7 years
    It might be more clear to do translate(., 'TES', 'tes'). That way people will realize it's not a word translation, that it's a letter translation.
  • d-b
    d-b almost 5 years
    Is this syntax not supported in Firefox and Chrome? I just tried it in the console and they both return syntax error.
  • kjhughes
    kjhughes almost 5 years
    Firefox and Chrome only implement XPath 1.0.
  • George Birbilis
    George Birbilis over 3 years
    or 'EST', 'est', though it does look cool (albeit a bit cryptic) that part of the searched term is appearing in the mapping (the repeated letters removed)
  • Ankit Gupta
    Ankit Gupta over 3 years
    where I can verify that this will work as expected?
  • kjhughes
    kjhughes over 3 years
    @AnkitGupta: Any online or offline tool that supports XPath 2.0 can be used to verify this answer, of course, but (1) tool recommendations are off-topic here on SO and (2) given the 56 upvotes, 0 downvotes, and no dissenting comments in over six years, you can be pretty confident that this answer is correct. ;-)
  • Eaten by a Grue
    Eaten by a Grue over 2 years
    we need icontains() :-)
  • DANIEL ROSAS PEREZ
    DANIEL ROSAS PEREZ about 2 years
    Thank you so much!!!