xpath expression to remove whitespace
Solution 1
I. Use this single XPath expression:
translate(normalize-space(/tr/td/a), ' ', '')
Explanation:
normalize-space() produces a new string from its argument, in which any leading or trailing whitespace (space, tab, NL or CR characters) is deleted and each run of intermediary whitespace is replaced by a single space character.
translate() takes the result produced by normalize-space() and produces a new string in which each of the remaining intermediary spaces is deleted (replaced by the empty string).
II. Alternatively:
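translate(/tr/td/a, ' &#9;&#10;&#13;', '')
Here the second argument lists the space, tab, newline and carriage-return characters, written as XML character references as they would be when the expression is embedded in an XML document such as an XSLT stylesheet.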
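To make the two variants concrete, here is a minimal pure-Python sketch that emulates what normalize-space() and translate() do to a string. The helper names normalize_space and xpath_translate and the sample value raw are hypothetical and only illustrate the semantics described above; they are not part of any XPath library.

```python
import re

XPATH_WS = " \t\r\n"  # the four characters XPath 1.0 treats as whitespace


def normalize_space(s: str) -> str:
    """Emulate XPath normalize-space(): trim, then collapse whitespace runs."""
    return re.sub(r"[ \t\r\n]+", " ", s.strip(XPATH_WS))


def xpath_translate(s: str, src: str, dst: str) -> str:
    """Emulate XPath translate(): map src[i] to dst[i], delete if dst is shorter."""
    table = {}
    for i, ch in enumerate(src):
        # The first occurrence of a character in src wins, as in XPath.
        table.setdefault(ord(ch), dst[i] if i < len(dst) else None)
    return s.translate(table)


raw = "\n   16 : 00\n  "  # hypothetical string value of the <a> element

step1 = normalize_space(raw)                    # variant I, first step
result_i = xpath_translate(step1, " ", "")      # variant I, second step
result_ii = xpath_translate(raw, XPATH_WS, "")  # variant II, one step
```

Both variants delete every whitespace character, so the inner spaces around the colon disappear as well; if those should be kept, normalize-space() alone is enough.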
Solution 2
Try the following XPath expression:
//td[@class='score-time status']/a[normalize-space() = '16 : 00']
Solution 3
You can use XPath's normalize-space() as in //a[normalize-space()="16 : 00"]
Solution 4
I came across this thread while dealing with a similar issue of my own.
HTML
<div class="d-flex">
<h4 class="flex-auto min-width-0 pr-2 pb-1 commit-title">
<a href="/nsomar/OAStackView/releases/tag/1.0.1">
1.0.1
</a>
XPath start command
tree.xpath('//div[@class="d-flex"]/h4/a/text()')
However, this also grabbed a whitespace-only text node and gave me this output:
['\n ', '\n 1.0.1\n ']
Adding a normalize-space() predicate filtered out the whitespace-only text node and left me with just the node I wanted:
tree.xpath('//div[@class="d-flex"]/h4/a/text()[normalize-space()]')
['\n 1.0.1\n ']
I could then grab the first element of the list and use strip() to remove the remaining leading and trailing whitespace.
XPath final command
tree.xpath('//div[@class="d-flex"]/h4/a/text()[normalize-space()]')[0].strip()
Which left me with exactly what I required:
1.0.1
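The effect of the [normalize-space()] predicate can be mimicked in plain Python, which may help show why it drops the first node: a text node is kept only if its normalized string is non-empty. This is a rough sketch, not lxml code; the list nodes simply reuses the output shown above, and str.split() treats a slightly wider set of characters as whitespace than XPath does.

```python
# Text nodes as returned by the first query above.
nodes = ['\n ', '\n 1.0.1\n ']

# Keep a node only if something remains after whitespace normalization,
# which is what the [normalize-space()] predicate tests.
kept = [t for t in nodes if " ".join(t.split())]

# Trim the surviving node, as the final command does with strip().
version = kept[0].strip()
```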
Solution 5
- You can check whether text() nodes are empty:
/path/text()[not(.='')]
To also skip nodes that contain only whitespace, test not(normalize-space(.)='') instead. This may be useful with axes like following-sibling:: if there are no containers, or with child::.
- You can use string() or the regular-expression functions of XPath 2.0, such as matches() and replace().
NOTE: some comments say that XPath cannot do string manipulation. Even though it is not really designed for that, you can do basic things: contains(), starts-with() and, in XPath 2.0, replace().
Checking for whitespace nodes is much harder, as you will generally have a node list as the result set, and most XPath functions, like matches() or replace(), operate on only one node.
- You can separate node and string manipulation.
So you may use XPath to retrieve a container, or a list of text nodes, and then process it with another language (Java, PHP, Python or Perl, for instance).
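As an illustration of separating node selection from string handling, the sketch below uses Python's standard-library xml.etree.ElementTree, which supports only a limited XPath subset, to pick the element and then cleans the text in Python. The markup is a well-formed version of the snippet from the question, and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed version of the markup from the question.
html = (
    '<tr class="even expanded first">'
    '<td class="score-time status">'
    '<a href="/matches/2012/08/02/europe/uefa-cup/">\n   16 : 00\n  </a>'
    '</td></tr>'
)

root = ET.fromstring(html)

# Node selection: a simple path with an attribute predicate.
link = root.find("td[@class='score-time status']/a")

# String handling in the host language: trim and collapse whitespace.
kickoff = " ".join(link.text.split())
```

With a full XPath engine the same split applies: select the node with XPath, then normalize the string in the calling code.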
adellam
Updated on July 08, 2022
Comments
-
adellam almost 2 years
I have this HTML:
<tr class="even expanded first"> <td class="score-time status"> <a href="/matches/2012/08/02/europe/uefa-cup/"> 16 : 00 </a> </td> </tr>
I want to extract the (16 : 00) string without the extra whitespace. Is this possible?
-
Arup Rakshit almost 10 years
Is there a short XPath expression to get only the CDATA nodes throughout an XML file?
-
Dimitre Novatchev almost 10 years
@ArupRakshit, there are no "CDATA nodes" in the XPath Data Model, so it is not possible to distinguish a CDATA section from the rest of the text node that contains it. In the same way, it is not possible to know whether the short tag was used for an element without children, or whether quotes or apostrophes were used as delimiters around an attribute value.
-
Arup Rakshit almost 10 years
@DimitreNovatchev Thanks for the reply. So that means I need to find it the same way I search for regular nodes.
-
Dimitre Novatchev almost 10 years
@ArupRakshit, yes, one can only select whole text nodes in XPath. You could filter these nodes with one or more predicates if you know something more (such as a substring) about the text you are looking for.