XPath: select tag with empty value


Solution 1

How can I find, in XPath 1.0, all rows whose col name="POW" is empty?

There are many possible definitions of "empty" and for each one of them there is a different XPath expression selecting "empty" elements.

A reasonable definition of an empty element is: an element that has no child elements and no text-node children, or an element that has a single text-node child whose string value contains only whitespace characters.

This XPath expression:

//row[col[@name = 'POW']
                    [not(*)]
                       [not(normalize-space())]
      ]

selects all row elements in the XML document that have a col child such that: its name attribute has the string value "POW", it has no child elements, and its string value is either the empty string or consists entirely of whitespace characters.
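A quick way to check this expression is to run it with lxml (the library used in the question) against a small document. The `rows` wrapper and the `id` attributes below are hypothetical, added only so the selected rows are easy to identify; this is a sketch, not part of the original answer.

```python
from lxml import etree

# Hypothetical sample; the id attributes exist only to label the rows.
XML = b"""<rows>
  <row id="a"><col name="POW"/></row>
  <row id="b"><col name="POW">   </col></row>
  <row id="c"><col name="POW">02</col></row>
  <row id="d"><col name="POW"><x/></col></row>
</rows>"""

root = etree.fromstring(XML)

# The expression from the answer: no element children, whitespace-only string value.
expr = """//row[col[@name = 'POW']
                   [not(*)]
                   [not(normalize-space())]
          ]"""

selected = [row.get("id") for row in root.xpath(expr)]
print(selected)  # ['a', 'b']
```

Row `d` is rejected by `not(*)` even though its string value is empty, and row `b` is accepted because its only text is whitespace.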

If by "empty" you mean "having no children at all", that is, no child elements, text nodes, comments, or processing instructions, then use:

//row[col[@name = 'POW']
                    [not(node())]
      ]
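The difference between the two definitions shows up on whitespace-only text and on comments: `node()` matches both, while `normalize-space()` ignores whitespace and comment text is not part of an element's string value. A minimal lxml sketch comparing them on a hypothetical document (the `id` attributes are made up for labelling):

```python
from lxml import etree

# Hypothetical sample rows (ids added only for labelling).
XML = b"""<rows>
  <row id="a"><col name="POW"/></row>
  <row id="b"><col name="POW">   </col></row>
  <row id="c"><col name="POW"><!-- note --></col></row>
</rows>"""

root = etree.fromstring(XML)


def ids(expr):
    """Return the id attribute of every row selected by expr."""
    return [row.get("id") for row in root.xpath(expr)]


# "No children at all": any text, comment, PI or element child disqualifies the col.
strict = ids("//row[col[@name = 'POW'][not(node())]]")
# Whitespace-tolerant definition from the first expression.
lenient = ids("//row[col[@name = 'POW'][not(*)][not(normalize-space())]]")

print(strict)   # ['a']
print(lenient)  # ['a', 'b', 'c']
```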

Solution 2

//row[col[@name='POW' and not(normalize-space())]]

To also ensure that the POW column doesn't have any child elements (even ones that contain no text), add another condition to the predicate:

//row[col[@name='POW' and not(normalize-space()) and not(*)]]
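The effect of the extra `not(*)` condition is visible on a column that contains only an empty child element: its string value is empty, so the first expression accepts it and the second rejects it. Again a sketch with lxml and a made-up document (the `id` attributes are hypothetical):

```python
from lxml import etree

# Hypothetical sample rows (ids added only for labelling).
XML = b"""<rows>
  <row id="a"><col name="POW"/></row>
  <row id="b"><col name="POW">   </col></row>
  <row id="c"><col name="POW">02</col></row>
  <row id="d"><col name="POW"><x/></col></row>
</rows>"""

root = etree.fromstring(XML)

text_only = "//row[col[@name='POW' and not(normalize-space())]]"
no_children = "//row[col[@name='POW' and not(normalize-space()) and not(*)]]"

first = [r.get("id") for r in root.xpath(text_only)]
second = [r.get("id") for r in root.xpath(no_children)]

print(first)   # ['a', 'b', 'd']  (d has an empty <x/> child)
print(second)  # ['a', 'b']
```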

Solution 3

Use this:

//row[col[@name = 'POW' and not(text())]]
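`not(text())` tests only for direct text-node children, so it is a narrower notion of "empty" than the expressions above: a whitespace-only column is rejected, while a column whose text sits inside a child element is accepted. A sketch with lxml and a hypothetical document illustrating both cases:

```python
from lxml import etree

# Hypothetical sample rows (ids added only for labelling).
XML = b"""<rows>
  <row id="a"><col name="POW"/></row>
  <row id="b"><col name="POW">   </col></row>
  <row id="c"><col name="POW">02</col></row>
  <row id="e"><col name="POW"><b>02</b></col></row>
</rows>"""

root = etree.fromstring(XML)

# not(text()) only looks at *direct* text-node children of the col.
selected = [r.get("id") for r in root.xpath("//row[col[@name = 'POW' and not(text())]]")]
print(selected)  # ['a', 'e']
```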
Author: pbm


Updated on July 18, 2022

Comments

  • pbm, almost 2 years ago

    How can I find, in XPath 1.0, all rows whose col name="POW" is empty?

    <row>
    <col name="WOJ">02</col>
    <col name="POW"/>
    <col name="GMI"/>
    <col name="RODZ"/>
    <col name="NAZWA">DOLNOŚLĄSKIE</col>
    <col name="NAZDOD">województwo</col>
    <col name="STAN_NA">2011-01-01</col>
    </row>
    

    I tried many solutions. A few times the selection was correct in the Firefox extension XPath Checker, but lxml's xpath() either says that the expression is invalid or just returns no rows.

    My Python code:

    from lxml import html
    f = open('TERC.xml', 'r')
    page = html.fromstring(f.read())
    for r in page.xpath("//row[col[@name = 'POW' and not(text())]]"):
        print r.text_content()
        print "-------------------------"
    
  • pbm, over 12 years ago
    There is an unnecessary )] at the end of the expression... And it selects all rows in my code (in XPath Checker everything is OK). I have updated my question...
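A likely explanation for the difference between XPath Checker and the Python code (an assumption, not something confirmed in the thread): `lxml.html.fromstring` parses the input with HTML rules, under which `col` is an empty (void) element, so text such as `02` does not end up inside the `col` element and `not(text())` becomes true for every column. Parsing the file as XML with `lxml.etree` avoids that. A minimal sketch using an inline copy of the sample row instead of `TERC.xml`; the `find_empty_pow_rows` helper and the second row are hypothetical:

```python
from lxml import etree


def find_empty_pow_rows(doc):
    """Return rows whose POW col has no child elements and only whitespace text (or none)."""
    return doc.xpath("//row[col[@name='POW' and not(*) and not(normalize-space())]]")


# In the real program: doc = etree.parse("TERC.xml")  (XML parser, not lxml.html)
SAMPLE = """<rows>
<row>
<col name="WOJ">02</col>
<col name="POW"/>
<col name="GMI"/>
<col name="RODZ"/>
<col name="NAZWA">DOLNOŚLĄSKIE</col>
<col name="NAZDOD">województwo</col>
<col name="STAN_NA">2011-01-01</col>
</row>
<row>
<col name="WOJ">02</col>
<col name="POW">01</col>
</row>
</rows>"""  # the second row is hypothetical, added to show a non-empty POW

doc = etree.fromstring(SAMPLE)
rows = find_empty_pow_rows(doc)
for row in rows:
    # lxml.etree elements have no text_content(); itertext() yields all descendant text.
    print(" ".join(t.strip() for t in row.itertext() if t.strip()))
    print("-------------------------")
```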