retrieve xpath content from div id


Solution 1

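From what I see, your data is inside a CDATA section. The parser treats everything in a CDATA section as plain character data rather than markup, so XPath cannot select elements within it.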

See How do I retrieve element text inside CDATA markup via XPath? for more details.
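
To see the effect of the CDATA section, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It assumes a well-formed version of the item from the question; the closing `]]></description>` is an assumption, since the pasted XML does not show it.

```python
import xml.etree.ElementTree as ET

# Assumed, well-formed version of the item from the question.
cdata_xml = """<item>
  <title>Title Here</title>
  <description><![CDATA[
      <div id="article-field1"><a href="http://example.org/test1">Test 1</a></div>
      <div id="article-field2">123</div>
  ]]></description>
</item>"""

cdata_root = ET.fromstring(cdata_xml)
description = cdata_root.find("description")

# The CDATA payload arrives as one text node, so description has no children.
child_count = len(list(description))
div_matches = cdata_root.findall(".//description/div")

print(child_count)               # number of child elements under description
print(div_matches)               # elements matched by the div step
print(description.text.strip())  # the markup, as a plain string
```

The div step matches nothing because, as far as the parser is concerned, there is no div element in the tree, only a string that happens to contain angle brackets.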

Solution 2

//description/div[@id="article-field1"]/a/text() 

This expression works once the malformed CDATA section is removed, a root element is added, and the corresponding 'description' tag is closed. That assumes the XML in the question was only partially pasted, which is the only reading under which the expression makes sense. The underlying problem is that the original query was missing the a element: the text is a child of a, not of div.

This can be verified at http://www.xpathtester.com/.
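
The same check can be sketched locally in Python with xml.etree.ElementTree. This is only an illustration under the assumptions stated above: the CDATA markers are removed, the 'description' tag is closed, and a root element (named channel here, which is a guess) is added. ElementTree implements only a subset of XPath and has no text() step, so the sketch selects the element and reads its .text attribute instead.

```python
import xml.etree.ElementTree as ET

# Cleaned-up XML: no CDATA, description closed, hypothetical root element added.
cleaned_xml = """<channel>
  <item>
    <title>Title Here</title>
    <description>
      <div id="article-field1"><a href="http://example.org/test1">Test 1</a></div>
      <div id="article-field2">123</div>
    </description>
  </item>
</channel>"""

cleaned_root = ET.fromstring(cleaned_xml)

# Equivalent of //description/div[@id="article-field1"]
div = cleaned_root.find(".//description/div[@id='article-field1']")
# Equivalent of the extra /a step added in this answer
anchor = div.find("a")

print(repr(div.text))  # text directly inside div, before its first child
print(anchor.text)     # text inside the a element
```

The div itself has no direct text, which is why the original query without the a step returns nothing useful.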

Solution 3

You can't do it with a single call to a plain-vanilla XPath processor.

You have two choices:

  1. Use an XPath processor that implements the dyn:evaluate() function (which raises the question: what processor and version are you using?); OR
  2. Use two calls. The first gets the text value of the /title/item/description node. The second, after loading the result of the first as a new XML document (with a few tweaks to convert the XML fragment into a proper XML document), is div[@id="article-field1"].
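
The two-call approach can be sketched in Python with xml.etree.ElementTree. This is a rough illustration, not a definitive implementation: the channel root element, the closing `]]></description>`, and the wrapper element name are all assumptions added so that both documents are well-formed.

```python
import xml.etree.ElementTree as ET

# Assumed, well-formed version of the feed from the question.
feed_xml = """<channel>
  <title>Testing</title>
  <item>
    <title>Title Here</title>
    <description><![CDATA[
        <div id="article-field1"><a href="http://example.org/test1">Test 1</a></div>
        <div id="article-field2">123</div>
    ]]></description>
  </item>
</channel>"""

# First call: get the text value of the description node.
feed_root = ET.fromstring(feed_xml)
fragment_text = feed_root.findtext("./item/description")

# The "tweak": wrap the fragment in a single root so it parses as a document.
fragment_root = ET.fromstring("<wrapper>" + fragment_text + "</wrapper>")

# Second call: query the newly parsed document.
field1 = fragment_root.find("./div[@id='article-field1']")
link_text = field1.find("a").text

print(link_text)
```

This only works when the embedded fragment is itself well-formed XML. If the CDATA section holds loose HTML (unclosed tags, entities such as &nbsp;), the second parse will fail and an HTML parser would be needed instead.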
Author by

shadow

Updated on August 06, 2022

Comments

  • shadow
    shadow almost 2 years

    How do I retrieve the text inside article-field1?

    <title>Testing</title>
      <link>http://example.org</link>
      <description>Description</description>
      <language>en-us</language>
      <lastBuildDate>Mon, 13 Feb 2012 00:00:00 +0000</lastBuildDate>
    
      <item>
        <title>Title Here</title>
        <link>http://example.org/2012/03/27/</link>
        <description><![CDATA[
            <div id="article-field1"><a href="http://example.org/test1">Test 1</a></div>
            <div id="article-field2">123</div>
        <pubDate>Tue, 2 Mar 2012 00:00:00 +0000</pubDate>
      </item>
    

    I've tried to use

    //description/div[@id="article-field1"]/text()
    

    Any advice?

    Thanks

  • Sean B. Durkin
    Sean B. Durkin over 12 years
    Note: The content of the title/item/description node is pure character data, not XML. That is why "//description/div[@id="article-field1"]/text()" does not work.