Nokogiri/XPath namespace query


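Every namespace prefix used in an XPath query has to be registered. Nokogiri automatically registers the namespaces declared on the root node (the default namespace is available under the prefix xmlns, which is why //xmlns:metadata works). Namespaces declared anywhere else, such as dc on the metadata element, you have to register yourself by passing a prefix-to-URI hash to xpath. This should work: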

puts doc.xpath('//dc:title', 'dc' => "URI")

Alternatively, you can remove namespaces altogether. Only do this if you are certain that no two elements from different namespaces share the same local name, because after removal they become indistinguishable.

doc.remove_namespaces!
puts doc.xpath('//title')
Author by

Jamie

Skills in awk, ruby, tcl, c++, java, javascript and hopefully soon RoR.

Updated on June 01, 2022

Comments

  • Jamie
    Jamie about 2 years

    I'm trying to pull out the dc:title element using an xpath. I can pull out the metadata using the following code.

    doc = <<END
    <?xml version="1.0" encoding="UTF-8"?>
    <package xmlns="http://www.idpf.org/2007/opf" version="2.0">
      <metadata xmlns:dc="URI">
        <dc:title>title text</dc:title>
      </metadata>
    </package>
    END
    
    doc = Nokogiri::XML(doc)
    
    # Awesome this works!
    puts '//xmlns:metadata'
    puts doc.xpath('//xmlns:metadata')
    # => <metadata xmlns:dc="URI"><dc:title>title text</dc:title></metadata>
    

    As you can see, the above appears to work correctly. However, I can't seem to get the title out of this node tree; all of the attempts below fail.

    puts doc.xpath('//xmlns:metadata/title')
    # => prints nothing (the result is an empty NodeSet, not nil)
    
    puts doc.xpath('//xmlns:metadata/dc:title')
    # => ERROR: `evaluate': Undefined namespace prefix
    
    puts doc.xpath('//xmlns:dc:title')
    # => ERROR: 'evaluate': Invalid expression: //xmlns:dc:title
    

    Could someone please explain how namespaces should be used in an XPath query against the above XML document?

  • Simon Lepkin
    Simon Lepkin almost 9 years
    Using remove_namespaces! is the most sensible thing to try first. But beware: if you're modifying this XML and submitting it to an external API, the API will often reject it once the namespaces have been stripped.