Ruby XPath to find Attribute


Solution 1

Your starting point would be REXML, which is part of Ruby's standard distribution.

The "challenge" here is how to treat an attribute node as a child node, and this can be done by using singleton methods, then everything else follows naturally:

require "rexml/document"
include REXML  # so that we don't have to prefix everything with REXML::...

def get_pair(xml_doc, key, value)
  XPath.each(xml_doc, key) do |node| 
    if node.is_a?(Attribute)
      def node.parent
        self.element
      end
    end
    puts "\"#{node}\" \"#{XPath.first(node, value)}\""
  end
end

xml_doc = Document.new <<EOF
  <root>
    <add key="A" value="B" />
    <add key="C" value="D" />
    <add foo="E" bar="F" />
  </root>
EOF

get_pair xml_doc, "//*/@key", "../@value"
get_pair xml_doc, "//*/@foo", "../@bar"

produces:

"A" "B"
"C" "D"
"E" "F"
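If you would rather not patch attribute nodes with a singleton method, one alternative is to select the owning elements and read both attributes from each of them. This is a minimal sketch of a different technique, not the approach above: the helper name `attribute_pairs` is hypothetical, and it takes plain attribute names instead of arbitrary XPath expressions, so it is less general than `get_pair`.

```ruby
require "rexml/document"

# Hypothetical helper (not part of REXML): collects [key, value] pairs by
# selecting every element that carries the key attribute, then reading
# both attributes directly from that element.
def attribute_pairs(xml_doc, key_name, value_name)
  REXML::XPath.match(xml_doc, "//*[@#{key_name}]").map do |element|
    # Attributes#[] returns the attribute's value as a String (or nil).
    [element.attributes[key_name], element.attributes[value_name]]
  end
end

xml_doc = REXML::Document.new(<<~XML)
  <root>
    <add key="A" value="B" />
    <add key="C" value="D" />
    <add foo="E" bar="F" />
  </root>
XML

pairs = attribute_pairs(xml_doc, "key", "value") +
        attribute_pairs(xml_doc, "foo", "bar")

# Print each pair in the same quoted format as the original example.
pairs.each { |key, value| puts %("#{key}" "#{value}") }
```

Returning the pairs instead of printing them inside the helper keeps the lookup separate from the formatting, which makes the result easier to reuse.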

Solution 2

Apparently Nokogiri is the fastest Ruby XML parser.

See http://www.rubyinside.com/nokogiri-ruby-html-parser-and-xml-parser-1288.html

I was using it today and it's great.

For your example:

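require "nokogiri"

doc = Nokogiri::XML(your_xml)
# Element#[] returns the attribute value, or nil if the attribute is absent
doc.xpath("/root/add").each do |add|
  puts "\"#{add['key']}\" \"#{add['value']}\""
end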

Edit: Unsurprisingly, it turns out that the claim that Nokogiri is faster is not uncontroversial.

However, we have found it more stable than libxml in our production environment (libxml was occasionally crashing; simply swapping in Nokogiri solved the issue).

Solution 3

If you will be parsing a decent amount of data anywhere performance matters, you will need libxml-ruby. REXML and Hpricot are good, but I recently had to switch to libxml-ruby on my own server for some parsing work because it was about 1200% faster.

Author: Hadeel Fouad

e-mail: alex2k8.stackoverflow [at] gmail [dot] com

Updated on June 04, 2022

Comments

  • Hadeel Fouad
    Hadeel Fouad about 2 years

    What Ruby library can be used to select an attribute using XPath and then use that attribute as the starting point for other XPath queries?

    Example:

    <root>
      <add key="A" value="B" />
      <add key="C" value="D" />
      <add foo="E" bar="F" />
    </root>
    

    Desired code:

    get_pair "//*/@key", "../@value"
    get_pair "//*/@foo", "../@bar"
    

    Expected output:

    "A" "B"
    "C" "D"
    "E" "F"
    

    Pseudo implementation:

    def get_pair(key, value)
      xml_doc.select[key].each do |a|
        puts [a, a.select[value]]
      end
    end
    
  • oligan
    oligan over 15 years
    It's described as "slightly slower than libxml-ruby" in the tenderlovemaking.com/2008/10/30/nokogiri-is-released comments section.