Convert a Nokogiri document to a Ruby Hash

47,300

Solution 1

I use this code with libxml-ruby (1.1.3). I have not used Nokogiri myself, but as I understand it, Nokogiri is built on the same C library (libxml2) that libxml-ruby wraps, although it does not use libxml-ruby itself. I would also encourage you to look at ROXML (http://github.com/Empact/roxml/tree), which maps XML elements to Ruby objects; it is built atop libxml.

# USAGE: Hash.from_libxml(YOUR_XML_STRING)
require 'xml/libxml'
# adapted from
# http://movesonrails.com/articles/2008/02/25/libxml-for-active-resource-2-0

class Hash
  class << self
    def from_libxml(xml, strict = true)
      XML.default_load_external_dtd = false
      XML.default_pedantic_parser = strict
      result = XML::Parser.string(xml).parse
      { result.root.name.to_s => xml_node_to_hash(result.root) }
    rescue StandardError => e
      # Raise your own custom exception here. Re-raising by default avoids
      # silently returning nil when the XML cannot be parsed.
      raise e
    end

    def xml_node_to_hash(node)
      # Element nodes become a Hash (or a String/nil); any other node returns its content
      if node.element?
        if node.children?
          result_hash = {}

          node.each_child do |child|
            result = xml_node_to_hash(child)

            if child.text?
              # An element whose only child is text collapses to that text
              return result if !child.next? && !child.prev?
            elsif result_hash[child.name.to_sym]
              # Repeated child names are collected into an Array
              if result_hash[child.name.to_sym].is_a?(Array)
                result_hash[child.name.to_sym] << result
              else
                result_hash[child.name.to_sym] = [result_hash[child.name.to_sym]] << result
              end
            else
              result_hash[child.name.to_sym] = result
            end
          end

          return result_hash
        else
          return nil
        end
      else
        return node.content.to_s
      end
    end
  end
end

Solution 2

If you want to convert a Nokogiri XML document to a hash, just do the following:

require 'active_support'
require 'active_support/core_ext/hash/conversions'
hash = Hash.from_xml(nokogiri_document.to_s)

Solution 3

Here's a far simpler version that creates a detailed Hash, recording each node's kind, name and text, plus namespace information for both elements and attributes:

require 'nokogiri'
class Nokogiri::XML::Node
  TYPENAMES = {1=>'element',2=>'attribute',3=>'text',4=>'cdata',8=>'comment'}
  def to_hash
    {kind:TYPENAMES[node_type],name:name}.tap do |h|
      h.merge! nshref:namespace.href, nsprefix:namespace.prefix if namespace
      h.merge! text:text
      h.merge! attr:attribute_nodes.map(&:to_hash) if element?
      h.merge! kids:children.map(&:to_hash) if element?
    end
  end
end
class Nokogiri::XML::Document
  def to_hash; root.to_hash; end
end

Seen in action:

xml = '<r a="b" xmlns:z="foo"><z:a>Hello <b z:m="n" x="y">World</b>!</z:a></r>'
doc = Nokogiri::XML(xml)
p doc.to_hash
#=> {
#=>   :kind=>"element",
#=>   :name=>"r",
#=>   :text=>"Hello World!",
#=>   :attr=>[
#=>     {
#=>       :kind=>"attribute",
#=>       :name=>"a", 
#=>       :text=>"b"
#=>     }
#=>   ], 
#=>   :kids=>[
#=>     {
#=>       :kind=>"element", 
#=>       :name=>"a", 
#=>       :nshref=>"foo", 
#=>       :nsprefix=>"z", 
#=>       :text=>"Hello World!", 
#=>       :attr=>[], 
#=>       :kids=>[
#=>         {
#=>           :kind=>"text", 
#=>           :name=>"text", 
#=>           :text=>"Hello "
#=>         },
#=>         {
#=>           :kind=>"element", 
#=>           :name=>"b", 
#=>           :text=>"World", 
#=>           :attr=>[
#=>             {
#=>               :kind=>"attribute", 
#=>               :name=>"m", 
#=>               :nshref=>"foo", 
#=>               :nsprefix=>"z", 
#=>               :text=>"n"
#=>             },
#=>             {
#=>               :kind=>"attribute", 
#=>               :name=>"x", 
#=>               :text=>"y"
#=>             }
#=>           ], 
#=>           :kids=>[
#=>             {
#=>               :kind=>"text", 
#=>               :name=>"text", 
#=>               :text=>"World"
#=>             }
#=>           ]
#=>         },
#=>         {
#=>           :kind=>"text", 
#=>           :name=>"text", 
#=>           :text=>"!"
#=>         }
#=>       ]
#=>     }
#=>   ]
#=> }

Solution 4

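Use Nokogiri to parse the XML response, then convert it to a Ruby hash with ActiveSupport's Hash.from_xml. Note that Hash.from_xml parses the serialized string a second time (with REXML by default), so the Nokogiri pass is mainly useful when its more forgiving parser can clean up malformed XML first.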

doc = Nokogiri::XML(response_body) 
Hash.from_xml(doc.to_s)

Solution 5

I found this while trying to simply convert XML to a Hash (outside of Rails). I was thinking I would use Nokogiri, but ended up going with Nori.

Then my code was trivial:

response_hash = Nori.parse(response)

Other users have pointed out that this no longer works. It seems the parse method moved from the class to the instance (in Nori 2.x); my code above worked with the earlier API. The updated call, which a commenter below confirms works, is:

response_hash = Nori.new.parse(response)
Author: Ivan

Programmer since 1995. Happy Railer for the past 9+ years.

Updated on July 05, 2022

Comments

  • Ivan almost 2 years

    Is there an easy way to convert a Nokogiri XML document to a Hash?

    Something like Rails' Hash.from_xml.

  • Ivan almost 15 years
    Awesome! I just needed to change = strict to = false. Thanks!
  • A.Ali almost 15 years
    Ah ... Sorry about that, the files I've been working with don't have any attributes (legacy XML!).
  • skrat about 13 years
    Nokogiri DOES NOT use libxml-ruby, it uses libxml2, which is a C library.
  • PJP over 12 years
    Please explain where from_xml comes from. It isn't a standard Ruby method.
  • ScottJShea over 12 years
    @theTinMan from_xml comes from ActiveSupport.
  • Dorian almost 12 years
    It comes from here: api.rubyonrails.org/classes/Hash.html#method-c-from_xml; the code is: typecast_xml_value(unrename_keys(ActiveSupport::XmlMini.parse(xml)))
  • Steffen Roller over 10 years
    That is just terrific!
  • Alesya Huzik over 9 years
    doc.to_s returns what you already have in response_body, so Nokogiri is useless in your example.
  • B Seven about 9 years
    I think this is the best solution for apps that do not use Rails.
  • Alexis Rabago Carvajal almost 9 years
    This should be the cleanest answer, +1 to this, sir.
  • Jesse Whitham almost 9 years
    @alesguzik is right: basically, in that statement you are parsing the XML twice. Hash.from_xml will use REXML by default, not Nokogiri; I'm also not sure if you can change this.
  • PJP almost 9 years
    NOTE: The OP is aware of from_xml and mentions the need for something similar to it. Using from_xml doesn't answer the question. Also, if the document is already a Nokogiri document, don't convert it to a string just to parse it using some other XML parser. Instead, pass the raw XML and skip parsing with Nokogiri. Anything else is a waste of CPU time.
  • code_dredd over 8 years
    The unverified line works. However, if you have a Nokogiri::XML document, you must call its to_s method first, e.g. xml = Nokogiri::XML(File.open('file.xml')) and then hash = Nori.new.parse(xml.to_s), but the fields appear to be returned as an Array without the field names.
  • user4887419 almost 8 years
    Nokogiri is sometimes more resilient when parsing poorly formed or poorly encoded XML. I have examples where Hash.from_xml(xml_str) would fail but this would still work, so it can be a fallback for Hash.from_xml(xml_str).
  • Albert Rannetsperger almost 8 years
    After banging my head against the wall trying to use Nokogiri, I finally came across this. This is BY FAR the best solution! Thanks for the post.
  • mbigras over 7 years
    To add to @theTinMan's comment: it's unnecessary to use Nokogiri to parse the XML, then convert it to a string, then convert it to a hash. If using active_support you can go straight to Hash::from_xml. For example: Hash.from_xml(File.read('some.xml')) would work.
  • pyRabbit almost 6 years
    Be aware that the Hash.from_xml function should not be used if accuracy is important. This function starts to fall flat on more complex XML documents, completely omitting certain values.
  • konsolebox over 2 years
    I like that its output attributes are prepended with @.