Convert a Nokogiri document to a Ruby Hash
Solution 1
I use this code with libxml-ruby (1.1.3). I have not used Nokogiri myself; note that Nokogiri is built on the libxml2 C library rather than on libxml-ruby, so this code needs the libxml-ruby gem. I would also encourage you to look at ROXML (http://github.com/Empact/roxml/tree), which maps XML elements to Ruby objects; it is built atop libxml.
# USAGE: Hash.from_libxml(YOUR_XML_STRING)
require 'xml/libxml'
# adapted from
# http://movesonrails.com/articles/2008/02/25/libxml-for-active-resource-2-0
class Hash
  class << self
    def from_libxml(xml, strict=true)
      begin
        XML.default_load_external_dtd = false
        XML.default_pedantic_parser = strict
        result = XML::Parser.string(xml).parse
        return { result.root.name.to_s => xml_node_to_hash(result.root)}
      rescue Exception => e
        # raise your custom exception here
      end
    end

    def xml_node_to_hash(node)
      # If we are at the root of the document, start the hash
      if node.element?
        if node.children?
          result_hash = {}
          node.each_child do |child|
            result = xml_node_to_hash(child)
            if child.name == "text"
              if !child.next? and !child.prev?
                return result
              end
            elsif result_hash[child.name.to_sym]
              if result_hash[child.name.to_sym].is_a?(Object::Array)
                result_hash[child.name.to_sym] << result
              else
                result_hash[child.name.to_sym] = [result_hash[child.name.to_sym]] << result
              end
            else
              result_hash[child.name.to_sym] = result
            end
          end
          return result_hash
        else
          return nil
        end
      else
        return node.content.to_s
      end
    end
  end
end
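The part of xml_node_to_hash that deals with repeated child elements can be looked at on its own. Below is a minimal sketch in plain Ruby with no XML library involved; the helper name add_child is hypothetical and not part of the original answer, and it checks key? instead of truthiness, so it is an illustration of the idea rather than a copy of the code above.

```ruby
# Hypothetical helper (not from the original answer): stores a child's value
# under its name and switches to an Array once the same name appears again.
# Assumption: key? is used instead of a truthiness check, so an earlier nil
# value also counts as a previous occurrence.
def add_child(result_hash, name, value)
  key = name.to_sym
  if !result_hash.key?(key)
    result_hash[key] = value                      # first occurrence: plain value
  elsif result_hash[key].is_a?(Array)
    result_hash[key] << value                     # third and later: append
  else
    result_hash[key] = [result_hash[key], value]  # second: promote to Array
  end
  result_hash
end

h = {}
add_child(h, "item", "a")
add_child(h, "item", "b")
add_child(h, "item", "c")
add_child(h, "title", "t")
p h
```

After these calls h[:item] holds the three values in document order as an Array, while h[:title] stays a plain String. This is also why consumers of such a hash have to check whether a key holds one value or several.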
Solution 2
If you want to convert a Nokogiri XML document to a hash, just do the following:
require 'active_support/core_ext/hash/conversions'
hash = Hash.from_xml(nokogiri_document.to_s)
Solution 3
Here's a far simpler version that creates a robust Hash that includes namespace information, both for elements and attributes:
require 'nokogiri'
class Nokogiri::XML::Node
  TYPENAMES = {1=>'element',2=>'attribute',3=>'text',4=>'cdata',8=>'comment'}
  def to_hash
    {kind:TYPENAMES[node_type],name:name}.tap do |h|
      h.merge! nshref:namespace.href, nsprefix:namespace.prefix if namespace
      h.merge! text:text
      h.merge! attr:attribute_nodes.map(&:to_hash) if element?
      h.merge! kids:children.map(&:to_hash) if element?
    end
  end
end
class Nokogiri::XML::Document
  def to_hash; root.to_hash; end
end
Seen in action:
xml = '<r a="b" xmlns:z="foo"><z:a>Hello <b z:m="n" x="y">World</b>!</z:a></r>'
doc = Nokogiri::XML(xml)
p doc.to_hash
#=> {
#=>   :kind=>"element",
#=>   :name=>"r",
#=>   :text=>"Hello World!",
#=>   :attr=>[
#=>     {
#=>       :kind=>"attribute",
#=>       :name=>"a",
#=>       :text=>"b"
#=>     }
#=>   ],
#=>   :kids=>[
#=>     {
#=>       :kind=>"element",
#=>       :name=>"a",
#=>       :nshref=>"foo",
#=>       :nsprefix=>"z",
#=>       :text=>"Hello World!",
#=>       :attr=>[],
#=>       :kids=>[
#=>         {
#=>           :kind=>"text",
#=>           :name=>"text",
#=>           :text=>"Hello "
#=>         },
#=>         {
#=>           :kind=>"element",
#=>           :name=>"b",
#=>           :text=>"World",
#=>           :attr=>[
#=>             {
#=>               :kind=>"attribute",
#=>               :name=>"m",
#=>               :nshref=>"foo",
#=>               :nsprefix=>"z",
#=>               :text=>"n"
#=>             },
#=>             {
#=>               :kind=>"attribute",
#=>               :name=>"x",
#=>               :text=>"y"
#=>             }
#=>           ],
#=>           :kids=>[
#=>             {
#=>               :kind=>"text",
#=>               :name=>"text",
#=>               :text=>"World"
#=>             }
#=>           ]
#=>         },
#=>         {
#=>           :kind=>"text",
#=>           :name=>"text",
#=>           :text=>"!"
#=>         }
#=>       ]
#=>     }
#=>   ]
#=> }
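Because the result is made of ordinary nested Hashes and Arrays, it can be searched with plain Ruby. Here is a minimal sketch, assuming the :kind, :name and :kids keys shown above; the helper name find_elements is hypothetical, and the tree is a hand-written fragment of that output (attributes left out for brevity) so the sketch runs without Nokogiri.

```ruby
# Hypothetical helper: depth-first search for element hashes with a given
# name in the structure produced by to_hash.
def find_elements(node, name, found = [])
  found << node if node[:kind] == 'element' && node[:name] == name
  # Text and attribute hashes have no :kids key, hence the fallback to [].
  (node[:kids] || []).each { |kid| find_elements(kid, name, found) }
  found
end

# Hand-written fragment of the output above (attributes omitted).
tree = {
  kind: 'element', name: 'r', text: 'Hello World!', attr: [],
  kids: [
    { kind: 'element', name: 'a', nshref: 'foo', nsprefix: 'z',
      text: 'Hello World!', attr: [],
      kids: [
        { kind: 'text', name: 'text', text: 'Hello ' },
        { kind: 'element', name: 'b', text: 'World', attr: [],
          kids: [{ kind: 'text', name: 'text', text: 'World' }] },
        { kind: 'text', name: 'text', text: '!' }
      ] }
  ]
}

p find_elements(tree, 'b').map { |e| e[:text] }  # prints ["World"]
```

The same traversal could filter on :nshref or :nsprefix instead of :name when namespaces matter.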
Solution 4
Parse the XML response with Nokogiri, then convert it to a Ruby hash with ActiveSupport's Hash.from_xml. It's pretty fast.
doc = Nokogiri::XML(response_body)
Hash.from_xml(doc.to_s)
Solution 5
I found this while trying to simply convert XML to Hash (not in Rails). I was thinking I would use Nokogiri, but ended up going with Nori.
Then my code was trivial:
response_hash = Nori.parse(response)
Other users have pointed out that this does not work. I have not verified, but it seems that the parse method has been moved from the class to the instance. My code above worked at some point. New (unverified) code would be:
response_hash = Nori.new.parse(response)
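If it is unclear which of the two call styles an installed Nori offers, the choice can be made at run time. This is a minimal sketch under that assumption only; the wrapper name parse_xml_to_hash is hypothetical, and which Nori versions expose which method has not been verified here.

```ruby
begin
  require 'nori' # third-party gem, assumed to be installed
rescue LoadError
  # Without the gem, a parser class has to be passed in explicitly;
  # calling the wrapper with the default would raise NameError.
end

# Hypothetical wrapper: uses a class-level parse if the given class has one,
# otherwise parses through a fresh instance.
def parse_xml_to_hash(xml, parser_class = Nori)
  if parser_class.respond_to?(:parse)
    parser_class.parse(xml)
  else
    parser_class.new.parse(xml)
  end
end

# Intended use with the gem available:
#   response_hash = parse_xml_to_hash(response)
```

The parser class is a parameter mainly so the dispatch can be exercised with a stand-in class when the gem is not present.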
Comments
-
Ivan almost 2 years
Is there an easy way to convert a Nokogiri XML document to a Hash? Something like Rails' Hash.from_xml.
-
Ivan almost 15 years
Awesome! I just needed to change = strict to = false. Thanks!
-
A.Ali almost 15 years
Ah ... Sorry about that, the files I've been working with don't have any attributes (legacy XML!).
-
skrat about 13 years
Nokogiri DOES NOT use libxml-ruby, it uses libxml2, which is a C library.
-
PJP over 12 years
Please explain where from_xml comes from. It isn't a standard Ruby method.
-
ScottJShea over 12 years
@theTinMan from_xml comes from ActiveSupport
-
Dorian almost 12 years
It comes from here: api.rubyonrails.org/classes/Hash.html#method-c-from_xml, the code is: typecast_xml_value(unrename_keys(ActiveSupport::XmlMini.parse(xml)))
-
Steffen Roller over 10 years
That is just terrific!
-
Alesya Huzik over 9 years
doc.to_s returns what you already have in response_body, so Nokogiri is useless in your example.
-
B Seven about 9 years
I think this is the best solution for apps that do not use Rails.
-
Alexis Rabago Carvajal almost 9 years
This should be the cleanest answer, +1 to this sir
-
Jesse Whitham almost 9 years
@alesguzik is right: in that statement you are basically parsing the XML twice. Hash.from_xml will use REXML by default, not Nokogiri. I'm also not sure if you can change this.
-
PJP almost 9 years
NOTE: The OP is aware of from_xml and mentions the need for something similar to it. Using from_xml doesn't answer the question. Also, if the document is already a Nokogiri document then don't convert it to a string just to parse it using some other XML parser. Instead, pass the raw XML and skip parsing with Nokogiri. Anything else is a waste of CPU time.
-
code_dredd over 8 years
The unverified line works. However, if you have a Nokogiri::XML document, you must call its to_s method first. E.g. xml = Nokogiri::XML(File.open('file.xml')) and then hash = Nori.new.parse(xml.to_s), but the fields appear to be returned as an Array without the field names.
-
user4887419 almost 8 years
Nokogiri is sometimes more resilient when parsing poorly formed or badly encoded XML. I have examples where Hash.from_xml(xml_str) would fail, but this would still work. So it can be a fallback for Hash.from_xml(xml_str).
-
Albert Rannetsperger almost 8 years
After banging my head against the wall trying to use Nokogiri I finally came across this. This is BY FAR the best solution! Thanks for the post.
-
mbigras over 7 years
To add to @theTinMan's comment: it's unnecessary to use Nokogiri to parse the XML, then convert it to a string, then convert it to a hash. If using active_support you can go straight to using Hash::from_xml. For example, Hash.from_xml(File.read('some.xml')) would work.
-
pyRabbit almost 6 years
Be aware that the Hash.from_xml function should not be used if accuracy is important. This function starts to fall flat on more complex XML documents, completely omitting certain values.
-
konsolebox over 2 years
I like that its output attributes are prepended with @.