Convert XML to dictionary in Python using lxml
Solution 1
Personally, I like xmltodict. You can install it with pip: pip install xmltodict.
Note that this actually creates OrderedDict objects. Example usage:
import xmltodict as xd

# Open in binary mode: the underlying expat parser reads bytes from file objects
with open('test.xml', 'rb') as f:
    d = xd.parse(f)
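If plain dict objects are preferred over OrderedDict, one option is a JSON round trip. The sketch below is not part of xmltodict: it uses a hand-written OrderedDict as a stand-in for a parsed document (the key names are illustrative assumptions), and to_plain_dict is a hypothetical helper name. It assumes every value is JSON-serializable, which holds for strings, None, lists and nested mappings.

```python
import json
from collections import OrderedDict

# Hand-written stand-in for a parsed document; the keys are illustrative only.
parsed = OrderedDict([
    ("status", OrderedDict([
        ("section1", OrderedDict([("field1", "data1"), ("field2", "data2")])),
        ("section2", None),  # e.g. an empty element
    ])),
])


def to_plain_dict(obj):
    """Hypothetical helper: rebuild nested mappings as plain dicts via JSON."""
    return json.loads(json.dumps(obj))


plain = to_plain_dict(parsed)
print(plain)
```

json.loads is used for the return trip because it maps JSON null back to None, so empty values survive the conversion.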
Solution 2
I found a solution in this gist: https://gist.github.com/jacobian/795571
def elem2dict(node):
    """
    Convert an lxml.etree node tree into a dict.
    """
    result = {}
    for element in node.iterchildren():
        # Remove namespace prefix
        key = element.tag.split('}')[1] if '}' in element.tag else element.tag
        # Use the text if the element has non-whitespace text; otherwise recurse into its children
        if element.text and element.text.strip():
            value = element.text
        else:
            value = elem2dict(element)
        if key in result:
            # Repeated tag: collect the values in a list
            if isinstance(result[key], list):
                result[key].append(value)
            else:
                # Plain reference works for both str and dict values (str has no .copy())
                result[key] = [result[key], value]
        else:
            result[key] = value
    return result
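Note that elem2dict as written only looks at child elements and text, so attribute-only elements like the ones in the question come back as empty dicts. A minimal sketch of one way to include attributes follows. The names strip_ns and elem2dict_with_attrs are hypothetical, and the example parses with the standard library's xml.etree.ElementTree so it runs without extra packages; lxml.etree elements expose the same tag, text, attrib and iteration interface, though lxml may also yield comment nodes during iteration, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree

XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<status xmlns:mystatus="http://localhost/mystatus">
  <section1 mystatus:field1="data1" mystatus:field2="data2" />
  <section2 mystatus:lineA="outputA" mystatus:lineB="outputB" />
</status>"""


def strip_ns(name):
    """Drop a leading '{namespace-uri}' from a tag or attribute name."""
    return name.split('}', 1)[1] if '}' in name else name


def elem2dict_with_attrs(node):
    """Hypothetical variant of elem2dict that also records attributes."""
    # Start from the node's own attributes, with namespaces stripped
    result = {strip_ns(k): v for k, v in node.attrib.items()}
    for child in node:
        key = strip_ns(child.tag)
        if child.text and child.text.strip():
            value = child.text  # caveat: attributes of text-bearing children are dropped
        else:
            value = elem2dict_with_attrs(child)
        if key in result:
            if isinstance(result[key], list):
                result[key].append(value)
            else:
                result[key] = [result[key], value]
        else:
            result[key] = value
    return result


root = ET.fromstring(XML)
parsed = elem2dict_with_attrs(root)
print(parsed)
```

Attributes and child elements share one dict here, so an attribute and a child with the same name would be merged into a list; prefixing attribute keys is one way to keep them apart.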
proximous
Updated on September 15, 2022
Comments
-
proximous about 1 year
There seem to be lots of solutions on StackOverflow for converting XML to a Python dictionary, but none of them generate the output I'm looking for. I have the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<status xmlns:mystatus="http://localhost/mystatus">
  <section1 mystatus:field1="data1" mystatus:field2="data2" />
  <section2 mystatus:lineA="outputA" mystatus:lineB="outputB" />
</status>
lxml has an elegantly simple solution for converting XML to a dictionary:
def recursive_dict(element):
    return element.tag, dict(map(recursive_dict, element)) or element.text
Unfortunately, I get:
('status', {'section2': None, 'section1': None})
instead of:
('status', {'section2': {'lineA':'outputA','lineB':'outputB'}, 'section1': {'field1':'data1','field2':'data2'}})
I can't figure out how to get my desired output without greatly complicating the recursive_dict() function.
I'm not tied to lxml, and I'm also fine with a different organization of the dictionary, as long as it gives me all the info in the xml. Thanks!
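For reference, the None values appear because section1 and section2 have neither child elements nor text: dict(map(recursive_dict, element)) is an empty dict, which is falsy, so the expression falls through to element.text, which is None. The attributes live in element.attrib, which the recipe never reads. The sketch below reproduces this and shows a small hypothetical variant, recursive_dict_attrs, that falls back to the attributes. It uses the standard library's xml.etree.ElementTree as a stand-in so it runs without lxml, and attribute names keep their {namespace} prefix.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree

XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<status xmlns:mystatus="http://localhost/mystatus">
  <section1 mystatus:field1="data1" mystatus:field2="data2" />
  <section2 mystatus:lineA="outputA" mystatus:lineB="outputB" />
</status>"""


def recursive_dict(element):
    # Original recipe: children if any, otherwise the element's text
    return element.tag, dict(map(recursive_dict, element)) or element.text


def recursive_dict_attrs(element):
    # Hypothetical variant: children, else text, else the attributes
    children = dict(map(recursive_dict_attrs, element))
    return element.tag, children or element.text or dict(element.attrib)


root = ET.fromstring(XML)
original = recursive_dict(root)
with_attrs = recursive_dict_attrs(root)
print(original)
print(with_attrs)
```

This variant still ignores the attributes of any element that has children or text, so it only covers documents shaped like the one above.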
-
GreenAsJade about 9 years
Out of curiosity, why do you expect to get the attributes of sections, but not the attributes of the status? What magic lets the library know that's what you want? And ... is there some reason why the contents of sections are attributes and not elements?
-
GreenAsJade about 9 years
It sounds like you have a solution, but I just want to note that the desired output you showed does not show ALL the information captured. It shows the attributes of sections captured, but not the attributes of status.
-
proximous about 9 years
Although I'd prefer not to install anything extra, this is very simple and looks like it will work, so I'll give it a try. Thanks!
-
proximous about 9 years
This works great! I prefer to reformat it with d=ast.literal_eval(json.dumps(d)) after the parse, but the default output preserves everything perfectly for me! Thanks!
-
AlexanderLedovsky over 7 years
Be careful with xmltodict when working with big XML files. xmltodict uses Python's xml module from the standard library, and it becomes very slow when the XML is larger than 1 GB. Use lxml instead.