Get list of XML attribute values in Python
Solution 1
I'm not really an old hand at Python, but here's an XPath solution using libxml2.
import libxml2
DOC = """<elements>
<parent name="CategoryA">
<child value="a1"/>
<child value="a2"/>
<child value="a3"/>
</parent>
<parent name="CategoryB">
<child value="b1"/>
<child value="b2"/>
<child value="b3"/>
</parent>
</elements>"""
doc = libxml2.parseDoc(DOC)
def getValues(cat):
return [attr.content for attr in doc.xpathEval("/elements/parent[@name='%s']/child/@value" % (cat))]
print getValues("CategoryA")
With result...
['a1', 'a2', 'a3']
Solution 2
ElementTree 1.3 (unfortunately not 1.2 which is the one included with Python) supports XPath like this:
import elementtree.ElementTree as xml
def getValues(tree, category):
parent = tree.find(".//parent[@name='%s']" % category)
return [child.get('value') for child in parent]
Then you can do
>>> tree = xml.parse('data.xml')
>>> getValues(tree, 'CategoryA')
['a1', 'a2', 'a3']
>>> getValues(tree, 'CategoryB')
['b1', 'b2', 'b3']
lxml.etree
(which also provides the ElementTree interface) will also work in the same way.
Solution 3
You can do this with BeautifulSoup
>>> from BeautifulSoup import BeautifulStoneSoup
>>> soup = BeautifulStoneSoup(xml)
>>> def getValues(name):
. . . return [child['value'] for child in soup.find('parent', attrs={'name': name}).findAll('child')]
If you're doing work with HTML/XML I would recommend you take a look at BeautifulSoup. It's similar to the DOM tree but contains more functionality.
Solution 4
Using a standard W3 DOM such as the stdlib's minidom, or pxdom:
def getValues(category):
for parent in document.getElementsByTagName('parent'):
if parent.getAttribute('name')==category:
return [
el.getAttribute('value')
for el in parent.getElementsByTagName('child')
]
raise ValueError('parent not found')
Solution 5
My preferred python xml library is lxml , which wraps libxml2.
Xpath does seem the way to go here, so I'd write this as something like:
from lxml import etree
def getValues(xml, category):
return [x.attrib['value'] for x in
xml.findall('/parent[@name="%s"]/*' % category)]
xml = etree.parse(open('filename.xml'))
>>> print getValues(xml, 'CategoryA')
['a1', 'a2', 'a3']
>>> print getValues(xml, 'CategoryB')
['b1', 'b2', 'b3]
roomaroo
Updated on December 30, 2020Comments
-
roomaroo over 3 years
I need to get a list of attribute values from child elements in Python.
It's easiest to explain with an example.
Given some XML like this:
<elements> <parent name="CategoryA"> <child value="a1"/> <child value="a2"/> <child value="a3"/> </parent> <parent name="CategoryB"> <child value="b1"/> <child value="b2"/> <child value="b3"/> </parent> </elements>
I want to be able to do something like:
>>> getValues("CategoryA") ['a1', 'a2', 'a3'] >>> getValues("CategoryB") ['b1', 'b2', 'b3']
It looks like a job for XPath but I'm open to all recommendations. I'd also like to hear about your favourite Python XML libraries.