Access nested children in xml file parsed with ElementTree
Solution 1
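You have to call iter() on your root.
That is, root.iter() will do the trick: it walks every element in the tree, whereas looping over root directly yields only its immediate children.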
import xml.etree.ElementTree as ET
import urllib2
tree = ET.parse(urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
for child in root.iter():
    print child.tag, child.attrib
Output:
FHRSEstablishment {}
Header {}
ExtractDate {}
ItemCount {}
ReturnCode {}
EstablishmentCollection {}
EstablishmentDetail {}
FHRSID {}
LocalAuthorityBusinessID {}
...
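To see the difference without hitting the network, here is a minimal sketch. It assumes Python 3 syntax (the answer's code is Python 2) and uses a made-up, cut-down sample shaped like the feed; SAMPLE, direct_tags and all_tags are names invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down sample shaped like the FHRS feed (not real data).
SAMPLE = """
<FHRSEstablishment>
  <Header><ItemCount>1</ItemCount></Header>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <FHRSID>1</FHRSID>
      <Scores><Hygiene>5</Hygiene></Scores>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE.strip())

# Looping over an element yields only its direct children.
direct_tags = [child.tag for child in root]

# iter() walks the whole subtree in document order, starting with root itself.
all_tags = [elem.tag for elem in root.iter()]

print(direct_tags)
print(all_tags)
```

The first list stops at Header and EstablishmentCollection, which is exactly what the question observed; the second one reaches the nested tags.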
- To get all tags inside EstablishmentDetail, you need to find that tag and then loop through its children. For example:
for child in root.find('.//EstablishmentDetail'):
    print child.tag, child.attrib
Output:
FHRSID {}
LocalAuthorityBusinessID {}
BusinessName {}
BusinessType {}
BusinessTypeID {}
RatingValue {}
RatingKey {}
RatingDate {}
LocalAuthorityCode {}
LocalAuthorityName {}
LocalAuthorityWebSite {}
LocalAuthorityEmailAddress {}
Scores {}
SchemeType {}
NewRatingPending {}
Geocode {}
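A small sketch of how find() and findall() differ may help here. The './/' prefix is ElementTree's limited XPath syntax (not a regular expression) and means "search all descendants". The sample data and the names SAMPLE, first_children and names are assumptions for illustration, and the syntax is Python 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with two establishments (not real data).
SAMPLE = """
<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <FHRSID>1</FHRSID>
      <BusinessName>Cafe A</BusinessName>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <FHRSID>2</FHRSID>
      <BusinessName>Cafe B</BusinessName>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE.strip())

# find() returns only the FIRST matching element, or None if nothing matches.
first = root.find('.//EstablishmentDetail')
first_children = [child.tag for child in first] if first is not None else []

# findall() returns every matching element as a list.
names = [detail.findtext('BusinessName')
         for detail in root.findall('.//EstablishmentDetail')]

print(first_children)
print(names)
```

So the loop above lists the children of the first EstablishmentDetail only; use findall() when you want every establishment.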
- To get the score for Hygiene, as you've mentioned in the comment:
When you call for child in root.find('.//Scores'): rating = child.get('Hygiene'), you get the first Scores tag and loop over its children, which are the Hygiene, ConfidenceInManagement and Structural tags. get() looks up an attribute, and none of those three children has an attribute named Hygiene, so you get None three times.
You need to:
- find all Scores tags;
- find Hygiene in every tag found.
for each in root.findall('.//Scores'):
    rating = each.find('.//Hygiene')
    print '' if rating is None else rating.text
Output:
5
5
5
0
5
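The attribute-versus-child-element distinction can be sketched like this. It assumes Python 3 and made-up sample data; findtext() with a default is swapped in here as a shorter alternative to the find()/None check above, not as the answer's own method.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second establishment has no scores (not real data).
SAMPLE = """
<EstablishmentCollection>
  <EstablishmentDetail>
    <Scores>
      <Hygiene>5</Hygiene>
      <Structural>5</Structural>
      <ConfidenceInManagement>0</ConfidenceInManagement>
    </Scores>
  </EstablishmentDetail>
  <EstablishmentDetail>
    <Scores/>
  </EstablishmentDetail>
</EstablishmentCollection>
"""

root = ET.fromstring(SAMPLE.strip())

# get() reads XML attributes. The children of Scores have no attributes,
# so every lookup comes back as None.
attr_lookups = [child.get('Hygiene') for child in root.find('.//Scores')]

# The score is the TEXT of a child element, so search for the element instead.
# findtext() returns the default when the element is missing.
hygiene = [scores.findtext('Hygiene', default='')
           for scores in root.findall('.//Scores')]

print(attr_lookups)
print(hygiene)
```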
Solution 2
Hope this is useful:
import xml.etree.ElementTree as etree
with open('filename.xml') as tmpfile:
    doc = etree.iterparse(tmpfile, events=("start", "end"))
    doc = iter(doc)
    event, root = doc.next()
    num = 0
    for event, elem in doc:
        print event, elem
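The snippet above is Python 2. A rough Python 3 equivalent might look like the sketch below, which uses the built-in next() (available on both Python 2.6+ and 3) instead of the .next() method. The inline xml_bytes document and the events list are assumptions for illustration, so the example runs without a file on disk.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical in-memory document standing in for 'filename.xml'.
xml_bytes = b"<root><item>a</item><item>b</item></root>"

events = []
context = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))

# The first event is the "start" of the root element.
event, root = next(context)
events.append((event, root.tag))

# The remaining events arrive in document order as the input is parsed.
for event, elem in context:
    events.append((event, elem.tag))

print(events)
```

Grabbing the root first is handy on large files, since you can call root.clear() inside the loop to free elements you have already processed.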
FaCoffee
Updated on July 18, 2022
Comments
-
FaCoffee almost 2 years
I am new to xml parsing. This xml file has the following tree:
FHRSEstablishment
|--> Header
|    |--> ...
|--> EstablishmentCollection
|    |--> EstablishmentDetail
|    |    |-->...
|    |--> Scores
|    |    |-->...
|--> EstablishmentCollection
|    |--> EstablishmentDetail
|    |    |-->...
|    |--> Scores
|    |    |-->...
but when I access it with ElementTree and look for the child tags and attributes,
import xml.etree.ElementTree as ET
import urllib2
tree = ET.parse(urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
for child in root:
    print child.tag, child.attrib
I only get:
Header {}
EstablishmentCollection {}
which I assume means that their attributes are empty. Why is it so, and how can I access the children nested inside EstablishmentDetail and Scores?
EDIT
Thanks to the answers below I can get inside the tree, but if I want to retrieve values such as those in Scores, this fails:
for node in root.find('.//EstablishmentDetail/Scores'):
    rating = node.attrib.get('Hygiene')
    print rating
and produces
None
None
None
Why is that?
-
FaCoffee about 7 years
Wow, this was good, but I still struggle to get the ultimate values, such as the scores. If I do
for child in root.find('.//Scores'): rating = child.get('Hygiene'); print rating;
I get None as a result.
-
verkter about 5 years
What does .// do? Is this a regular expression?
-
user9074332 over 4 years
event, root = doc.next()
AttributeError: 'IterParseIterator' object has no attribute 'next'
-
Andrea over 4 years
My script works on Python 2; for Python 3 use: event, root = doc.__next__()