Reading PASCAL VOC annotations in Python
Here is a fairly simple solution to your problem.
It returns the filename and the box coordinates as a nested list of [xmin, ymin, xmax, ymax] entries. I once struggled with bndbox tags that were mixed up (ymin, xmin, ... or other strange orders), so this code reads each value by its tag name rather than by its position.
I have since updated the code. Thanks to craq and Pritesh Gohil; you were absolutely right.
Hope it helps...
import xml.etree.ElementTree as ET


def read_content(xml_file: str):
    """Return the image filename and a list of [xmin, ymin, xmax, ymax] boxes."""
    tree = ET.parse(xml_file)
    root = tree.getroot()

    # The filename belongs to the whole annotation, so read it once,
    # outside the loop (it also stays defined when there are no objects).
    filename = root.find('filename').text
    list_with_all_boxes = []

    for boxes in root.iter('object'):
        # Look up each coordinate by tag name, so the order of the
        # tags inside <bndbox> does not matter.
        ymin = int(boxes.find("bndbox/ymin").text)
        xmin = int(boxes.find("bndbox/xmin").text)
        ymax = int(boxes.find("bndbox/ymax").text)
        xmax = int(boxes.find("bndbox/xmax").text)

        list_with_single_boxes = [xmin, ymin, xmax, ymax]
        list_with_all_boxes.append(list_with_single_boxes)

    return filename, list_with_all_boxes


name, boxes = read_content("file.xml")
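To illustrate the point about mixed-up tag order, here is a minimal sketch that parses an annotation from an in-memory string whose bndbox children appear as ymin, xmin, ymax, xmax. The helper name read_boxes_from_string and the sample XML are made up for this illustration and are not part of the answer above; the only library call assumed is ET.fromstring from the standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <bndbox> children are deliberately out of the usual order.
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>thing</name>
    <bndbox>
      <ymin>10</ymin>
      <xmin>344</xmin>
      <ymax>83</ymax>
      <xmax>422</xmax>
    </bndbox>
  </object>
</annotation>"""


def read_boxes_from_string(xml_text):
    """Return (filename, [[xmin, ymin, xmax, ymax], ...]) from an XML string."""
    root = ET.fromstring(xml_text)
    filename = root.find("filename").text
    boxes = []
    for obj in root.iter("object"):
        # Each coordinate is looked up by tag name, so child order is irrelevant.
        box = [int(obj.find("bndbox/" + tag).text)
               for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(box)
    return filename, boxes


name, boxes = read_boxes_from_string(SAMPLE_XML)
print(name, boxes)  # example.jpg [[344, 10, 422, 83]]
```

Even though ymin comes first in the file, the result is still ordered as [xmin, ymin, xmax, ymax].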
Comments
-
Jsevillamol almost 2 years
I have annotations in xml files such as this one, which follows the PASCAL VOC convention:
<annotation>
  <folder>training</folder>
  <filename>chanel1.jpg</filename>
  <source>
    <database>synthetic initialization</database>
    <annotation>PASCAL VOC2007</annotation>
    <image>synthetic</image>
    <flickrid>none</flickrid>
  </source>
  <owner>
    <flickrid>none</flickrid>
    <name>none</name>
  </owner>
  <size>
    <width>640</width>
    <height>427</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>chanel</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>344</xmin>
      <ymin>10</ymin>
      <xmax>422</xmax>
      <ymax>83</ymax>
    </bndbox>
  </object>
  <object>
    <name>chanel</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>355</xmin>
      <ymin>165</ymin>
      <xmax>443</xmax>
      <ymax>206</ymax>
    </bndbox>
  </object>
</annotation>
What is the cleanest way of retrieving, for example, the fields filename and bndbox in Python? I was trying to use ElementTree, which seems to be the official Python solution, but I can't make it work.
My code so far:
from xml.etree import ElementTree as ET

tree = ET.parse("data/all/annotations/" + file)
fn = tree.find('filename').text
boxes = tree.findall('bndbox')
This produces:
fn == 'chanel1.jpg'
boxes == []
So it successfully extracts the filename field, but not the bndbox elements.
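A likely reason for the empty list in the question, sketched here on a shortened copy of the XML: in ElementTree, a bare tag name passed to find or findall matches only direct children of the element being searched. filename is a direct child of the root, whereas bndbox is nested inside object, so it needs a path such as "object/bndbox" or ".//bndbox". The variable names below are illustrative only.

```python
import xml.etree.ElementTree as ET

# Shortened version of the annotation from the question.
XML = """<annotation>
  <filename>chanel1.jpg</filename>
  <object>
    <name>chanel</name>
    <bndbox><xmin>344</xmin><ymin>10</ymin><xmax>422</xmax><ymax>83</ymax></bndbox>
  </object>
  <object>
    <name>chanel</name>
    <bndbox><xmin>355</xmin><ymin>165</ymin><xmax>443</xmax><ymax>206</ymax></bndbox>
  </object>
</annotation>"""

root = ET.fromstring(XML)

direct = root.findall("bndbox")          # direct children of the root only
nested = root.findall("object/bndbox")   # explicit path through <object>
anywhere = root.findall(".//bndbox")     # descendants at any depth

# Turn each <bndbox> element into [xmin, ymin, xmax, ymax].
coords = [[int(b.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")]
          for b in nested]

print(len(direct), len(nested), len(anywhere))  # 0 2 2
print(coords)  # [[344, 10, 422, 83], [355, 165, 443, 206]]
```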
-
Waylon Flinn over 4 years
Great answer. Very useful. One small correction: bounding box coordinates should be offset by -1 (that is, you need to subtract 1 from the coordinates when assigning them to variables). References: gluon-cv.mxnet.io/_modules/gluoncv/data/pascal_voc/…
-
craq almost 4 years
How come you use boxes.findall? I would expect just one bounding box per object.
-
Pritesh Gohil almost 4 years
Exactly, @craq. The second for loop is not required at all. Remove
for box in boxes.findall("bndbox"):
and modify the assignments to
ymin = int(boxes.find("bndbox/ymin").text)
xmin = int(boxes.find("bndbox/xmin").text)
ymax = int(boxes.find("bndbox/ymax").text)
xmax = int(boxes.find("bndbox/xmax").text)
-
ebk about 3 years
@WaylonFlinn Could you please elaborate on why the -1 offsets are needed? The code in the link has no comment on that part.
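One possible explanation for the -1 offsets, offered as an assumption and not confirmed anywhere in this thread: PASCAL VOC coordinates are commonly described as 1-based pixel indices (the top-left pixel is 1, 1), whereas Python and NumPy index arrays from 0. Under that assumption, subtracting 1 from every coordinate converts a box to 0-based indexing. The helper name to_zero_based is hypothetical.

```python
def to_zero_based(box):
    """Shift a [xmin, ymin, xmax, ymax] box from 1-based to 0-based pixel indices.

    Assumes the input follows a 1-based convention; skip this step if your
    annotation tool already writes 0-based coordinates.
    """
    return [coord - 1 for coord in box]


voc_box = [344, 10, 422, 83]      # first box from the annotation in the question
zero_based = to_zero_based(voc_box)
print(zero_based)  # [343, 9, 421, 82]
```

Whether the shift is appropriate depends on how the annotations were produced, so it is worth checking a few boxes against the images before applying it to a whole dataset.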