Reading PASCAL VOC annotations in Python

Here is a fairly simple solution to your problem:

It returns the filename and the box coordinates as a nested list, with one [xmin, ymin, xmax, ymax] entry per object. I once struggled with bndbox tags whose children were in a mixed-up order (ymin, xmin, ...) or other strange combinations, so this code reads each value by its tag name rather than by its position.

I have since updated the code. Thanks to craq and Pritesh Gohil; you were absolutely right.

Hope it helps...

import xml.etree.ElementTree as ET


def read_content(xml_file: str):
    """Return the filename and a list of [xmin, ymin, xmax, ymax] boxes."""
    tree = ET.parse(xml_file)
    root = tree.getroot()

    # The filename belongs to the annotation, not to each object, so read it
    # once here. This also keeps it defined when the file has no <object>.
    filename = root.find('filename').text

    list_with_all_boxes = []

    for boxes in root.iter('object'):
        # Look up each coordinate by tag name, so the order of the tags
        # inside <bndbox> does not matter.
        ymin = int(boxes.find("bndbox/ymin").text)
        xmin = int(boxes.find("bndbox/xmin").text)
        ymax = int(boxes.find("bndbox/ymax").text)
        xmax = int(boxes.find("bndbox/xmax").text)

        list_with_single_boxes = [xmin, ymin, xmax, ymax]
        list_with_all_boxes.append(list_with_single_boxes)

    return filename, list_with_all_boxes

name, boxes = read_content("file.xml")
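
To try the approach without a file on disk, here is a minimal sketch that parses an annotation from a string. The helper name `read_boxes_from_string` and the shortened sample XML are assumptions made for illustration; `ET.fromstring`, `find`, `findall` and `iter` are standard library calls. It also shows why a bare `findall('bndbox')` on the root returns an empty list: a plain tag name matches direct children only, while `.//bndbox` searches all descendants.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the PASCAL VOC layout (illustrative data only).
# The first <bndbox> deliberately lists its tags out of order.
SAMPLE_XML = """<annotation>
  <filename>chanel1.jpg</filename>
  <object>
    <name>chanel</name>
    <bndbox>
      <ymin>10</ymin><xmin>344</xmin><xmax>422</xmax><ymax>83</ymax>
    </bndbox>
  </object>
  <object>
    <name>chanel</name>
    <bndbox>
      <xmin>355</xmin><ymin>165</ymin><xmax>443</xmax><ymax>206</ymax>
    </bndbox>
  </object>
</annotation>"""


def read_boxes_from_string(xml_text: str):
    """Hypothetical helper: same logic as read_content, but for a string."""
    root = ET.fromstring(xml_text)
    filename = root.find("filename").text
    boxes = []
    for obj in root.iter("object"):
        # Tags are looked up by name, so their order inside <bndbox> is irrelevant.
        boxes.append([int(obj.find(f"bndbox/{tag}").text)
                      for tag in ("xmin", "ymin", "xmax", "ymax")])
    return filename, boxes


root = ET.fromstring(SAMPLE_XML)
direct_children = root.findall("bndbox")     # bare tag: direct children only
all_descendants = root.findall(".//bndbox")  # './/' searches the whole subtree

name, boxes = read_boxes_from_string(SAMPLE_XML)
print(name, boxes)
print(len(direct_children), len(all_descendants))  # 0 2
```

Iterating over each `object` first, as above, keeps every box tied to its object, which is convenient if you later want the class name from the same element.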
Author by

Jsevillamol

Background in Mathematics and Computer Engineering.

Updated on June 18, 2022

Comments

  • Jsevillamol
    Jsevillamol almost 2 years

    I have annotations in xml files such as this one, which follows the PASCAL VOC convention:

    <annotation>
    <folder>training</folder>
    <filename>chanel1.jpg</filename>
    <source>
    <database>synthetic initialization</database>
    <annotation>PASCAL VOC2007</annotation>
    <image>synthetic</image>
    <flickrid>none</flickrid>
    </source>
    <owner>
    <flickrid>none</flickrid>
    <name>none</name>
    </owner>
    <size>
    <width>640</width>
    <height>427</height>
    <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
    <name>chanel</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
    <xmin>344</xmin>
    <ymin>10</ymin>
    <xmax>422</xmax>
    <ymax>83</ymax>
    </bndbox>
    </object>
    <object>
    <name>chanel</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
    <xmin>355</xmin>
    <ymin>165</ymin>
    <xmax>443</xmax>
    <ymax>206</ymax>
    </bndbox>
    </object>
    </annotation>
    

    What is the cleanest way of retrieving, for example, the fields filename and bndbox in Python?

    I was trying to use ElementTree, which seems to be the official Python solution, but I can't make it work.

    My code so far:

    from xml.etree import ElementTree as ET
    tree = ET.parse("data/all/annotations/" + file)
    fn = tree.find('filename').text
    boxes = tree.findall('bndbox')
    

    This produces:

    fn == 'chanel1.jpg'
    boxes == []
    

    So it successfully extracts the filename field, but not the bndbox elements.

  • Waylon Flinn
    Waylon Flinn over 4 years
    Great answer. Very useful. One small correction, bounding box coordinates should be offset by -1 (that is, you need to subtract 1 from coordinates when assigning to variables). References: gluon-cv.mxnet.io/_modules/gluoncv/data/pascal_voc/…
  • craq
    craq almost 4 years
    How come you use boxes.findall? I would expect just one bounding box per object.
  • Pritesh Gohil
    Pritesh Gohil almost 4 years
    Exactly, @craq. The second for loop is not required at all. Remove 'for box in boxes.findall("bndbox"):' and change the assignments to: ymin = int(boxes.find("bndbox/ymin").text) xmin = int(boxes.find("bndbox/xmin").text) ymax = int(boxes.find("bndbox/ymax").text) xmax = int(boxes.find("bndbox/xmax").text)
  • ebk
    ebk about 3 years
    @WaylonFlinn Could you please elaborate on why the -1 offsets are needed? The code in the link has no comment for that part.
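
On the -1 offset discussed in the comments: the usual explanation, treated here as an assumption rather than something confirmed in this thread, is that PASCAL VOC stores pixel coordinates as 1-based indices, so loaders that want 0-based indices subtract 1 from each value. A minimal sketch, using the hypothetical helper `to_zero_based` on the two boxes from the sample annotation:

```python
def to_zero_based(box):
    """Hypothetical helper: shift one [xmin, ymin, xmax, ymax] box by -1.

    Assumes the input uses 1-based pixel indices; skip this step if your
    annotation tool already writes 0-based coordinates.
    """
    return [coord - 1 for coord in box]


boxes = [[344, 10, 422, 83], [355, 165, 443, 206]]
zero_based = [to_zero_based(box) for box in boxes]
print(zero_based)
```

Whether the shift is appropriate depends on how the annotations were produced, so it is worth checking a few boxes against the images before applying it to a whole dataset.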