Get the bounding box coordinates in the TensorFlow object detection API tutorial

Solution 1

"I tried printing output_dict['detection_boxes'] but I am not sure what the numbers mean."

You can check the code for yourself: visualize_boxes_and_labels_on_image_array is defined in object_detection/utils/visualization_utils.py in the TensorFlow models repository.

Note that you are passing use_normalized_coordinates=True. If you trace the function calls, you will see that your numbers [ 0.56213236, 0.2780568 , 0.91445708, 0.69120586] etc. are [ymin, xmin, ymax, xmax] values, from which the pixel coordinates:

(left, right, top, bottom) = (xmin * im_width, xmax * im_width, 
                              ymin * im_height, ymax * im_height)

are computed by the function:

def draw_bounding_box_on_image(image,
                               ymin,
                               xmin,
                               ymax,
                               xmax,
                               color='red',
                               thickness=4,
                               display_str_list=(),
                               use_normalized_coordinates=True):
  """Adds a bounding box to an image.
  Bounding box coordinates can be specified in either absolute (pixel) or
  normalized coordinates by setting the use_normalized_coordinates argument.
  Each string in display_str_list is displayed on a separate line above the
  bounding box in black text on a rectangle filled with the input 'color'.
  If the top of the bounding box extends to the edge of the image, the strings
  are displayed below the bounding box.
  Args:
    image: a PIL.Image object.
    ymin: ymin of bounding box.
    xmin: xmin of bounding box.
    ymax: ymax of bounding box.
    xmax: xmax of bounding box.
    color: color to draw bounding box. Default is red.
    thickness: line thickness. Default value is 4.
    display_str_list: list of strings to display in box
                      (each to be shown on its own line).
    use_normalized_coordinates: If True (default), treat coordinates
      ymin, xmin, ymax, xmax as relative to the image.  Otherwise treat
      coordinates as absolute.
  """
  draw = ImageDraw.Draw(image)
  im_width, im_height = image.size
  if use_normalized_coordinates:
    (left, right, top, bottom) = (xmin * im_width, xmax * im_width,
                                  ymin * im_height, ymax * im_height)
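
As a quick check, you can apply the same conversion to the first box from the question. A minimal sketch (the 640x480 image size is just an assumed example, not something from the tutorial):

# Convert a normalized [ymin, xmin, ymax, xmax] box to pixel coordinates.
ymin, xmin, ymax, xmax = 0.56213236, 0.2780568, 0.91445708, 0.69120586
# Assumed example size; use your image's actual (width, height).
im_width, im_height = 640, 480

(left, right, top, bottom) = (xmin * im_width, xmax * im_width,
                              ymin * im_height, ymax * im_height)
print(left, right, top, bottom)  # approx. 177.96 442.37 269.82 438.94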

Solution 2

I had exactly the same issue: an array with roughly a hundred boxes (output_dict['detection_boxes']) when only one was drawn on the image. Digging into the code that draws the rectangles, I was able to extract the coordinate logic and reuse it in my inference.py:

# So detection has happened and you've got output_dict as a
# result of your inference.

# Then assume you've got this in your inference.py in order to draw rectangles:
vis_util.visualize_boxes_and_labels_on_image_array(
    image_np,
    output_dict['detection_boxes'],
    output_dict['detection_classes'],
    output_dict['detection_scores'],
    category_index,
    instance_masks=output_dict.get('detection_masks'),
    use_normalized_coordinates=True,
    line_thickness=8)

# This is the way I'm getting my coordinates
boxes = output_dict['detection_boxes']
# get all boxes from an array
max_boxes_to_draw = boxes.shape[0]
# get scores to get a threshold
scores = output_dict['detection_scores']
# this is set as a default but feel free to adjust it to your needs
min_score_thresh = 0.5
# iterate over all objects found
for i in range(min(max_boxes_to_draw, boxes.shape[0])):
    if scores is None or scores[i] > min_score_thresh:
        # boxes[i] is the box which will be drawn
        class_name = category_index[output_dict['detection_classes'][i]]['name']
        print("This box will be used:", boxes[i], class_name)
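
If you need pixel coordinates rather than the normalized ones, you can scale each box by the image size, mirroring what draw_bounding_box_on_image does internally. A minimal sketch, assuming image_np is the (height, width, channels) NumPy array passed to the visualization call above:

# Scale normalized boxes to pixel coordinates using the image dimensions.
im_height, im_width = image_np.shape[:2]
for i in range(min(max_boxes_to_draw, boxes.shape[0])):
    if scores is None or scores[i] > min_score_thresh:
        ymin, xmin, ymax, xmax = boxes[i]
        (left, right, top, bottom) = (xmin * im_width, xmax * im_width,
                                      ymin * im_height, ymax * im_height)
        print("Pixel coordinates (left, right, top, bottom):",
              left, right, top, bottom)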

Solution 3

The above answer did not work for me; I had to make some changes. This version is for the TF2 API, where the values in the detections dict are batched tensors, hence the .numpy()[0] calls. If the answer above doesn't help, try this:

# This is the way I'm getting my coordinates
boxes = detections['detection_boxes'].numpy()[0]
# get all boxes from an array
max_boxes_to_draw = boxes.shape[0]
# get scores to get a threshold
scores = detections['detection_scores'].numpy()[0]
# this is set as a default but feel free to adjust it to your needs
min_score_thresh = 0.5
# iterate over all objects found
coordinates = []
for i in range(min(max_boxes_to_draw, boxes.shape[0])):
    if scores[i] > min_score_thresh:
        # detection_classes here is zero-based, while category_index
        # is one-based, hence the +1 offset
        class_id = int(detections['detection_classes'].numpy()[0][i] + 1)
        coordinates.append({
            "box": boxes[i],
            "class_name": category_index[class_id]["name"],
            "score": scores[i]
        })


print(coordinates)

Here, each item (a dictionary) in the coordinates list is a box to be drawn on the image, with its (normalised) box coordinates, class_name, and score.
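
If you want to draw these boxes yourself instead of going through vis_util, here is a minimal sketch using OpenCV (the cv2 dependency and the image_np variable are assumptions, not part of the original answer):

import cv2

# image_np is assumed to be a NumPy image of shape (height, width, 3).
im_height, im_width = image_np.shape[:2]
for item in coordinates:
    ymin, xmin, ymax, xmax = item["box"]
    # Boxes are normalized, so scale them to pixel values before drawing.
    top_left = (int(xmin * im_width), int(ymin * im_height))
    bottom_right = (int(xmax * im_width), int(ymax * im_height))
    cv2.rectangle(image_np, top_left, bottom_right, (0, 255, 0), 2)
    cv2.putText(image_np, item["class_name"], (top_left[0], top_left[1] - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)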


Comments

  • Mandy (over 2 years ago)

    I am new to both Python and Tensorflow. I am trying to run the object detection tutorial file from the Tensorflow Object Detection API, but I cannot find where I can get the coordinates of the bounding boxes when objects are detected.

    Relevant code:

     # The following processing is only for single image
     detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
     detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
    

    The place where I assume bounding boxes are drawn is like this:

     # Visualization of the results of detection.
     vis_util.visualize_boxes_and_labels_on_image_array(
          image_np,
          output_dict['detection_boxes'],
          output_dict['detection_classes'],
          output_dict['detection_scores'],
          category_index,
          instance_masks=output_dict.get('detection_masks'),
          use_normalized_coordinates=True,
          line_thickness=8)
     plt.figure(figsize=IMAGE_SIZE)
     plt.imshow(image_np)
    

    I tried printing output_dict['detection_boxes'] but I am not sure what the numbers mean. There are a lot.

    array([[ 0.56213236,  0.2780568 ,  0.91445708,  0.69120586],
           [ 0.56261235,  0.86368728,  0.59286624,  0.8893863 ],
           [ 0.57073039,  0.87096912,  0.61292225,  0.90354401],
           [ 0.51422435,  0.78449738,  0.53994244,  0.79437423],
    ......
    
           [ 0.32784131,  0.5461576 ,  0.36972913,  0.56903434],
           [ 0.03005961,  0.02714229,  0.47211722,  0.44683522],
           [ 0.43143299,  0.09211366,  0.58121657,  0.3509962 ]], dtype=float32)
    

    I found answers for similar questions, but I don't have a variable called boxes as they do. How can I get the coordinates?

  • Mandy (about 6 years ago)
    Okay. It seems that output_dict['detection_boxes'] contains all the overlapping boxes, and that's why there are so many arrays. Thank you!
  • CMCDragonkai (about 6 years ago)
    What determines how many overlapping boxes there are? And why are there so many overlapping boxes in the first place; why is merging them left to the visualisation layer?
  • Web Nexus (almost 5 years ago)
    I know this is an old question, but this may help somebody. You can limit the number of overlapping boxes by increasing the min_score_thresh argument of visualize_boxes_and_labels_on_image_array. By default it is set to 0.5; for my project, for example, I had to increase it to 0.8.
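
    For example (a minimal sketch: the tutorial's visualization call with only min_score_thresh changed):

     vis_util.visualize_boxes_and_labels_on_image_array(
          image_np,
          output_dict['detection_boxes'],
          output_dict['detection_classes'],
          output_dict['detection_scores'],
          category_index,
          use_normalized_coordinates=True,
          min_score_thresh=0.8,
          line_thickness=8)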
  • tpk (over 2 years ago)
    The normalised bboxes are in the format ymin, xmin, ymax, xmax: github.com/tensorflow/models/blob/…
  • Kirikkayis (about 2 years ago)
    I get the following error: ---> 32 boxes = detections['detection_boxes'].numpy()[0] AttributeError: 'numpy.ndarray' object has no attribute 'numpy'
  • Shreyas Vedpathak (about 2 years ago)
    @Kirikkayis that means your variable is already a NumPy array.
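
    A defensive pattern for this (an illustrative sketch, not from the original thread):

     boxes = detections['detection_boxes']
     # TF2 inference returns tensors; call .numpy() only when it exists.
     if hasattr(boxes, 'numpy'):
         boxes = boxes.numpy()
     # Drop the batch dimension if the array still has one (shape [1, N, 4]).
     if boxes.ndim == 3:
         boxes = boxes[0]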