Decoding YOLOv3 output with Intel OpenVINO’s backend (Part 2)

Let’s get straight to where we left off in Part 1. We were discussing the attributes of the `YoloParams` class.
This log is dumped by log_params in the class. Another important element in the class definition is self.isYoloV3 = 'mask' in param. This simply helps us determine whether the model being used is v3 or not: the mask parameter is exclusive to YOLOv3 and its tiny variant; previous versions lack it.
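For context, here is a minimal sketch of what such a params class might look like. This is not the exact class from the Intel demo; the attribute names (num, coords, classes, side, anchors, isYoloV3) are simply the ones used by the snippets below, and the default anchors are the usual YOLOv3-tiny values.
class YoloParams:
    # Minimal sketch only; the demo's class additionally uses the 'mask' field
    # to pick the anchors that belong to each output layer.
    def __init__(self, param, side):
        self.num = int(param.get('num', 3))           # anchors per grid cell
        self.coords = int(param.get('coords', 4))     # x, y, w, h
        self.classes = int(param.get('classes', 80))  # COCO classes
        self.side = side                              # grid size (13 or 26 here)
        anchors_str = param.get('anchors', '10,14,23,27,37,58,81,82,135,169,344,319')
        self.anchors = [float(a) for a in anchors_str.split(',')]
        self.isYoloV3 = 'mask' in param               # 'mask' exists only for v3/v3-tiny

    def log_params(self):
        params_to_print = {'classes': self.classes, 'num': self.num,
                           'coords': self.coords, 'anchors': self.anchors}
        print(f"[YoloParams] {params_to_print}")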
After the output layer has been extracted, we have an array filled with mysteriously packed data that is the treasure we seek. The method used to pack it has been discussed in the theory part above. We write a parser function that parses/simplifies this and call it parse_yolo_region(). This function takes in the array full of raw values (let's call it the packed array) and returns a list of all detected objects. The two output blobs are (1,255,26,26) and (1,255,13,13); let's write them as (1,255,side,side) for this blog (the side attribute of YoloParams is dedicated to exactly this; look up the class definition). The side x side part represents the grid, and the 255 channels are the packed array we showed earlier.
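As a quick sanity check on those 255 channels (a made-up snippet, just to show the arithmetic): they come from packing 3 anchors per cell, each carrying 4 box coordinates, 1 objectness score and 80 class scores.
num_anchors, coords, classes = 3, 4, 80
bbox_size = coords + 1 + classes         # 85 values per anchor
assert num_anchors * bbox_size == 255    # the channel dimension of both output blobs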

One method to decode this array for both the layers is:
for oth in range(0, blob.shape[1], 85):  # 255 channels, 85 per anchor
    for row in range(blob.shape[2]):     # 13 (or 26)
        for col in range(blob.shape[3]): # 13 (or 26)
            info_per_anchor = blob[0, oth:oth+85, row, col]
            x, y, width, height, prob = info_per_anchor[:5]
Next, we check whether any of the anchor boxes found an object and, if it did, which class it was. There are 80 classes, and the one with the highest probability is the answer.
if prob < threshold:
    continue
# The remaining 80 values (oth+5 to oth+85 in the blob) are the class scores
class_id = np.argmax(info_per_anchor[5:])
At a confidence threshold of 0.1 or 10%, the classes detected in our test image of the cycle+man are:
person prob:0.19843937456607819
person prob:0.7788506746292114
bicycle prob:0.8749380707740784
bicycle prob:0.8752843737602234
The x and y coordinates obtained are relative to the cell. To get the coordinates with respect to the entire image, we add the grid index and finally normalize the result with the side parameter.
x = (col + x) / params.side
y = (row + y) / params.side
To relate this to the example explained above, these lines correspond to the following terms used in the original paper:
bx = (Cx + x) / params.side
by = (Cy + y) / params.side
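A quick worked example with made-up numbers: if the cell at column 6, row 4 of the 13x13 grid predicts x = 0.3 and y = 0.7, then
side = 13
col, row = 6, 4            # grid cell index (Cx, Cy)
x_rel, y_rel = 0.3, 0.7    # offsets predicted inside that cell
bx = (col + x_rel) / side  # (6 + 0.3) / 13 ≈ 0.485 of the image width
by = (row + y_rel) / side  # (4 + 0.7) / 13 ≈ 0.362 of the image height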
The raw width and height values coming out of the network can be large or even negative, so we take their exponent (as the paper does) to turn them into positive scale factors:
try:
    width = exp(width)
    height = exp(height)
except OverflowError:
    continue
These exponentiated values are just scale factors. To get the actual width and height, we multiply them by the respective anchor width or height and then normalise again by the image width or height respectively (fixed to 416x416 for v3 and v3-tiny). Why we do this, wait for it…
size_normalizer = (resized_image_w, resized_image_h) if params.isYoloV3 else (params.side, params.side)
n = oth // 85  # index of the anchor this 85-value slot belongs to
width = width * params.anchors[2 * n] / size_normalizer[0]
height = height * params.anchors[2 * n + 1] / size_normalizer[1]
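Continuing the made-up example: suppose this is anchor slot n = 1, whose anchor is 23x27 pixels (one of the common YOLOv3-tiny defaults), and the raw predictions for width and height are 0.5 and 0.2.
from math import exp
anchors = [10, 14, 23, 27, 37, 58]       # first three v3-tiny anchor pairs (w, h)
n = 1                                    # anchor index derived from the channel offset
resized_image_w = resized_image_h = 416
tw, th = 0.5, 0.2                        # raw network outputs
width = exp(tw) * anchors[2 * n] / resized_image_w       # ≈ 1.649 * 23 / 416 ≈ 0.091
height = exp(th) * anchors[2 * n + 1] / resized_image_h  # ≈ 1.221 * 27 / 416 ≈ 0.079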
To similarly get the absolute coordinates of the top-left and bottom-right corners of the box, we use the x and y values we determined along with the normalised width and height. Subtracting half the width shifts the point from the center of the box to its left boundary, and half the height shifts it to its upper boundary; together, they give the top-left corner of the box. To map these bounding boxes back onto the original image, we scale them up using the dimensions of the image (w_scale = h_scale = 416).
xmin = int((x - width / 2) * w_scale)
ymin = int((y - height / 2) * h_scale)
xmax = int(xmin + width * w_scale)
ymax = int(ymin + height * h_scale)
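Putting all of these pieces together, a minimal sketch of the parsing function could look like the following. This is a simplification, not the exact parse_yolo_region() from the Intel demo, which handles a few more details (such as per-class confidences and masked anchors).
from math import exp
import numpy as np

def parse_yolo_region_sketch(blob, resized_image_shape, params, threshold):
    # blob: (1, 255, side, side) output of one YoloRegion layer
    # params: YoloParams-style object with .side, .anchors and .isYoloV3
    resized_image_h, resized_image_w = resized_image_shape
    size_normalizer = (resized_image_w, resized_image_h) if params.isYoloV3 else (params.side, params.side)
    w_scale = h_scale = 416  # input size used throughout this post
    objects = []
    for oth in range(0, blob.shape[1], 85):      # one 85-value slot per anchor
        for row in range(blob.shape[2]):
            for col in range(blob.shape[3]):
                info_per_anchor = blob[0, oth:oth + 85, row, col]
                x, y, width, height, prob = info_per_anchor[:5]
                if prob < threshold:
                    continue
                class_id = int(np.argmax(info_per_anchor[5:]))
                # cell-relative center -> grid-normalised center
                x = (col + x) / params.side
                y = (row + y) / params.side
                # raw width/height -> normalised width/height via the anchors
                try:
                    width = exp(width)
                    height = exp(height)
                except OverflowError:
                    continue
                n = oth // 85
                width = width * params.anchors[2 * n] / size_normalizer[0]
                height = height * params.anchors[2 * n + 1] / size_normalizer[1]
                # center + size -> pixel corner coordinates
                objects.append({'xmin': int((x - width / 2) * w_scale),
                                'ymin': int((y - height / 2) * h_scale),
                                'xmax': int((x + width / 2) * w_scale),
                                'ymax': int((y + height / 2) * h_scale),
                                'class_id': class_id,
                                'confidence': float(prob)})
    return objects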
Now, we have the desired observations from the 2 detector layers and we pack them into objects to get:
In Layer detector/yolo-v3-tiny/Conv_12/BiasAdd/YoloRegion
Detected Objects
{'xmin': 707, 'xmax': 721, 'ymin': 53, 'ymax': 68, 'class_id': 8, 'confidence': 0.0016403508}
In Layer detector/yolo-v3-tiny/Conv_9/BiasAdd/YoloRegion
Detected Objects
{'xmin': 707, 'xmax': 721, 'ymin': 53, 'ymax': 68, 'class_id': 8, 'confidence': 0.0016403508}
{'xmin': 257, 'xmax': 454, 'ymin': 32, 'ymax': 323, 'class_id': 0, 'confidence': 0.29021382}
{'xmin': 247, 'xmax': 470, 'ymin': 31, 'ymax': 373, 'class_id': 0, 'confidence': 0.34315744}
{'xmin': 231, 'xmax': 534, 'ymin': 165, 'ymax': 410, 'class_id': 1, 'confidence': 0.6760541}
{'xmin': 232, 'xmax': 540, 'ymin': 188, 'ymax': 428, 'class_id': 1, 'confidence': 0.23595412}
But there are too many detections for just a single bicycle and person. This is an inherent issue with YOLO that leads to duplicate predictions, because it is very likely that two or more anchors, of the same or different cells, detect a particular object with different or even the same probabilities. If we plot all these boxes on the image, we get

To remove these duplicate boxes, we employ Non-Maximal Suppression and Intersection over Union.
NON-MAXIMAL SUPPRESSION:
Let’s not be perplexed by the fancy term. It would have been just fine even if one didn’t know it; we are already familiar with the idea, just not the name. It refers to keeping only the detection with the maximum confidence among overlapping ones and filtering out (suppressing) the rest.
INTERSECTION OVER UNION (IOU):
If we have two bounding boxes, then IoU is defined as

IoU = area of overlap between the boxes / area of their union [source]
It is used for two purposes:
- It helps us benchmark the accuracy of our model’s predictions. Using it, we can figure out how well our predicted bounding box overlaps with the ground-truth bounding box; the higher the IoU, the better the performance. The results can be interpreted as

IoU for performance check
- It helps us remove duplicate bounding boxes for the same object, which is exactly the problem we are facing with the cyclist test case. For this, we sort all the predictions/objects in descending order of their confidence. If two bounding boxes point to the same object, their IoU will be very high; in that case, we keep the box with the higher confidence (i.e., the first box) and reject the second one. If the IoU is very low, the two boxes most likely point to different objects of the same class (like different dogs or different cats in the same picture). We use IoU solely for this second purpose.
objects = sorted(objects, key=lambda obj: obj['confidence'], reverse=True)
for i in range(len(objects)):
    if objects[i]['confidence'] == 0:
        continue
    for j in range(i + 1, len(objects)):
        # We perform IoU on objects of the same class only
        if objects[i]['class_id'] != objects[j]['class_id']:
            continue
        if intersection_over_union(objects[i], objects[j]) > args.iou_threshold:
            objects[j]['confidence'] = 0

# Drawing objects with respect to the --prob_threshold CLI parameter
objects = [obj for obj in objects if obj['confidence'] >= args.prob_threshold]
print(f"final objects:{objects}")
where intersection_over_union is defined as
def intersection_over_union(box_1, box_2):
    width_of_overlap_area = min(box_1['xmax'], box_2['xmax']) - max(box_1['xmin'], box_2['xmin'])
    height_of_overlap_area = min(box_1['ymax'], box_2['ymax']) - max(box_1['ymin'], box_2['ymin'])
    if width_of_overlap_area < 0 or height_of_overlap_area < 0:
        area_of_overlap = 0
    else:
        area_of_overlap = width_of_overlap_area * height_of_overlap_area
    box_1_area = (box_1['ymax'] - box_1['ymin']) * (box_1['xmax'] - box_1['xmin'])
    box_2_area = (box_2['ymax'] - box_2['ymin']) * (box_2['xmax'] - box_2['xmin'])
    area_of_union = box_1_area + box_2_area - area_of_overlap
    if area_of_union == 0:
        return 0
    return area_of_overlap / area_of_union
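As a quick sanity check of this helper (boxes made up for illustration): two 10x10 boxes shifted by 5 pixels overlap in a 5x5 region, so the IoU should be 25 / (100 + 100 - 25) ≈ 0.143.
box_a = {'xmin': 0, 'ymin': 0, 'xmax': 10, 'ymax': 10}
box_b = {'xmin': 5, 'ymin': 5, 'xmax': 15, 'ymax': 15}
print(intersection_over_union(box_a, box_b))  # 0.1428...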
Post this, we get filtered objects as
final objects:[{'xmin': 231, 'xmax': 534, 'ymin': 165, 'ymax': 410, 'class_id': 1, 'confidence': 0.6760541}, {'xmin': 247, 'xmax': 470, 'ymin': 31, 'ymax': 373, 'class_id': 0, 'confidence': 0.34315744}]
Now, we have good detections; on drawing bounding boxes, we get the following results at the confidence threshold of 0.1 (10%) and IoU threshold of 0.4 (40%):

The entire code used here can be found in my GitHub Repo [HERE]. But I also suggest you look into the demo provided by Intel (Link in references).
I hope this article made sense. Feel free to point out discrepancies in the material; I will try my best to correct them and clarify any doubts.

If I could help you, you can also send me some crypto like Solana, Ethereum, Doge or coins on BSC. 🤠
SOL wallet: Geyvsojk1KAegziagYAJ5HWaDuAFYKZAS2KpqZeag9DM
BSC wallet: 0x47acC4B652166AE55fb462e4cD7DD26eFa97Da04
ETH wallet: 0x47acC4B652166AE55fb462e4cD7DD26eFa97Da04
Doge wallet: DFT6Pydzw7RAYG1ww4RRfctBKqyiXbbBwk
References
- OpenVINO YOLO Demo: https://github.com/opencv/open_model_zoo/tree/master/demos/python_demos/object_detection_demo_yolov3_async
- Cyclist Image Used: https://unsplash.com/photos/Tzz4XrrdPUE
- Understanding YOLO: https://towardsdatascience.com/dive-really-deep-into-yolo-v3-a-beginners-guide-9e3d2666280e