Object Detection Metrics and Non-Maximum Suppression (NMS)

Average Precision (AP), also called Mean Average Precision (mAP), is the most widely used metric for evaluating the performance of object detection models. It measures the average precision across all categories, providing a single value for comparing different models. The COCO dataset makes no distinction between AP and mAP; we will refer to this metric as AP for the remainder of this article.
YOLOv1 and YOLOv2 were trained and benchmarked on the PASCAL VOC 2007 and VOC 2012 datasets. YOLOv3 and later versions use Microsoft COCO (Common Objects in Context). These datasets compute AP differently. The following sections describe the rationale behind AP and how it is computed.

How does AP work?

The AP metric is built on three foundations: precision-recall metrics, handling multiple object categories, and defining a positive prediction with Intersection over Union (IoU).

Precision and Recall: Precision measures the accuracy of the model's positive predictions, while recall measures the proportion of actual positive cases that the model correctly identifies. The two often trade off against each other; for example, detecting more objects (higher recall) can introduce more false positives (lower precision). To account for this trade-off, the AP metric incorporates the precision-recall curve, which plots precision against recall at varying confidence thresholds. By computing the area under this curve, AP provides a balanced assessment of both precision and recall.

Handling multiple object categories: Object detection models must identify and localize objects of multiple categories in an image. The AP metric addresses this by computing each category's average precision separately and then taking the mean across all categories, which is why it is also called mean average precision. This approach ensures that the model's performance is evaluated independently for each category, giving a more comprehensive assessment of overall performance.

Intersection over Union: Object detection aims to localize objects in images accurately by predicting bounding boxes. The AP metric incorporates the Intersection over Union (IoU) measure to assess the quality of the predicted bounding boxes. IoU is the ratio of the area of intersection to the area of union between the predicted bounding box and the ground truth bounding box, quantifying how well the two overlap. The COCO benchmark evaluates the model at multiple IoU thresholds to assess performance at different levels of localization accuracy.
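To make these definitions concrete, here is a minimal NumPy sketch of IoU and of AP as the area under the precision-recall curve. The function names and the (x1, y1, x2, y2) box format are assumptions for illustration; the 101-point interpolation grid follows COCO's convention, but this is a simplified sketch, not the official evaluation code.

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def average_precision(scores, matches, num_gt):
    """AP for one category: area under the precision-recall curve.

    scores: confidence of each detection.
    matches: 1 if the detection matched a ground-truth box at the
             chosen IoU threshold, else 0.
    num_gt: number of ground-truth boxes for this category.
    """
    order = np.argsort(-np.asarray(scores))      # sort by descending confidence
    matches = np.asarray(matches)[order]
    tp = np.cumsum(matches)                      # cumulative true positives
    fp = np.cumsum(1 - matches)                  # cumulative false positives
    recall = tp / num_gt
    precision = tp / (tp + fp)
    # 101-point interpolation over recall, as in the COCO convention
    ap = 0.0
    for r in np.linspace(0, 1, 101):
        p = precision[recall >= r]
        ap += (p.max() if p.size else 0.0) / 101
    return ap
```

The mAP reported for a dataset would then be the mean of `average_precision` over all categories (and, for COCO, over several IoU thresholds).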

Non-Maximum Suppression (NMS)


Non-Maximum Suppression (NMS) is a post-processing technique used in object detection algorithms to reduce the number of overlapping bounding boxes and improve the overall detection quality. Object detection algorithms typically generate multiple bounding boxes around the same object with different confidence scores. NMS filters out redundant and irrelevant bounding boxes, keeping only the most accurate ones. Algorithm 1 describes the procedure. Figure 3 shows the typical output of an object detection model containing multiple overlapping bounding boxes and the output after NMS.
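The greedy procedure described above can be sketched as follows: repeatedly keep the highest-scoring remaining box and discard any box whose IoU with it exceeds a threshold. This is a minimal NumPy sketch; the function name, box format (x1, y1, x2, y2), and default threshold are assumptions for illustration.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, suppress boxes that
    overlap it above iou_threshold, and repeat on the remainder.
    Returns the indices of the kept boxes."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(-np.asarray(scores))  # descending confidence
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        if order.size == 1:
            break
        rest = order[1:]
        # Vectorized IoU between the best box and the remaining boxes
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        overlap = inter / (area_best + area_rest - inter)
        order = rest[overlap <= iou_threshold]  # suppress high-overlap boxes
    return keep
```

In practice, NMS is applied per class, so that overlapping boxes of different categories do not suppress each other.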