Metric Decomposition
On this page, we first prove the metric decomposition used in the paper, and then we define Compound Decomposable Metrics for more complex metrics.
Decomposable Metrics
Below, for a complex vision task $T$ that can be represented as a sequential composition of subtasks, i.e., $T = T_n \circ \cdots \circ T_1$, we recall the definition of metric compositionality:
$$M(T) = f\big(M_1(T_1), \ldots, M_n(T_n)\big),$$
where $M$ is a directly decomposable metric of the task $T$, $M_i$ is a metric of the $i$-th subtask $T_i$, and $f$ is a monotonic function.
Decomposition of AP
Below, we show that AP is decomposable.
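First, AP for a class is the area under the curve (AuC) of the PR-curve. Therefore, by the definition of decomposable metrics, if each PR-curve is directly decomposable, AP is decomposable. Each point on the PR-curve is defined by the precision and recall for a class label $c$ given a confidence threshold $\tau$ and an IoU threshold $t$. For object detection, let $D_c$ be the set of all detected objects $o$ with predicted class $c$ such that the confidence score $\mathrm{conf}(o) \geq \tau$, and let $G_c$ be the set of ground truth boxes of class $c$. With $c^*(o)$ representing the ground truth class and $\mathrm{IoU}(o)$ the max IoU of the bounding box compared to all ground truth boxes, we can represent Precision and Recall as:
$$\mathrm{Precision} = \frac{|\{o \in D_c : \mathrm{IoU}(o) \geq t \wedge c^*(o) = c\}|}{|D_c|}, \qquad \mathrm{Recall} = \frac{|\{o \in D_c : \mathrm{IoU}(o) \geq t \wedge c^*(o) = c\}|}{|G_c|}.$$
Thus, Precision and Recall can be decomposed as follows. Writing $B_c = \{o \in D_c : \mathrm{IoU}(o) \geq t\}$ for the boxes matched to ground truth and $TP_c = \{o \in B_c : c^*(o) = c\}$ for the true positives,
$$\mathrm{Precision} = \frac{|B_c|}{|D_c|} \cdot \frac{|TP_c|}{|B_c|}, \qquad \mathrm{Recall} = \frac{|B_c|}{|G_c|} \cdot \frac{|TP_c|}{|B_c|}.$$
Note that $\mathrm{IoU}(o) \geq t \wedge c^*(o) = c$ is the condition for a detection to be a true positive.
We can see that for Precision, $|B_c| / |D_c|$ is the percentage of bounding boxes matched to ground truth out of all output boxes of this class, which is precision for the localization subtask; $|TP_c| / |B_c|$ is the percentage of correct labels out of all bounding boxes of this class matched to ground truth, which is precision for the classification subtask. Similarly for Recall, $|B_c| / |G_c|$ is recall for the localization subtask and $|TP_c| / |B_c|$ is recall for the classification subtask. Since both Precision and Recall are decomposable, each point on the PR-curve is decomposable, and so the PR-curve for each class can be decomposed pointwise into the two subtasks, i.e., $\mathrm{PR} = \mathrm{PR}_{\mathrm{loc}} \cdot \mathrm{PR}_{\mathrm{cls}}$. As a result, AP for object detection is decomposable following Metric Decomposition.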
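The factorization of Precision and Recall described above can be checked numerically. The following is a minimal sketch on made-up data, not the paper's evaluation code: the toy detections, the ground-truth count, and the names `loc_*` and `cls_*` (standing for the box-matching and label-correctness factors) are all assumptions introduced for illustration.

```python
from fractions import Fraction

# Hypothetical toy detections for one class c, already filtered by the
# confidence threshold. Each tuple is:
#   (max IoU with any ground-truth box, ground-truth class equals c?)
detections = [
    (0.9, True),
    (0.8, True),
    (0.7, False),
    (0.3, True),
    (0.1, False),
]
num_ground_truth = 4   # assumed number of ground-truth boxes of class c
iou_threshold = 0.5    # assumed IoU threshold

# Boxes matched to ground truth, and the true positives among them.
matched = [d for d in detections if d[0] >= iou_threshold]
true_pos = [d for d in matched if d[1]]

# Precision and Recall of the full task.
precision = Fraction(len(true_pos), len(detections))
recall = Fraction(len(true_pos), num_ground_truth)

# Per-factor terms: box matching first, then label correctness.
loc_precision = Fraction(len(matched), len(detections))
cls_precision = Fraction(len(true_pos), len(matched))
loc_recall = Fraction(len(matched), num_ground_truth)
cls_recall = Fraction(len(true_pos), len(matched))

# The task-level values equal the product of the per-factor values.
assert precision == loc_precision * cls_precision
assert recall == loc_recall * cls_recall
```

Exact fractions are used instead of floats so that the two equalities hold without rounding tolerance.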
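Similarly, AP for instance segmentation is decomposable because we can decompose Precision and Recall. For a segmented object $o$, let $\mathrm{IoU}_{\mathrm{seg}}(o)$ be the max IoU of the area enclosed by $o$ compared to all ground truth segmentations, and let $\mathrm{IoU}_{\mathrm{box}}(o)$ be the max IoU of the tight bounding box around $o$ compared to all tight bounding boxes around areas enclosed by ground truth segmentations. With $D_c$ the set of output objects of class $c$ above the confidence threshold, $G_c$ the set of ground truth objects of class $c$, $B_c = \{o \in D_c : \mathrm{IoU}_{\mathrm{box}}(o) \geq t\}$, $C_c = \{o \in B_c : c^*(o) = c\}$, and $S_c = \{o \in C_c : \mathrm{IoU}_{\mathrm{seg}}(o) \geq t\}$, Precision and Recall can be decomposed as follows:
$$\mathrm{Precision} = \frac{|S_c|}{|D_c|} = \frac{|B_c|}{|D_c|} \cdot \frac{|C_c|}{|B_c|} \cdot \frac{|S_c|}{|C_c|}, \qquad \mathrm{Recall} = \frac{|S_c|}{|G_c|} = \frac{|B_c|}{|G_c|} \cdot \frac{|C_c|}{|B_c|} \cdot \frac{|S_c|}{|C_c|}.$$
Following the decomposition of Precision and Recall, we can decompose PR-curves for instance segmentation pointwise, i.e., $\mathrm{PR} = \mathrm{PR}_{\mathrm{box}} \cdot \mathrm{PR}_{\mathrm{cls}} \cdot \mathrm{PR}_{\mathrm{seg}}$. AP used for instance segmentation can therefore also be decomposed following Metric Decomposition.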
Compound Decomposable Metrics
Some metrics, such as mean Average Precision (mAP), are more complex and are not directly decomposable according to our decomposition definition. mAP is defined as the average of AP over all class labels $c$; therefore, mAP can be represented as a function of the per-class precision-recall curves $\mathrm{PR}_c$, each of which is directly decomposable.
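For such metrics $M$, we extend the decomposable metric definition to compound decomposable metrics as follows:
$$M(T) = g\big(M_1(T), \ldots, M_k(T)\big),$$
where each $M_j$ is a decomposable metric of the task $T$, i.e., $M_j(T) = f_j\big(M_{j,1}(T_1), \ldots, M_{j,n}(T_n)\big)$ with $M_{j,i}$ a metric of the $i$-th subtask, and $g$ is a function that is monotonic with respect to every argument.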
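Now we show that mAP is compound decomposable. Given the set of all class labels $C$, mAP is the average of the AP values $\mathrm{AP}_c$ for $c \in C$:
$$\mathrm{mAP} = \frac{1}{|C|} \sum_{c \in C} \mathrm{AP}_c.$$
Since $\mathrm{AP}_c$ for each $c$ is decomposable, as we showed above, and the average is monotonic with respect to each $\mathrm{AP}_c$, mAP is compound decomposable.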
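To illustrate the compound structure, the sketch below computes AP per class as the area under a PR-curve and then averages over classes. It rests on several assumptions: the trapezoidal rule stands in for whatever AuC estimator an evaluation toolkit actually uses, the class names and PR points are made up, and `average_precision` and `mean_average_precision` are hypothetical helper names.

```python
def average_precision(pr_points):
    """Area under a PR-curve given (recall, precision) points.

    Uses the trapezoidal rule, which is an assumption of this sketch.
    """
    pts = sorted(pr_points)
    area = 0.0
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


def mean_average_precision(pr_curves):
    """Average of the per-class AP values."""
    aps = [average_precision(curve) for curve in pr_curves.values()]
    return sum(aps) / len(aps)


# Hypothetical PR-curves for two classes.
pr_curves = {
    "car": [(0.0, 1.0), (0.5, 1.0), (1.0, 0.5)],
    "person": [(0.0, 1.0), (1.0, 0.0)],
}
baseline = mean_average_precision(pr_curves)

# Raising precision for one class while leaving the other unchanged
# can only raise mAP, since the average is monotonic in every argument.
improved_curves = dict(pr_curves, person=[(0.0, 1.0), (1.0, 0.5)])
improved = mean_average_precision(improved_curves)
assert improved >= baseline
```

Here only the "person" curve changes between the two calls, so the increase in mAP comes entirely from the increase in that one per-class AP.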