Additional Evaluation Results

In addition to the artificial-frost image transformation shown in the paper, we conducted a further set of evaluations using a transformation that changes brightness, as well as a set of evaluations for the class bus. Because of the paper's page limit, we show them here.

RQ1

Comparison of the human-tolerated thresholds estimated either directly or by composing the subtask thresholds. $t_{c}$ and $t_{p}$ are the thresholds for correctness-preservation ($cp$) and prediction-preservation ($pp$), respectively.

As the image shows, our composed threshold is always a lower bound on the directly estimated threshold.
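The lower-bound observation can be checked mechanically once both estimates are available. The following is a minimal sketch, assuming each threshold is a single number and that "lower bound" means the composed value never exceeds the directly estimated one. The function names and the numeric values are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, Tuple


def is_lower_bound(composed: float, direct: float, tol: float = 1e-9) -> bool:
    """Return True when the composed threshold does not exceed the direct one."""
    return composed <= direct + tol


def check_all(thresholds: Dict[str, Tuple[float, float]]) -> Dict[str, bool]:
    """Check every (composed, direct) pair, keyed by a threshold label."""
    return {name: is_lower_bound(c, d) for name, (c, d) in thresholds.items()}


if __name__ == "__main__":
    # Placeholder values for illustration only; they are NOT taken from the paper.
    example = {"t_c": (0.10, 0.15), "t_p": (0.20, 0.20)}
    print(check_all(example))
```

A small tolerance is included so that equal estimates are not rejected because of floating-point rounding.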

RQ2

The following table compares, for the brightness transformation and for the class bus, the reliability evaluation of object detection and instance segmentation MVCs using our checking method with that using the state-of-the-art benchmark dataset PASCAL VOC-C [PASCALVOC-C].

[PASCALVOC-C] Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming.