Reliability Requirements

On this page, we first provide more detail on the parameter estimation method. Then, to keep the page self-contained, we recall Procedures 1 and 2, which we use to generate the requirements. Next, we give the proof of Theorem 1. Finally, we show how Procedure 1 can be adapted to compound decomposable evaluation metrics, such as mAP.

Parameter Estimation

To generate reliability requirements for c-tasks, Procedure 1 takes as input a list of threshold values $(t_c, t_p)$, one pair per subtask $v_i$, estimated from the human performance data. The thresholds are estimated with the ICRAF estimation method, the state-of-the-art estimation method (see Section Background). To compute the human-tolerated range of visual changes, this method uses a binomial statistical test that is specific to the task metric. For example, for the Precision-Recall (PR) curve, the metric used for object detection and instance segmentation, the binomial test is performed on each point of the curve obtained from the human experiment data, and checks whether PR for transformed images is below that for the original images with sufficient statistical significance. Because an empirical study does not provide data for every point of the PR curve, the test is performed only at the curve locations where sufficient data is available.

A binomial test at a point of the PR curve consists of two binomial tests, one for precision and one for recall at that point. For each individual test, we follow the state-of-the-art procedure to estimate the human-tolerated range of visual changes $\Delta v$ (see Section Background), and pick the minimum range for the PR-curve test, i.e., $\Delta v_{PR} = \min(\Delta v_{prec}, \Delta v_{rec})$. The resulting thresholds $t_c$ and $t_p$ for the entire PR curve are obtained from the value of $\Delta v_{PR}$ at the curve point that is supported by the largest amount of human data.
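
The combination of the two per-point tests can be sketched in Python. This is a minimal illustration under stated assumptions, not the ICRAF estimation procedure itself: the one-sided exact binomial test, the "stop at the first significant drop" rule, the baseline rate `p0`, the significance level `alpha`, and all counts are hypothetical choices made for the example.

```python
from math import comb
from typing import List, Tuple


def lower_tail_pvalue(successes: int, trials: int, p0: float) -> float:
    """P(X <= successes) for X ~ Binomial(trials, p0): a one-sided exact test."""
    return sum(
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(successes + 1)
    )


def tolerated_range(
    levels: List[float],
    outcomes: List[Tuple[int, int]],
    p0: float,
    alpha: float = 0.05,
) -> float:
    """Largest visual-change level reached before performance drops
    significantly below the baseline rate p0 (assumed stopping rule).

    levels   -- visual-change values in increasing order
    outcomes -- (successes, trials) observed at each level
    """
    tolerated = 0.0
    for level, (successes, trials) in zip(levels, outcomes):
        if lower_tail_pvalue(successes, trials, p0) < alpha:
            break  # significantly below the baseline: stop here
        tolerated = level
    return tolerated


# Hypothetical human data at one PR-curve point (20 answers per level).
levels = [0.2, 0.4, 0.6, 0.8]
precision_outcomes = [(19, 20), (18, 20), (17, 20), (10, 20)]
recall_outcomes = [(18, 20), (17, 20), (11, 20), (9, 20)]

delta_v_prec = tolerated_range(levels, precision_outcomes, p0=0.9)
delta_v_rec = tolerated_range(levels, recall_outcomes, p0=0.9)

# The PR-curve range is the smaller of the two, as in the text.
delta_v_pr = min(delta_v_prec, delta_v_rec)
print(delta_v_prec, delta_v_rec, delta_v_pr)
```

Taking the minimum is the conservative choice: a visual change is tolerated for the PR curve only if it is tolerated for both precision and recall.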

Although not reported in the paper, we also conducted experiments with the brightness transformation and obtained the thresholds shown below:

thresholds estimated for brightness

Procedure 1

Procedure generating requirements

Procedure 2

Procedure generating subtask requirements

Correctness of our requirement composition

Given a c-task $V = v_n \ldots v_2 v_1$ and a decomposable performance metric $\mathcal{M}_V$ such that $\mathcal{M}_V = F(M_V)$ and $M_V = \prod_{i=1}^{n} m_{v_i}$, let $reqs = \{req_{v_1}, \ldots, req_{v_n}\}$ be the list of subtask requirements generated by Procedure 1, where each $req_{v_i}$ is defined with $m_{v_i}$. Let $req_V$ be the composed c-task requirement defined using $M_V$. The theorem states that if an MVC satisfies the subtask requirements $reqs$, it also satisfies the composed c-task requirement $req_V$.

Each $req$ is either a correctness-preservation (cp) or a prediction-preservation (pp) requirement. Let an MVC $f_V$ performing a vision task $V$, a distribution of input images $P_X$, and an image transformation $T_X$ be given. For a cp requirement defined with a metric $\psi_{cp}$, the required condition is $\psi_{cp}(f_V, f_V^*, P_{T_X, t_c}) \geq \psi_{cp}(f_V, f_V^*, P_X)$, where $f_V^*$ is the ground truth function and $t_c$ is the human-tolerated threshold. For a pp requirement defined with a metric $\psi_{pp}$, the required condition is $\psi_{pp}(f_V, P_{T_X, t_p}) \geq \psi_{pp}(f_V, P_{T_X, \epsilon})$, where $t_p$ is the human-tolerated threshold and $\epsilon$ is a small value such that $P_{T_X, \epsilon}$ represents minimally transformed images.
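
The two kinds of condition can be made concrete with a toy sketch. It assumes that both conditions read "performance on transformed images is no lower than the reference value". The metrics `psi_cp` (accuracy against the ground truth) and `psi_pp` (fraction of unchanged predictions), the one-number "images", the additive transformation, and the threshold classifier are all hypothetical stand-ins, not the metrics used in the paper.

```python
from typing import Callable, List, Tuple

Image = float                      # stand-in for an image: one brightness value
Pair = Tuple[Image, Image]         # (original, transformed)
Classifier = Callable[[Image], bool]


def transformed_pairs(images: List[Image], change: float) -> List[Pair]:
    """Sample of original/transformed pairs for a given visual change."""
    return [(x, x + change) for x in images]


def psi_cp(model: Classifier, truth: Classifier, pairs: List[Pair]) -> float:
    """Share of transformed images predicted as the ground truth of the original."""
    return sum(model(tx) == truth(x) for x, tx in pairs) / len(pairs)


def psi_pp(model: Classifier, pairs: List[Pair]) -> float:
    """Share of pairs whose prediction is unchanged by the transformation."""
    return sum(model(tx) == model(x) for x, tx in pairs) / len(pairs)


def cp_holds(model, truth, images, t_c: float) -> bool:
    reference = psi_cp(model, truth, transformed_pairs(images, 0.0))  # originals
    return psi_cp(model, truth, transformed_pairs(images, t_c)) >= reference


def pp_holds(model, images, t_p: float, epsilon: float) -> bool:
    reference = psi_pp(model, transformed_pairs(images, epsilon))
    return psi_pp(model, transformed_pairs(images, t_p)) >= reference


def truth(x: Image) -> bool:       # hypothetical ground truth function
    return x >= 10.0


def model(x: Image) -> bool:       # hypothetical MVC
    return x >= 10.0


images = [2.0, 5.0, 9.5, 12.0, 15.0]
print(cp_holds(model, truth, images, t_c=0.4))          # small change
print(cp_holds(model, truth, images, t_c=1.0))          # flips the 9.5 image
print(pp_holds(model, images, t_p=0.4, epsilon=0.1))
print(pp_holds(model, images, t_p=1.0, epsilon=0.1))
```

The cp check needs the ground truth function, while the pp check compares the model only with itself, which is why the pp condition has no $f_V^*$ argument.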

Theorem 1

Theorem 1: If all subtask requirements $req_{v_i} \in reqs$ are satisfied, then so is the composed c-task requirement $req_V$.

Proof. Depending on the type of the requirement, $req_{v_i}$ is either $cp_{v_i}$ or $pp_{v_i}$. We prove the theorem for both types.

For cp, we want to show that if $cp_{v_i}$ is satisfied for all $v_i \in V$, then the condition $M_V(f_V, f_V^*, P_{T_X, t_V}) \geq M_V(f_V, f_V^*, P_X)$ in $cp_V$ is also satisfied. Here $M_V$ takes as input an MVC $f_V$ performing the vision task $V$, a comparing function $f_V^*$, a distribution of input images $P_X$, and a distribution $P_{T_X, t_V}$ of original and transformed images with visual change $t_V$. The threshold $t_V$ for $req_V$ is defined as $t_V = \min(\bar{t})$ (Procedure 1, L:9), where $\bar{t}$ is the list of the subtasks' thresholds. Because each subtask requirement $cp_{v_i}$ includes a requirement on smaller thresholds (Procedure 1, LL:5-7), each $cp_{v_i}$ includes the condition $m_{v_i}(f_{v_i}, f_{v_i}^*, P_{T_X, t_V}) \geq m_{v_i}(f_{v_i}, f_{v_i}^*, P_X)$, where the subtask metric $m_{v_i}$ takes as input an MVC $f_{v_i}$ performing the subtask $v_i$, a comparing function $f_{v_i}^*$, and the distribution $P_{T_X, t_V}$. Specifically, the conditions for different thresholds are connected by conjunction in $cp_{v_i}$ (Procedure 1); thus, satisfying every $cp_{v_i}$ implies that $m_{v_i}(f_{v_i}, f_{v_i}^*, P_{T_X, t_V}) \geq m_{v_i}(f_{v_i}, f_{v_i}^*, P_X)$ holds for all $v_i \in V$. Then, since all metric values are positive, we have $\prod_{v_i \in V} m_{v_i}(f_{v_i}, f_{v_i}^*, P_{T_X, t_V}) \geq \prod_{v_i \in V} m_{v_i}(f_{v_i}, f_{v_i}^*, P_X)$; therefore, $M_V(f_V, f_V^*, P_{T_X, t_V}) \geq M_V(f_V, f_V^*, P_X)$, since $M_V$ is defined as $M_V = \prod_{v_i \in V} m_{v_i}$ (see Metric Decomposition). As a result, $cp_V$ is satisfied if all $cp_{v_i}$ are satisfied.
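
The step from the per-subtask conditions to the condition on the combined metric is where positivity of the metric values is used. Assuming the combined metric is the product of the subtask metrics and the conditions are "no lower than" comparisons, the two-subtask case can be written out as follows. The shorthand $a_i$ (value of $m_{v_i}$ on transformed images) and $b_i$ (value of $m_{v_i}$ on original images) is introduced here only for illustration.

```latex
a_1 \geq b_1 \geq 0 \ \text{ and } \ a_2 \geq b_2 \geq 0
\quad\Longrightarrow\quad
a_1 a_2 \;\geq\; b_1 a_2 \;\geq\; b_1 b_2 .
```

The first inequality multiplies $a_1 \geq b_1$ by the non-negative $a_2$, and the second multiplies $a_2 \geq b_2$ by the non-negative $b_1$. Repeating the step by induction on the number of subtasks gives the general case, and the same chain applies to the pp case with the reference values taken at $\epsilon$.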

Similarly, for pp, we want to show that if $pp_{v_i}$ is satisfied for all $v_i \in V$, then the condition $M_V(f_V, P_{T_X, t_V}) \geq M_V(f_V, P_{T_X, \epsilon})$ in $pp_V$ is also satisfied. Each $pp_{v_i}$ includes the condition $m_{v_i}(f_{v_i}, P_{T_X, t_V}) \geq m_{v_i}(f_{v_i}, P_{T_X, \epsilon})$. Specifically, the conditions for different thresholds are connected by conjunction in $pp_{v_i}$ (see Procedure 2); thus, satisfying every $pp_{v_i}$ implies that $m_{v_i}(f_{v_i}, P_{T_X, t_V}) \geq m_{v_i}(f_{v_i}, P_{T_X, \epsilon})$ holds for all $v_i \in V$. Then, since all metric values are positive, we have $\prod_{v_i \in V} m_{v_i}(f_{v_i}, P_{T_X, t_V}) \geq \prod_{v_i \in V} m_{v_i}(f_{v_i}, P_{T_X, \epsilon})$; therefore, $M_V(f_V, P_{T_X, t_V}) \geq M_V(f_V, P_{T_X, \epsilon})$, since $M_V$ is defined as $M_V = \prod_{v_i \in V} m_{v_i}$ (see Metric Decomposition). As a result, $pp_V$ is satisfied if all $pp_{v_i}$ are satisfied.

Therefore, $req_V$ is satisfied if all $req_{v_i}$ are satisfied.
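
The statement of Theorem 1 can be checked numerically on made-up values. The sketch below assumes that the c-task metric is the product of the subtask metrics and that each condition is a "no lower than" comparison. The thresholds and metric values are hypothetical and do not come from the experiments.

```python
from math import prod
from typing import Sequence


def composed_threshold(subtask_thresholds: Sequence[float]) -> float:
    """Threshold of the composed requirement: the minimum over the subtasks."""
    return min(subtask_thresholds)


def subtask_conditions_hold(
    transformed: Sequence[float], original: Sequence[float]
) -> bool:
    """Every subtask metric on transformed images is no lower than on originals."""
    return all(t >= o for t, o in zip(transformed, original))


def composed_condition_holds(
    transformed: Sequence[float], original: Sequence[float]
) -> bool:
    """Same comparison on the c-task metric, taken as the product of subtask metrics."""
    return prod(transformed) >= prod(original)


# Hypothetical thresholds for three subtasks.
t_V = composed_threshold([0.35, 0.20, 0.50])

# Hypothetical subtask metric values, measured at visual change t_V.
original = [0.90, 0.80, 0.85]
transformed = [0.92, 0.80, 0.88]

print(t_V)
print(subtask_conditions_hold(transformed, original))
print(composed_condition_holds(transformed, original))

# The converse does not hold: the product can improve while one subtask degrades.
uneven = [0.99, 0.75, 0.90]
print(subtask_conditions_hold(uneven, original))
print(composed_condition_holds(uneven, original))
```

The last two lines show why the theorem is an implication in one direction only: the composed condition alone does not guarantee that each subtask requirement holds.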

Procedure for Compound Decomposable Metrics

We provide the following Procedure 3 for generating the reliability requirements for the c-task $V$ and its subtasks using the compound decomposable metrics $M_V^k$ (see Metric Decomposition). The differences from Procedure 1 are highlighted in purple.

Procedure for Complex Metrics

See the table below for examples of generated correctness-preservation requirements with compound decomposable metrics.

Example requirements with complex metrics

For the correctness of the requirement composition using the compound decomposable metrics, we prove the following theorem.

Theorem 2

Theorem 2: Let $V = v_n \ldots v_2 v_1$ be a c-task and let $\mathcal{M}_V$ be a compound decomposable performance metric such that $\mathcal{M}_V = F(M_V^1, \ldots, M_V^K)$ and each $M_V^k = \prod_{i=1}^{n} m_{v_i}^k$ is directly decomposable. Let $req_V^k$ be the composed requirement generated by the Procedure for Compound Decomposable Metrics using $M_V^k$, where $k \in [1, K]$, and let $req_V$ be the c-task requirement defined using $\mathcal{M}_V$. If all composed requirements $req_V^k$, $k \in [1, K]$, are satisfied, then so is the c-task requirement $req_V$.

Proof. Depending on the type of the requirement, $req_V^k$ is either $cp_V^k$ or $pp_V^k$. We prove the theorem for both types.

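For cp, assume all $cp_V^k$ are satisfied. We show that the required condition $\mathcal{M}_V(f_V, f_V^*, P_{T_X, t_V}) \geq \mathcal{M}_V(f_V, f_V^*, P_X)$ in $cp_V$ is satisfied. Since $\mathcal{M}_V = F(M^1, \ldots, M^K)$ is positively correlated with each $M^k \in \{M^1, \ldots, M^K\}$, if the value of each $M^k$ increases, the value of $\mathcal{M}_V$ increases as well. Satisfying all $cp_V^k$ implies $M^k(f_V, f_V^*, P_{T_X, t_V^k}) \geq M^k(f_V, f_V^*, P_X)$ for each $M^k$, and therefore $F(M^1(f_V, f_V^*, P_{T_X, t_V}), \ldots, M^K(f_V, f_V^*, P_{T_X, t_V})) \geq F(M^1(f_V, f_V^*, P_X), \ldots, M^K(f_V, f_V^*, P_X))$, which is $\mathcal{M}_V(f_V, f_V^*, P_{T_X, t_V}) \geq \mathcal{M}_V(f_V, f_V^*, P_X)$. As a result, $cp_V$ is satisfied.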

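For pp, assume all $pp_V^k$ are satisfied. We show that the required condition $\mathcal{M}_V(f_V, P_{T_X, t_V}) \geq \mathcal{M}_V(f_V, P_{T_X, \epsilon})$ in $pp_V$ is satisfied. Since $\mathcal{M}_V = F(M^1, \ldots, M^K)$ is positively correlated with each $M^k \in \{M^1, \ldots, M^K\}$, if the value of each $M^k$ increases, the value of $\mathcal{M}_V$ increases as well. Satisfying all $pp_V^k$ implies $M^k(f_V, P_{T_X, t_V^k}) \geq M^k(f_V, P_{T_X, \epsilon})$ for each $M^k$, and therefore $F(M^1(f_V, P_{T_X, t_V}), \ldots, M^K(f_V, P_{T_X, t_V})) \geq F(M^1(f_V, P_{T_X, \epsilon}), \ldots, M^K(f_V, P_{T_X, \epsilon}))$, which is $\mathcal{M}_V(f_V, P_{T_X, t_V}) \geq \mathcal{M}_V(f_V, P_{T_X, \epsilon})$. As a result, $pp_V$ is satisfied.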

Therefore, satisfying all $req_V^k$ implies satisfying $req_V$.
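
Theorem 2 can be illustrated in the same numerical style. The sketch assumes that each component metric is the product of its subtask metrics and, purely as an example of a function that never decreases when any of its arguments increases, takes $F$ to be the arithmetic mean (mAP, for instance, averages per-class AP values). The choice of $F$ and all numbers are hypothetical.

```python
from math import prod
from statistics import mean
from typing import Sequence


def component_metric(subtask_values: Sequence[float]) -> float:
    """One directly decomposable component: product of its subtask metrics."""
    return prod(subtask_values)


def compound_metric(components: Sequence[Sequence[float]]) -> float:
    """Compound metric F over the components; F is the mean in this sketch."""
    return mean([component_metric(values) for values in components])


def component_conditions_hold(
    transformed: Sequence[Sequence[float]], original: Sequence[Sequence[float]]
) -> bool:
    """Every component metric on transformed images is no lower than on originals."""
    return all(
        component_metric(t) >= component_metric(o)
        for t, o in zip(transformed, original)
    )


# Two components (K = 2), each with two subtasks; hypothetical values.
original = [[0.90, 0.80], [0.70, 0.90]]
transformed = [[0.90, 0.85], [0.75, 0.90]]

print(component_conditions_hold(transformed, original))
print(compound_metric(transformed) >= compound_metric(original))
```

Any other aggregation with the same monotonicity property could replace the mean without changing the argument, which relies only on $F$ not decreasing when its arguments increase.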