Reliability Requirements

This page first gives more detail on the parameter estimation method. To keep the page self-contained, it then recalls Procedures 1 and 2, which generate the requirements, presents the proof of Theorem 1, and finally shows how Procedure 1 can be adapted to compound decomposable evaluation metrics such as mAP.

Parameter Estimation

To generate reliability requirements for c-tasks, Procedure 1 takes a list of estimated threshold values ( $t_{c}, t_{p}$ ) for each subtask $v_{i}$ . The thresholds are estimated from the human performance data following the ICRAF estimation method, the state-of-the-art approach (see Section Background). To compute the human-tolerated range of visual changes, this method uses a binomial statistical test that is specific to the task metric. For example, for the Precision-Recall curve $P R$ (used for object detection and instance segmentation) obtained from the human experiment data, the binomial test is performed on each point of the curve, checking whether $P R$ for transformed images is below that of the original images with sufficient statistical significance. Since the empirical studies do not provide data for all points of the $P R$ curve, the binomial test is performed only at the curve locations where sufficient data is available.

A binomial test at a point of the $P R$ curve consists of two binomial tests, one for precision and one for recall at that point. For each individual test, we follow the state-of-the-art procedure to estimate the human-tolerated range of visual changes, $Δ_{v}$ (see Section Background), and take the minimum of the two ranges for the $P R$ curve, i.e., $Δ_{v}^{P R} = \min (Δ_{v}^{prec}, Δ_{v}^{rec})$ . The resulting $t_{c}$ and $t_{p}$ thresholds for the entire $P R$ curve are obtained from the $Δ_{v}^{P R}$ value at the curve point that is supported by the largest amount of human data.
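As a rough illustration of the steps above, the following Python sketch runs a one-sided binomial test per metric and combines precision and recall by taking the minimum range. The data layout (a dict from visual-change level to success and trial counts), the significance level `alpha`, all function names, and the rule "largest level reached before the first significant drop" are assumptions made for illustration; they are not the exact estimation procedure referenced in Section Background.

```python
from math import comb


def binom_lower_tail(successes: int, trials: int, p0: float) -> float:
    """P(X <= successes) for X ~ Binomial(trials, p0).

    Used as a one-sided p-value for the hypothesis that performance on
    transformed images dropped below the baseline rate p0.
    """
    return sum(
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(successes + 1)
    )


def tolerated_range(counts: dict, p0: float, alpha: float = 0.05) -> float:
    """Largest visual-change level reached before the first significant drop.

    counts maps a visual-change level to (successes, trials) observed on
    transformed images; p0 is the success rate on the original images.
    (Hypothetical data layout, assumed for this sketch.)
    """
    tolerated = 0.0
    for level in sorted(counts):
        successes, trials = counts[level]
        if binom_lower_tail(successes, trials, p0) < alpha:
            break  # performance is significantly below the baseline
        tolerated = level
    return tolerated


def tolerated_range_pr(prec_counts, rec_counts, p0_prec, p0_rec, alpha=0.05):
    """Delta_v^{PR} = min(Delta_v^{prec}, Delta_v^{rec}) at one curve point."""
    return min(
        tolerated_range(prec_counts, p0_prec, alpha),
        tolerated_range(rec_counts, p0_rec, alpha),
    )
```

Taking the minimum is the conservative choice: a visual change counts as tolerated at a curve point only if neither precision nor recall shows a significant drop.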

Although not shown in the paper, we also conducted experiments with the brightness transformation and obtained the thresholds shown below:

Procedure 1

Procedure 2

Procedure generating subtask requirements

Correctness of our requirement composition

Given a c-task $V = v_{n} ⊙ \ldots ⊙ v_{2} ⊙ v_{1}$ and a decomposable performance metric $M_{V}$ , such that $M_{V} = F (M_{V}^{'})$ and $M_{V}^{'} = \prod_{i = 1}^{n} m_{v_{i}}$ , let reqs $= \{$ req $_{v_{1}}, \ldots,$ req $_{v_{n}} \}$ be the list of subtask requirements generated by Procedure 1, where req $_{v_{i}}$ is defined with $m_{v_{i}}$ . Let req $_{V}$ be the composed c-task requirement defined using $M_{V}^{'}$ . The theorem states that if an MVC satisfies the subtask requirements reqs, it also satisfies the composed c-task requirement req $_{V}$ .

Each req can be either correctness-preservation ( $c p$ ) or prediction-preservation ( $p p$ ). Let an MVC $f_{V}$ performing a vision task $V$ , a distribution of input images $P_{X}$ , and an image transformation $T_{X}$ be given. For a $c p$ requirement defined with a metric $ψ_{c p}$ , the required condition is $ψ_{c p} (f_{V}, f_{V}^{*}, P_{T_{X, t_{c}}}) \geq ψ_{c p} (f_{V}, f_{V}^{*}, P_{X})$ , where $f_{V}^{*}$ is the ground-truth function and $t_{c}$ is the human-tolerated threshold. For a $p p$ requirement defined with a metric $ψ_{p p}$ , the required condition is $ψ_{p p} (f_{V}, P_{T_{X, t_{p}}}) \geq ψ_{p p} (f_{V}, P_{T_{X, ϵ}})$ , where $t_{p}$ is the human-tolerated threshold and $ϵ$ is a small value such that $P_{T_{X, ϵ}}$ represents minimally transformed images.
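The two conditions can be sketched on finite samples as follows. This is a minimal illustration under stated assumptions: the metric is plain accuracy, $ψ_{p p}$ is taken to be the fraction of transformed images whose prediction matches the prediction on the original image, images are passed as (original, transformed) pairs, and all names (`cp_satisfied`, `pp_satisfied`, `preservation`) are hypothetical.

```python
from typing import Callable, Sequence, Tuple


def accuracy(preds: Sequence, refs: Sequence) -> float:
    """Fraction of predictions that equal their reference value."""
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)


def cp_satisfied(model: Callable, truth: Callable,
                 pairs: Sequence[Tuple]) -> bool:
    """cp: psi_cp(f, f*, P_{T_{X,t_c}}) >= psi_cp(f, f*, P_X).

    pairs holds (original, transformed) images whose visual change is at
    most t_c; the ground truth of a transformed image is assumed to be
    that of its original.
    """
    refs = [truth(x) for x, _ in pairs]
    on_original = accuracy([model(x) for x, _ in pairs], refs)
    on_transformed = accuracy([model(tx) for _, tx in pairs], refs)
    return on_transformed >= on_original


def preservation(model: Callable, pairs: Sequence[Tuple]) -> float:
    """Assumed psi_pp: how often the prediction survives the transformation."""
    return accuracy([model(tx) for _, tx in pairs],
                    [model(x) for x, _ in pairs])


def pp_satisfied(model: Callable, pairs_tp: Sequence[Tuple],
                 pairs_eps: Sequence[Tuple]) -> bool:
    """pp: psi_pp(f, P_{T_{X,t_p}}) >= psi_pp(f, P_{T_{X,eps}})."""
    return preservation(model, pairs_tp) >= preservation(model, pairs_eps)
```

Note that `pp_satisfied` never consults the ground truth: prediction-preservation compares the model only with itself, using minimally transformed images as the baseline.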

Theorem 1

Theorem 1: If all subtask requirements req $_{v_{i}} \in reqs$ are satisfied then so is the composed c-task requirement req $_{V}$ .

Proof. Depending on the type of the requirement, req $_{v_{i}}$ can be $c p_{v_{i}}$ or $p p_{v_{i}}$ . We prove this theorem for both types.

For $c p$ , we want to show that if $c p_{v_{i}}$ is satisfied for all $v_{i} \in V$ , then the condition $M_{V}^{'} (f_{V}, f_{V}^{*}, P_{T_{X, t_{V}}}) \geq M_{V}^{'} (f_{V}, f_{V}^{*}, P_{X})$ in $c p_{V}^{'}$ is also satisfied. Here $M_{V}^{'}$ takes as input an MVC $f_{V}$ performing the vision task $V$ , a comparing function $f_{V}^{*}$ , a distribution of input images $P_{X}$ , and a distribution $P_{T_{X, t_{V}}}$ of original and transformed images with visual change $\leq t_{V}$ . The threshold $t_{V}$ for req $_{V}$ is defined as $t_{V} = \min (\bar{t})$ (Procedure 1, line 9), where $\bar{t}$ is the list of the subtasks' thresholds. Because each subtask requirement $c p_{v_{i}}$ includes a requirement on smaller thresholds (Procedure 1, lines 5-7), each $c p_{v_{i}}$ includes the condition $m_{v_{i}} (f_{v_{i}}, f_{v_{i}}^{*}, P_{T_{X, t_{V}}}) \geq m_{v_{i}} (f_{v_{i}}, f_{v_{i}}^{*}, P_{X})$ , where the subtask metric $m_{v_{i}}$ takes as input an MVC $f_{v_{i}}$ performing the subtask $v_{i}$ , a comparing function $f_{v_{i}}^{*}$ , and the distribution $P_{T_{X, t_{V}}}$ . Specifically, the conditions for different thresholds are connected by conjunction in $c p_{v_{i}}$ (Procedure 1); thus, satisfying every $c p_{v_{i}}$ implies that $m_{v_{i}} (f_{v_{i}}, f_{v_{i}}^{*}, P_{T_{X, t_{V}}}) \geq m_{v_{i}} (f_{v_{i}}, f_{v_{i}}^{*}, P_{X})$ holds for all $v_{i} \in V$ . Then, we have $\prod_{v_{i} \in V} m_{v_{i}} (f_{v_{i}}, f_{v_{i}}^{*}, P_{T_{X, t_{V}}}) \geq \prod_{v_{i} \in V} m_{v_{i}} (f_{v_{i}}, f_{v_{i}}^{*}, P_{X})$ since all metric values are non-negative; therefore, $M_{V}^{'} (f_{V}, f_{V}^{*}, P_{T_{X, t_{V}}}) \geq M_{V}^{'} (f_{V}, f_{V}^{*}, P_{X})$ since $M_{V}^{'}$ is defined as $M_{V}^{'} = \prod_{v_{i} \in V} m_{v_{i}}$ (see Metric Decomposition). As a result, $c p_{V}^{'}$ is satisfied if all $c p_{v_{i}}$ are satisfied.

Similarly, for $p p$ , we want to show that if $p p_{v_{i}}$ is satisfied for all $v_{i} \in V$ , then the condition $M_{V}^{'} (f_{V}, P_{T_{X, t_{V}}}) \geq M_{V}^{'} (f_{V}, P_{T_{X, ϵ}})$ in $p p_{V}^{'}$ is also satisfied. Each $p p_{v_{i}}$ includes the condition $m_{v_{i}} (f_{v_{i}}, P_{T_{X, t_{V}}}) \geq m_{v_{i}} (f_{v_{i}}, P_{T_{X, ϵ}})$ . Specifically, the conditions for different thresholds are connected by conjunction in $p p_{v_{i}}$ (see Procedure 2); thus, satisfying every $p p_{v_{i}}$ implies that $m_{v_{i}} (f_{v_{i}}, P_{T_{X, t_{V}}}) \geq m_{v_{i}} (f_{v_{i}}, P_{T_{X, ϵ}})$ holds for all $v_{i} \in V$ . Then, we have $\prod_{v_{i} \in V} m_{v_{i}} (f_{v_{i}}, P_{T_{X, t_{V}}}) \geq \prod_{v_{i} \in V} m_{v_{i}} (f_{v_{i}}, P_{T_{X, ϵ}})$ since all metric values are non-negative; therefore, $M_{V}^{'} (f_{V}, P_{T_{X, t_{V}}}) \geq M_{V}^{'} (f_{V}, P_{T_{X, ϵ}})$ since $M_{V}^{'}$ is defined as $M_{V}^{'} = \prod_{v_{i} \in V} m_{v_{i}}$ (see Metric Decomposition). As a result, $p p_{V}^{'}$ is satisfied if all $p p_{v_{i}}$ are satisfied.

Therefore, req $_{V}$ is satisfied if all req $_{v_{i}}$ are satisfied.
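The core step of the proof, that element-wise inequalities between non-negative subtask metric values carry over to their product, can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the metric values are plain numbers standing in for $m_{v_{i}}$ evaluated on transformed and original images.

```python
from math import prod
from typing import Sequence


def composed_threshold(thresholds: Sequence[float]) -> float:
    """t_V = min over the list of subtask thresholds."""
    return min(thresholds)


def subtasks_satisfied(m_transformed: Sequence[float],
                       m_original: Sequence[float]) -> bool:
    """Every subtask condition m_{v_i}(transformed) >= m_{v_i}(original)."""
    return all(mt >= mo for mt, mo in zip(m_transformed, m_original))


def composed_satisfied(m_transformed: Sequence[float],
                       m_original: Sequence[float]) -> bool:
    """Composed condition on M'_V, the product of the subtask metrics."""
    return prod(m_transformed) >= prod(m_original)
```

The implication runs in one direction only. With subtask values `[0.9, 0.5]` on transformed images and `[0.6, 0.7]` on originals, the composed condition holds (0.45 versus 0.42) although the second subtask condition fails, so the composed requirement can be satisfied without every subtask requirement being satisfied.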

Procedure for Compound Decomposable Metrics

We provide the following Procedure 3 for generating the reliability requirements for the c-task $V$ and its subtasks using the compound decomposable metrics $M_{V}^{k}$ (see Metric Decomposition). Note that the differences from Procedure 1 are highlighted in purple.

See the table above for examples of generated correctness-preservation requirements with compound decomposable metrics.

Example requirements with complex metrics

For the correctness of the requirement composition using the compound decomposable metrics, we prove the following theorem.

Theorem 2

Theorem 2: Let $V$ be a c-task $V = v_{n} ⊙ \ldots ⊙ v_{2} ⊙ v_{1}$ and $M_{V}$ be a compound decomposable performance metric, such that $M_{V} = F (M_{V}^{1}, \ldots, M_{V}^{K})$ and each $M_{V}^{k} = \prod_{i = 1}^{n} m_{v_{i}}^{k}$ is directly decomposable. Let req $_{V}^{k}$ be the composed requirement generated by the Procedure for Compound Decomposable Metrics (Procedure 3) using $M_{V}^{k}$ , where $k \in [1, K]$ , and let req $_{V}$ be the c-task requirement defined using $M_{V}$ . If all composed requirements req $_{V}^{k}, k \in [1, K]$ are satisfied, then so is the c-task requirement req $_{V}$ .

Proof. Depending on the type of the requirement, req $_{V}^{k}$ can be $c p_{V}^{k}$ or $p p_{V}^{k}$ . We prove the theorem for both types.

For $c p$ , assume all $c p_{V}^{k}$ are satisfied. We show that the required condition $M_{V} (f_{V}, f_{V}^{*}, P_{T_{X, t_{V}}}) \geq M_{V} (f_{V}, f_{V}^{*}, P_{X})$ in $c p_{V}$ is satisfied. Since $M_{V} = F (M^{1}, \ldots, M^{K})$ is positively correlated with each $M^{k} \in \{M^{1}, \ldots, M^{K}\}$ , the value of $M_{V}$ does not decrease when the value of each $M^{k}$ increases. Satisfying all $c p_{V}^{k}$ implies $M^{k} (f_{V}, f_{V}^{*}, P_{T_{X, t_{V}^{k}}}) \geq M^{k} (f_{V}, f_{V}^{*}, P_{X})$ for each $M^{k}$ and therefore $F (M^{1} (f_{V}, f_{V}^{*}, P_{T_{X, t_{V}}}), \ldots, M^{K} (f_{V}, f_{V}^{*}, P_{T_{X, t_{V}}})) \geq F (M^{1} (f_{V}, f_{V}^{*}, P_{X}), \ldots, M^{K} (f_{V}, f_{V}^{*}, P_{X}))$ , which is $M_{V} (f_{V}, f_{V}^{*}, P_{T_{X, t_{V}}}) \geq M_{V} (f_{V}, f_{V}^{*}, P_{X})$ . As a result, $c p_{V}$ is satisfied.

For $p p$ , assume all $p p_{V}^{k}$ are satisfied. We show that the required condition $M_{V} (f_{V}, P_{T_{X, t_{V}}}) \geq M_{V} (f_{V}, P_{T_{X, ϵ}})$ in $p p_{V}$ is satisfied. Since $M_{V} = F (M^{1}, \ldots, M^{K})$ is positively correlated with each $M^{k} \in \{M^{1}, \ldots, M^{K}\}$ , the value of $M_{V}$ does not decrease when the value of each $M^{k}$ increases. Satisfying all $p p_{V}^{k}$ implies $M^{k} (f_{V}, P_{T_{X, t_{V}^{k}}}) \geq M^{k} (f_{V}, P_{T_{X, ϵ}})$ for each $M^{k}$ and therefore $F (M^{1} (f_{V}, P_{T_{X, t_{V}}}), \ldots, M^{K} (f_{V}, P_{T_{X, t_{V}}})) \geq F (M^{1} (f_{V}, P_{T_{X, ϵ}}), \ldots, M^{K} (f_{V}, P_{T_{X, ϵ}}))$ , which is $M_{V} (f_{V}, P_{T_{X, t_{V}}}) \geq M_{V} (f_{V}, P_{T_{X, ϵ}})$ . As a result, $p p_{V}$ is satisfied.

Therefore, satisfying all req $_{V}^{k}$ implies satisfying req $_{V}$ .
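To make the argument concrete, the sketch below instantiates $F$ as the arithmetic mean, in the spirit of mAP averaging per-class scores. That choice of $F$ , the nested-list data layout, and the function names are assumptions for illustration; each inner list holds the subtask metric values $m_{v_{i}}^{k}$ of one component $M^{k}$ .

```python
from math import prod
from statistics import fmean
from typing import Callable, Sequence


def component_metric(subtask_values: Sequence[float]) -> float:
    """M^k as the product of its subtask metrics m^k_{v_i}."""
    return prod(subtask_values)


def components_satisfied(transformed: Sequence[Sequence[float]],
                         original: Sequence[Sequence[float]]) -> bool:
    """All composed requirements: M^k(transformed) >= M^k(original)."""
    return all(
        component_metric(t) >= component_metric(o)
        for t, o in zip(transformed, original)
    )


def ctask_satisfied(transformed: Sequence[Sequence[float]],
                    original: Sequence[Sequence[float]],
                    F: Callable[[Sequence[float]], float] = fmean) -> bool:
    """c-task requirement on M_V = F(M^1, ..., M^K).

    F defaults to the arithmetic mean (an assumption of this sketch).
    """
    m_transformed = F([component_metric(t) for t in transformed])
    m_original = F([component_metric(o) for o in original])
    return m_transformed >= m_original
```

The conclusion depends on $F$ not decreasing when its arguments increase. Passing a decreasing function such as `lambda c: 1 - fmean(c)` makes `ctask_satisfied` return `False` on inputs where every component strictly improves, which is why the proof relies on the relationship between $M_{V}$ and each $M^{k}$ .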