Family Wise Error Rate (FWER)
False Discovery Rate (FDR)
For a model \({\cal M}\), suppose we have \(m\) null hypotheses \[ H_{0,i} \subset {\cal M}, \qquad 1 \leq i \leq m \]
We also assume we have \(p\)-values \(p_i, 1 \leq i \leq m\) (i.e. [0,1] valued random variables) such that \[ P_F(p_i \leq t) \leq t \qquad \forall t \in [0,1], F \in H_{0,i} \]
A multiple testing procedure \({\cal T}\) is a map \({\cal T}:[0,1]^m \to \{0,1\}^m\) with 0 denoting a declared null and a 1 a declared positive.
Common procedure: thresholding at threshold \(t\) \[ {\cal T}_t(p_1, \dots, p_m)[i] = 1_{[0,t]}(p_i) \]
## [1] 7680
| True null | True positive | Decisions | |
| Declared positive | \(V({\cal T}, F)\) | \(S({\cal T}, F)\) | \(R({\cal T})\) |
| Declared null | \(U({\cal T}, F)\) | \(T({\cal T}, F)\) | \(m-R({\cal T})\) |
| \(m_0(F)\) | \(m_1(F)\) | \(m\) |
For \(F \in {\cal M}\)
\[ \newcommand{\fwer}{\text{FWER}} \newcommand{\fdr}{\text{FDR}} \newcommand{\fdp}{\text{FDP}} \]
\[ \fwer(F, {\cal T}) = P_F(V({\cal T}) \geq 1) = P_F(V({\cal T}, F) \geq 1) \]
Does this address / permit / facilitate inference after selection?
After running \({\cal T}\) can we talk about confidence intervals? point estimators? posterior distributions?
Broadly speaking, seems the answer is no.
Suppose \[ H_{0,i} = \left\{F: \theta_i(F) = 0\right\}, \qquad 1 \leq i \leq m \] for \(m\) parameters \(\theta_i:{\cal M} \rightarrow \mathbb{R}\).
In this case, some procedures \({\cal T}\) can be used to contruct confidence intervals for \(\theta_i(F)\) with a simultaneous property.
Ignoring asymptotic considerations for now, we might restrict attention to multivariate normal \[ {\cal M} = {\cal M}(\Sigma) = \left\{N(\theta, \Sigma): \theta \in \mathbb{R}^m\right\}$ \] and take \[ p_i = 2 \cdot \left(1 - \Phi\left(|Z_i| \right) \right) \] assuming \(\text{diag}(\Sigma)=1\) so we can think of \[ Z_i = \frac{\widehat{\theta}_i}{\text{SE}(\widehat{\theta}_i)} \]
Simplest procedure to control \[\fwer(F, {\cal T}) \leq \alpha, \qquad \forall F \in {\cal M}\] is Bonferroni: \({\cal T}_{\alpha / m}\).
Proof: By Markov inequality \[ \begin{aligned} P_F(V({\cal T}_{\alpha / m}, F) \geq 1)& \leq E_F(V({\cal T}_{\alpha / m}, F)) \\ & = E_F\left( \# \{1 \leq i \leq m: H_{0,i}(F)=0, p_i \leq \alpha / m \} \right) \\ &= \sum_{i:H_{0,i}(F)=0} P_F(p_i \leq \alpha / m) \\ &= \alpha \frac{m_0(F)}{m} \\ & \leq \alpha \end{aligned} \]
Can spend \(\alpha\) over \(m\) hypotheses differently by choosing \(\sum_{i=1}^m \alpha_i = \alpha\).
Works for any multiple comparisons problem, only requires valid \(p\)-values \((p_i)_{1 \leq i \leq m}\).
## [1] 10
In the multivariate normal setting \(F \equiv \theta\) and \[ P_F(p_i \leq \alpha / m) = P_F(|Z_i| \geq z_{1 - \alpha / (2m)}) \] with \(z_q={\tt qnorm}(q)\).
Hypothesis \(i\) is rejected if \(100 (1 - \alpha / m)\) % confidence interval does not cover 0.
In particular under \(\theta=0\) (global null) \[ V({\cal T}, 0) = \left\{i: {\cal T}[i] = 1\right\} \] and \[ P_0\left( \exists i: |Z_i| \geq z_{1 - \alpha / (2m)}\right) = \fwer({\cal T}, 0) \leq \alpha \]
Define \[ \tilde{Z}(\theta_0)_i = \frac{\widehat{\theta}_i - \theta_{0,i}}{\text{SE}(\widehat{\theta}_i)}. \]
Under \(F=\theta_0\) we have \(\tilde{Z}(\theta_0)\) has the same distribution as \(Z\) under \(F=0\).
\(\implies\) for every \(\theta \in \mathbb{R}^m\) \[ \begin{aligned} \alpha & \geq P_{0} \left(\exists i: |Z|_i \geq z_{1 - \alpha / (2m)} \right)\\ &= P_{\theta} \left(\exists i: |\tilde{Z}(\theta)|_i \geq z_{1 - \alpha / (2m)} \right)\\ & = P_{\theta} \left( \exists i: \theta_i \not \in \widehat{\theta}_i \pm \text{SE}(\widehat{\theta}_i) \cdot z_{1 -\alpha/(2m)} \right) \\ \end{aligned} \]
\(\iff\) \[ P_{\theta} \left(\cap_{1 \leq i \leq m}\{ \theta_i \in \widehat{\theta}_i \pm \text{SE}(\widehat{\theta}_i) \cdot z_{1 -\alpha/(2m)}\} \right) \geq 1 - \alpha \]
We call the region \[ \prod_{i=1}^m \widehat{\theta}_i \pm \text{SE}(\widehat{\theta}_i) z_{1 - \alpha / (2m)} \] a \(100(1-\alpha)\) % simultaneous confidence region.
Exercise: in the multivariate normal setting, suppose \({\cal T}\) is such that \[ \fwer({\cal T}, 0) \leq \alpha \] then \[ \left\{\theta: {\cal T}(\tilde{Z}(\theta)) \equiv 0 \right\} \] is a \(100(1-\alpha)\) % simultaneous confidence region. That is, \({\cal T}\) need only control FWER in the weak sense to construct a simultaneous confidence region.
Simultaneous intervals can be found by projecting a simultaneous region onto coordinate axes (not necessarily an easy task). Bonferroni intervals had advantage that these projections are simple.
Tail of the normal: for \(z > 0\) \[ \frac{e^{-z^2/2}}{\sqrt{2\pi}} \cdot \left(\frac{1}{z} - \frac{1}{z^3}\right) \leq 1 - \Phi(z) \leq \frac{e^{-z^2/2}}{\sqrt{2\pi}} \cdot \frac{1}{z} \]
The quantile \(z_{1 - \alpha / (2m)}\) solves \[ \frac{\alpha}{2m} = 1 - \Phi(z_{1 - \alpha / (2m)}) \implies z_{1 - \alpha / 2m} \approx \sqrt{2 \log m} \]
-ËśExercise: Show that for any \(\epsilon > 0\) \[ 1 - \Phi((1 + \epsilon) \sqrt{2 \log m}) \overset{m \to \infty}{\to} 0, m \left( 1 - \Phi((1 - \epsilon) \sqrt{2 \log m}) \right) \overset{m \to \infty}{\to} \infty. \] Argue that \[\frac{z_{1 - \alpha/(2m)}}{\sqrt{2 \log m}} \to 1.\] Would it change things if we had used one-sided \(p\)-values?
## [1] 5.326724 5.256522
Order the \(p\)-values to produce \(p_{(1)} \leq p_{(2)} \leq \dots \leq p_{(m)}\).
Set \[ \hat{i}(p) = \min\left\{i \geq 1: p_{(i)} > \frac {\alpha}{m + 1 - i} \right\} \] with \(\min(\emptyset)=0\) or set \(p_{(0)}=0\) and start at \(i=0\).
Reject all \(H_{0,i}\) with \(1 \leq i < \hat{i}(p)\).
First run Bonferroni on all \(m\) hypotheses.
If smallest \(p\)-value is rejected, move on to next best \(m-1\) hypotheses and run Bonferroni on them.
Continue until the smallest \(p\)-value of such a Bonferroni is not rejected.
## [1] 10
Suppose a true null is rejected, then there is some smallest index \(i\) when this happens: \(i_0(p, F)\).
The first \(i_0(p,F)\) were false nulls so \(m-m_0(F) \geq i_0(p, F) - 1\) and \[ \frac{\alpha}{m - i_0(p,F)+1} \leq \frac{\alpha}{m_0(F)} \] and hence there was a true null \(p\)-value less than \(\alpha / m_0(F)\).
By usual Bonferroni this happens with probability less than \(\alpha\).
An example of a closed testing procedure, which is a way to construct an FWER controlling procedure from tests of intersection nulls \[ H_{0,I} = \cap_{i \in I} H_{0,i} \] each at level \(\alpha\).
Holm procedure is closure of the Bonferroni testing procedure on each intersection null.
Benefits from a fast way to test each intersection null.
Given a sequence of thresholds \(q_i\), a step-down procedure rejects all \(i\) less than \[ \hat{i}^{\downarrow}_q(p) = \min\left\{i: p_{(i)} > q_i \right\}. \]
A step-up procedure rejects all \(i\) less than \[ \hat{i}^{\uparrow}_q(p) = \max\left\{i: p_{(i)} \leq q_i \right\}. \]
Note that \(\hat{i}^{\downarrow}_q(p) \leq \hat{i}^{\uparrow}_q(p)\) so for the same thresholds \(q\) \(\hat{i}^{\uparrow}_q\), is more powerful than \(\hat{i}^{\downarrow}_q\).
Hochberg is the step-up version of Holm.
Under independence Hochberg’s procedure controls FWER.
Definition \[ k-\fwer({\cal T}, F) = P_F(V({\cal T}, F) \geq k) \]
Threshold \({\cal T}_{\alpha \cdot k/m}\) controls \(k-\fwer\). (Markov inequality again.)
Step down procedure with thresholds \[ q_i = \frac{k \cdot \alpha}{m + k - i} \] also controls \(k-\fwer\). (Almost identical proof to Holm)
Thresholds are identical to Holm when \(k=1\) (usual FWER).
In your application / field, what are the costs to false positives?
Can you tolerate any?
Definition (Benjamini and Hochberg, 1995) \[ \fdr({\cal T}, F) = E_F\left(\frac{V({\cal T}, F)}{\max(R({\cal T}, F), 1)}\right) \]
Expectation of False Discovery Proportion (FDP)
\[ \fdp({\cal T}, F) = \frac{V({\cal T}, F)}{\max(R({\cal T}, F), 1)} \]
Note \[ \fdr({\cal T}, F) \leq \fwer({\cal T}, F) \]
\(\implies\) control of FDR at \(\alpha\) yields more discoveries.
Suppose \[ \fdr({\cal T}_{t'_{\alpha}}, F) = \alpha, \qquad \fwer({\cal T}_{t_{\alpha}}, F) = \alpha. \] with \(t'_{\alpha} > t_{\alpha}\).
Then true discoveries \(T({\cal T}_{t'_{\alpha}}) \geq T({\cal T}_{t_{\alpha}}).\)
Suppose now \(m \to \infty\) with \(m_0/m \to \pi_0\).
For fixed threshold \(t\) \[ \frac{1}{m} V({\cal T}_t,F) \to \pi_0 \cdot t \]
Under reasonable assumptions we might also expect \[ \frac{1}{m} R({\cal T}_t, F) \to \mathbb{G}(t) \] for some distribution \(\mathbb{G}\) on \([0,1]\).
As \[ \frac{1}{m} R({\cal T}_t, F) = \frac{1}{m} V({\cal T}_t, F) + \frac{1}{m} S({\cal T}_t, F) \] we can write \[ \mathbb{G}(t) = \pi_0 \cdot t + (1 - \pi_0) \cdot \mathbb{G}_1(t) \]
In this setting \[ \begin{aligned} \fdp({\cal T}_t, F) &\to \frac{\pi_0 \cdot t}{\mathbb{G}(t)} \\ \fdr({\cal T}_t, F) &\to \frac{\pi_0 \cdot t}{\mathbb{G}(t)} \\ \end{aligned} \]
If we knew \(\mathbb{G}\) (and \(\pi_0\)), we might pick a threshold \(t\) such that \[ \frac{\pi_0 \cdot t}{\mathbb{G}(t)} \leq \alpha \]
As higher thresholds lead to more true rejections we would like to take \[ t_{\alpha}(\mathbb{G}, \pi_0) = \max\left\{t: \frac{\pi_0 \cdot t}{\mathbb{G}(t)} \leq \alpha \right\} \]
We can conservatively estimate limit (recall \(\pi_0 < 1\)) with \[ \widehat{\fdr}(t) = \frac{t}{\widehat{G}(t)} \] with \(\widehat{G}(t)\) the empirical CDF of the \(p\)-values.
Seems reasonable to take \[ \widehat{t}_{\alpha} = \sup\left\{t: \widehat{\fdr}(t) \leq \alpha \right\}. \]
Equivalent to Benjamini-Hochberg (BH) step-up procedure \[ \hat{i}^{\uparrow}_q(p) \] with \[ q_i = \alpha \frac{i}{m}. \]
## [1] 18
At \(\widehat{t}_{\alpha}\) \[ \widehat{G}(t) = \frac{t}{\alpha} \]
\(\implies\) \[ \fdp({\cal T}_{\widehat{t}_{\alpha}}, F) = \alpha \frac{V({\cal T}_{\widehat{t}_{\alpha}}, F)}{m \cdot \widehat{t}_{\alpha}} \]
If we make the assumption that the true null \(p\)-values are independent (amongst themselves and of the true positive \(p\)-values) for all \(F \in {\cal M}\) then \[ \frac{V({\cal T}_t, F)}{t} \] is a (backward) martingale in a filtration in which \(\widehat{t}_{\alpha}\) is a stopping time.
Optional stopping implies \[ \fdr({\cal T}_{\widehat{t}_{\alpha}}, F) = E_F\left[ \alpha \frac{V({\cal T}_{\widehat{t}_{\alpha}}, F)}{m \cdot \widehat{t}_{\alpha}}\right] = E_F\left[ \alpha \frac{V({\cal T}_{1}, F)}{m \cdot 1}\right] = \frac{\alpha \cdot m_0(F)}{m} \leq \alpha. \]
In asymptotic setting, essentially enough to have uniform consistency of \(\widehat{G}\) to something, i.e. \(\mathbb{G}\).
Idealized asymptotic picture of FDR led to the model for the \(Z\) scores as IID draws \[ \begin{aligned} \theta_i &\sim \pi_0 \cdot \delta_0 + (1 - \pi_0) G \\ Z_i | \theta_i &\sim N(\theta, 1) \end{aligned} \]
Models parameters of \(F\) as a mixture of 0’s and \(G\).
The full parameters of \(F\) are lost and with \(m\) sufficiently large \[ \fdr({\cal T}_t, F) \approx \frac{1}{\pi_0} P(\theta = 0 | Z > z_{1 - t/2}) \overset{\text{def}}{=} \frac{1}{\pi_0} p\fdr(t) = \text{Fdr}(t) \]
Can also make sense of (switching to \(z\) scale) \[ \text{fdr}(z) = P(\theta = 0 | Z = z) = \frac{\pi_0 \cdot \phi(z)}{N(0,1) \star ( \pi_0 \cdot \delta_0 + (1 - \pi_0) \cdot G )} \]
We can estimate \(\text{fdr}(z)\) using a density estimate \(\widehat{f}\) of the \(Z\) scores and an estimate of \(\pi_0\) \[ \widehat{\text{fdr}}(z) = \frac{\widehat{\pi}_0 \cdot \phi(z)}{\widehat{f}(z)}. \]
## [1] "fdr" "fp0" "Efdr" "cdf1" "mat" "z.2" "call"
## [1] 13
## [1] 17
## [1] NA 3.808781
Two groups model is an example of empirical Bayes / compound decision problem.
Strict control of FDR is not guaranteed.
Introduces problem of estimation (e.g. \(\widehat{\pi}_0\) and \(\widehat{f}\)). Efron (2011)
Does provide improved estimator via Tweedie’s formula Efron (2011) \[ \widehat{\theta}(z) = E[\theta | Z=z] = z + l'(z) \] with \[ l'(z) = \frac{d}{dz} \log f(z) \] using plugin \(\widehat{l} = \log \widehat{f}(z)\) for some density estimate \(\widehat{f}\).
Yields estimates \[ \widehat{\theta}_i(Z) = \widehat{\theta}_i(Z_i) = Z_i + \hat{l}'(Z_i). \]
If willing / able to estimate density of \(G\) it is possible to contruct posterior interval. BUT, estimate of \(G\) is hard – a deconvolution problem.
Not clear such intervals would have any coverage property.
Exercise: Suppose \(G = N(\mu_1, \sigma^2_1)\). Derive posterior intervals for \(\theta_i | Z_i\). What is their coverage rate? How does this relate to the Bernstein von Mises phenomenon?
## [1] "fdr" "fp0" "Efdr" "cdf1" "mat" "z.2" "call"
## delta sigma p0
## thest 0.00000000 1.00000000 1.211336060
## theSD 0.00000000 0.00000000 0.011414290
## mlest -0.11597111 0.75373650 0.934174674
## mleSD 0.01228385 0.01483100 0.008933865
## cmest -0.08303908 0.71176630 0.894938035
## cmeSD 0.01385561 0.01252345 0.007288211
## [1] 73
## [1] 160
## [1] -2.471104 2.227792
Consider a confidence interval procedure \({\cal I}:\mathbb{R}^m \rightarrow \prod_{i=1}^m {\cal B}(\mathbb{R})\) based on a point estimator \(\widehat{\theta} \in \mathbb{R}^m\)
A parameter is not reported if its interval is \((-\infty, \infty)\)
| Does not cover | Covers | Decisions | |
| Reported | \(V({\cal I}, F)\) | \(S({\cal I}, F)\) | \(R({\cal I})\) |
| Not reported | \(U({\cal I}, F)\) | \(T({\cal I}, F)\) | \(m-R({\cal I})\) |
| \(m_0(F)\) | \(m_1(F)\) | \(m\) |
A multiple comparisons procedure \({\cal T}\) combined with a confidence level \(\alpha'\) can be combined to form a confidence interval procedure: return \(\widehat{\theta}_i \pm \text{SE}(\widehat{\theta}_i) z_{1 - \alpha'/2}\) for each \(i\) rejected by \({\cal T}\).
For example, \({\cal I}\) might run BH and decide to report intervals for those selected by BH.
The False Coverage Rate (FCR) is \[ E_F\left[\frac{V({\cal I}, F)}{\max(R({\cal I}, F), 1)}\right] \]
For \({\cal I}\) constructed from \({\cal T}\) any error-rate control of \({\cal T}\) is not necessarily related to FCR of corresponding \({\cal I}\).
A procedure \({\cal T}\) is simple if, for every \(1 \leq i \leq m\) on the event \(i\) is reported \[ R({\cal T}) = R({\cal T}, \widehat{\theta}) = 1 + R_{-i,\cal T}(\widehat{\theta}_{-i}) \]
Typically, \(R_{-i}\) will not depend on \(i\) for symmetric procedures. BH and Bonferroni are simple procedures.
For simple procedures and independent \(\widehat{\theta}\), Benjamini and Yekutieli (2005) show that $100(1-)% $ Bonferroni intervals with \(m / R({\cal T})\) comparisons control FCR at \(\alpha\).
Analogous result for non-simple just harder to state.
Exercise: Run the Benjamini-Hochberg procedure at level 0.05 on hivdata and report Benjamini and Yekutieli’s FCR controlling intervals. Compare to Bonferroni simultaneous intervals.
Covered some aspects of FWER, FDR controlling procedures.
FWER procedures can be used to construct simultaneous confidence intervals.
Two groups model for large scale inference: asymptotic FDR.
What error rate to use?