---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-12-13 12:36:00

title: "F-test for multiple linear regression using contrast-based inference"
chapter: "Statistical Models"
section: "Univariate normal data"
topic: "Multiple linear regression"
theorem: "Contrast-based F-test"

sources:
  - authors: "Stephan, Klaas Enno"
    year: 2010
    title: "Classical (frequentist) inference"
    in: "Methods and models for fMRI data analysis in neuroeconomics"
    pages: "Lecture 4, Slides 23/25"
    url: "http://www.socialbehavior.uzh.ch/teaching/methodsspring10.html"
  - authors: "Koch, Karl-Rudolf"
    year: 2007
    title: "Multivariate Distributions"
    in: "Introduction to Bayesian Statistics"
    pages: "Springer, Berlin/Heidelberg, 2007, ch. 2.5, eqs. 2.202, 2.213, 2.211"
    url: "https://www.springer.com/de/book/9783540727231"
    doi: "10.1007/978-3-540-72726-2"
  - authors: "jld"
    year: 2018
    title: "Understanding t-test for linear regression"
    in: "StackExchange CrossValidated"
    pages: "retrieved on 2022-12-13"
    url: "https://stats.stackexchange.com/a/344008"
  - authors: "Penny, William"
    year: 2006
    title: "Comparing nested GLMs"
    in: "Mathematics for Brain Imaging"
    pages: "ch. 2.3, pp. 51-52, eq. 2.9"
    url: "https://ueapsylabs.co.uk/sites/wpenny/mbi/mbi_course.pdf"

proof_id: "P392"
shortcut: "mlr-f"
username: "JoramSoch"
---

**Theorem:** Consider a [linear regression model](/D/mlr)

$$ \label{eq:mlr}
y = X\beta + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \sigma^2 V)
$$

and an [F-contrast](/D/fcon) on the model parameters

$$ \label{eq:fcon}
\gamma = C^\mathrm{T} \beta \quad \text{where} \quad C \in \mathbb{R}^{p \times q} \; .
$$

Then, the [test statistic](/D/tstat)

$$ \label{eq:mlr-f}
F = \hat{\beta}^\mathrm{T} C \left( \hat{\sigma}^2 C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C \right)^{-1} C^\mathrm{T} \hat{\beta} / q
$$

with the [parameter estimates](/P/mlr-mle)

$$ \label{eq:mlr-est}
\begin{split}
\hat{\beta} &= (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1} y \\
\hat{\sigma}^2 &= \frac{1}{n-p} (y-X\hat{\beta})^\mathrm{T} V^{-1} (y-X\hat{\beta})
\end{split}
$$

follows an [F-distribution](/D/f)

$$ \label{eq:mlr-f-dist}
F \sim \mathrm{F}(q, n-p)
$$

under the [null hypothesis](/D/h0)

$$ \label{eq:mlr-f-h0}
\begin{split}
H_0: &\; \gamma_1 = 0 \wedge \ldots \wedge \gamma_q = 0 \\
H_1: &\; \gamma_1 \neq 0 \vee \ldots \vee \gamma_q \neq 0 \; .
\end{split}
$$
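
As a numerical illustration (a minimal sketch with simulated data, assuming $V = I$, i.e. i.i.d. noise; not part of the theorem), the test statistic \eqref{eq:mlr-f} can be computed directly from the estimates \eqref{eq:mlr-est}:

```python
import numpy as np

# Numerical sketch (simulated data, V = I): compute the contrast-based
# F-statistic from the parameter estimates.
rng = np.random.default_rng(1)
n, p, q = 100, 3, 2
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.standard_normal(n)
C = np.array([[0, 0], [1, 0], [0, 1]])        # C in R^{p x q}, tests beta_2 and beta_3

# Parameter estimates (with V = I, the weighted formulas reduce to OLS)
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)

# F = b' C ( s2 * C' (X'X)^{-1} C )^{-1} C' b / q
gamma_hat = C.T @ beta_hat
F = gamma_hat @ np.linalg.inv(sigma2_hat * (C.T @ XtX_inv @ C)) @ gamma_hat / q
print(F)
```

With $V \neq I$, one would replace $X^\mathrm{T} X$ by $X^\mathrm{T} V^{-1} X$ and use the $V^{-1}$-weighted residual sum of squares, as in \eqref{eq:mlr-est}.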


**Proof:**

1) We know that [the estimated regression coefficients in linear regression follow a multivariate normal distribution](/P/mlr-wlsdist):

$$ \label{eq:b-est-dist}
\hat{\beta} \sim \mathcal{N}\left( \beta, \, \sigma^2 (X^\mathrm{T} V^{-1} X)^{-1} \right) \; .
$$

Thus, the [estimated contrast vector](/D/tcon) $\hat{\gamma} = C^\mathrm{T} \hat{\beta}$ is also [distributed according to a multivariate normal distribution](/P/mvn-ltt):

$$ \label{eq:g-est-dist-cond}
\hat{\gamma} \sim \mathcal{N}\left( C^\mathrm{T} \beta, \, \sigma^2 C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C \right) \; .
$$

Substituting the noise variance $\sigma^2$ with the noise precision $\tau = 1/\sigma^2$, we can also write this down as a [conditional distribution](/D/dist-cond):

$$ \label{eq:g-est-tau-dist-cond}
\hat{\gamma} \vert \tau \sim \mathcal{N}\left( C^\mathrm{T} \beta, (\tau Q)^{-1} \right) \quad \text{with} \quad Q = \left( C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C \right)^{-1} \; .
$$

2) We also know that the [residual sum of squares](/D/rss), divided by the [true error variance](/D/mlr)

$$ \label{eq:mlr-rss}
\frac{1}{\sigma^2} \sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \frac{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}}{\sigma^2} = \frac{1}{\sigma^2} (y-X\hat{\beta})^\mathrm{T} V^{-1} (y-X\hat{\beta})
$$

[follows a chi-squared distribution](/P/mlr-rssdist):

$$ \label{eq:mlr-rss-dist}
\frac{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}}{\sigma^2} = \tau \, \hat{\varepsilon}^\mathrm{T} \hat{\varepsilon} \sim \chi^2(n-p) \; .
$$
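
This distributional statement can be checked by simulation (a sketch assuming $V = I$; the dimensions and the number of replications below are arbitrary choices):

```python
import numpy as np

# Simulation sketch (V = I, i.e. i.i.d. noise): the scaled residual sum of
# squares should follow a chi-squared distribution with n - p degrees of freedom.
rng = np.random.default_rng(0)
n, p, sigma2 = 50, 4, 2.0
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
H = X @ np.linalg.inv(X.T @ X) @ X.T          # projection ("hat") matrix

rss_over_s2 = []
for _ in range(5000):
    y = np.sqrt(sigma2) * rng.standard_normal(n)   # data with beta = 0
    e = y - H @ y                                  # residuals
    rss_over_s2.append(e @ e / sigma2)

# A chi^2(n - p) variable has mean n - p
print(np.mean(rss_over_s2))
```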

The [chi-squared distribution is related to the gamma distribution](/P/gam-chi2) in the following way:

$$ \label{eq:gam-chi2}
X \sim \chi^2(k) \quad \Rightarrow \quad cX \sim \mathrm{Gam}\left( \frac{k}{2}, \frac{1}{2c} \right) \; .
$$

Thus, applying \eqref{eq:gam-chi2} to \eqref{eq:mlr-rss-dist}, we obtain the [marginal distribution](/D/dist-marg) of $\tau$ as:

$$ \label{eq:tau-dist}
\frac{1}{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}} \left( \tau \, \hat{\varepsilon}^\mathrm{T} \hat{\varepsilon} \right) = \tau \sim \mathrm{Gam}\left( \frac{n-p}{2}, \frac{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}}{2} \right) \; .
$$

3) Note that, following from \eqref{eq:g-est-tau-dist-cond} and \eqref{eq:tau-dist}, the [joint distribution](/D/dist-joint) of $\hat{\gamma}$ and $\tau$ is, [by definition, a normal-gamma distribution](/D/ng):

$$ \label{eq:g-est-tau-dist-joint}
\hat{\gamma}, \tau \sim \mathrm{NG}\left( C^\mathrm{T} \beta, Q, \frac{n-p}{2}, \frac{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}}{2} \right) \; .
$$

The [marginal distribution of a normal-gamma distribution with respect to the normal random variable is a multivariate t-distribution](/P/ng-marg):

$$ \label{eq:ng-mvt}
X, Y \sim \mathrm{NG}(\mu, \Lambda, a, b) \quad \Rightarrow \quad X \sim \mathrm{t}\left( \mu, \left( \frac{a}{b} \Lambda \right)^{-1}, 2a \right) \; .
$$

Thus, the [marginal distribution](/D/dist-marg) of $\hat{\gamma}$ is:

$$ \label{eq:g-est-dist-marg}
\hat{\gamma} \sim \mathrm{t}\left( C^\mathrm{T} \beta, \left( \frac{n-p}{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}} Q \right)^{-1}, n-p \right) \; .
$$

4) Because of the following [relationship between the multivariate t-distribution and the F-distribution](/P/mvt-f), stated here for a $q$-dimensional random vector $X$,

$$ \label{eq:mvt-f}
X \sim \mathrm{t}(\mu, \Sigma, \nu) \quad \Rightarrow \quad (X-\mu)^\mathrm{T} \, \Sigma^{-1} (X-\mu)/q \sim \mathrm{F}(q, \nu) \; ,
$$

the following quantity [is, by definition, F-distributed](/D/f)

$$ \label{eq:mlr-f-s1}
F = \left( \hat{\gamma} - C^\mathrm{T} \beta \right)^\mathrm{T} \left( \frac{n-p}{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}} Q \right) \left( \hat{\gamma} - C^\mathrm{T} \beta \right) / q
$$

and under the [null hypothesis](/D/h0) \eqref{eq:mlr-f-h0}, it can be evaluated as:

$$ \label{eq:mlr-f-s2}
\begin{split}
F &\overset{\eqref{eq:mlr-f-s1}}{=} \left( \hat{\gamma} - C^\mathrm{T} \beta \right)^\mathrm{T} \left( \frac{n-p}{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}} Q \right) \left( \hat{\gamma} - C^\mathrm{T} \beta \right) / q \\
&\overset{\eqref{eq:mlr-f-h0}}{=} \hat{\gamma}^\mathrm{T} \left( \frac{n-p}{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}} Q \right) \hat{\gamma} / q \\
&\overset{\eqref{eq:fcon}}{=} \hat{\beta}^\mathrm{T} C \left( \frac{n-p}{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}} Q \right) C^\mathrm{T} \hat{\beta} / q \\
&\overset{\eqref{eq:g-est-tau-dist-cond}}{=} \hat{\beta}^\mathrm{T} C \left( \frac{n-p}{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}} \left( C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C \right)^{-1} \right) C^\mathrm{T} \hat{\beta} / q \\
&\overset{\eqref{eq:mlr-rss}}{=} \hat{\beta}^\mathrm{T} C \left( \frac{n-p}{(y-X\hat{\beta})^\mathrm{T} V^{-1} (y-X\hat{\beta})} \left( C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C \right)^{-1} \right) C^\mathrm{T} \hat{\beta} / q \\
&\overset{\eqref{eq:mlr-est}}{=} \hat{\beta}^\mathrm{T} C \left( \frac{1}{\hat{\sigma}^2} \left( C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C \right)^{-1} \right) C^\mathrm{T} \hat{\beta} / q \\
&= \hat{\beta}^\mathrm{T} C \left( \hat{\sigma}^2 C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C \right)^{-1} C^\mathrm{T} \hat{\beta} / q \; .
\end{split}
$$
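
The final result \eqref{eq:mlr-f-dist} can also be verified by simulation (a sketch assuming $V = I$; the dimensions, seed, and replication count are arbitrary choices):

```python
import numpy as np
from scipy import stats

# Simulation sketch (V = I): under the null hypothesis, the contrast-based
# F-statistic should follow F(q, n - p); we check the empirical 5% tail.
rng = np.random.default_rng(42)
n, p, q = 40, 3, 2
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
C = np.array([[0, 0], [1, 0], [0, 1]])        # C in R^{p x q}, tests beta_2 = beta_3 = 0
XtX_inv = np.linalg.inv(X.T @ X)

Fs = []
for _ in range(5000):
    y = 1.0 + rng.standard_normal(n)          # only the intercept is non-zero, so H0 holds
    b = XtX_inv @ X.T @ y                     # beta estimate
    e = y - X @ b
    s2 = e @ e / (n - p)                      # sigma^2 estimate
    g = C.T @ b                               # estimated contrast vector
    Fs.append(g @ np.linalg.inv(s2 * (C.T @ XtX_inv @ C)) @ g / q)

# Fraction of simulated statistics above the theoretical 95% quantile of F(q, n-p)
tail = np.mean(np.array(Fs) > stats.f.ppf(0.95, q, n - p))
print(tail)
```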
0 commit comments