Commit 1ccd080

added 6 proofs

1 parent 248d72b commit 1ccd080

6 files changed

Lines changed: 762 additions & 0 deletions

File tree

P/mlr-f.md

Lines changed: 178 additions & 0 deletions
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-12-13 12:36:00

title: "F-test for multiple linear regression using contrast-based inference"
chapter: "Statistical Models"
section: "Univariate normal data"
topic: "Multiple linear regression"
theorem: "Contrast-based F-test"

sources:
  - authors: "Stephan, Klaas Enno"
    year: 2010
    title: "Classical (frequentist) inference"
    in: "Methods and models for fMRI data analysis in neuroeconomics"
    pages: "Lecture 4, Slides 23/25"
    url: "http://www.socialbehavior.uzh.ch/teaching/methodsspring10.html"
  - authors: "Koch, Karl-Rudolf"
    year: 2007
    title: "Multivariate Distributions"
    in: "Introduction to Bayesian Statistics"
    pages: "Springer, Berlin/Heidelberg, 2007, ch. 2.5, eqs. 2.202, 2.213, 2.211"
    url: "https://www.springer.com/de/book/9783540727231"
    doi: "10.1007/978-3-540-72726-2"
  - authors: "jld"
    year: 2018
    title: "Understanding t-test for linear regression"
    in: "StackExchange CrossValidated"
    pages: "retrieved on 2022-12-13"
    url: "https://stats.stackexchange.com/a/344008"
  - authors: "Penny, William"
    year: 2006
    title: "Comparing nested GLMs"
    in: "Mathematics for Brain Imaging"
    pages: "ch. 2.3, pp. 51-52, eq. 2.9"
    url: "https://ueapsylabs.co.uk/sites/wpenny/mbi/mbi_course.pdf"

proof_id: "P392"
shortcut: "mlr-f"
username: "JoramSoch"
---

**Theorem:** Consider a [linear regression model](/D/mlr)

$$ \label{eq:mlr}
y = X\beta + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \sigma^2 V)
$$

and an [F-contrast](/D/fcon) on the model parameters

$$ \label{eq:fcon}
\gamma = C^\mathrm{T} \beta \quad \text{where} \quad C \in \mathbb{R}^{p \times q} \; .
$$

Then, the [test statistic](/D/tstat)

$$ \label{eq:mlr-f}
F = \hat{\beta}^\mathrm{T} C \left( \hat{\sigma}^2 C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C \right)^{-1} C^\mathrm{T} \hat{\beta} / q
$$

with the [parameter estimates](/P/mlr-mle)

$$ \label{eq:mlr-est}
\begin{split}
\hat{\beta} &= (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1} y \\
\hat{\sigma}^2 &= \frac{1}{n-p} (y-X\hat{\beta})^\mathrm{T} V^{-1} (y-X\hat{\beta})
\end{split}
$$

follows an [F-distribution](/D/f)

$$ \label{eq:mlr-f-dist}
F \sim \mathrm{F}(q, n-p)
$$

under the [null hypothesis](/D/h0)

$$ \label{eq:mlr-f-h0}
\begin{split}
H_0: &\; \gamma_1 = 0 \wedge \ldots \wedge \gamma_q = 0 \\
H_1: &\; \gamma_1 \neq 0 \vee \ldots \vee \gamma_q \neq 0 \; .
\end{split}
$$

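
Before turning to the proof, the statistic in \eqref{eq:mlr-f} can be computed directly from data; a minimal Python sketch (the simulation setup, variable names and the choice $V = I_n$ are illustrative assumptions, not part of the theorem):

```python
import numpy as np

# Illustrative sketch: compute the contrast-based F-statistic (eq. mlr-f)
# for simulated data; V = I_n and all parameter values are assumptions.
rng = np.random.default_rng(1)
n, p, q = 100, 4, 2
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
V = np.eye(n)
Vinv = np.linalg.inv(V)
beta_true = np.array([1.0, 0.5, 0.0, 0.0])
sigma2 = 1.5
y = X @ beta_true + np.sqrt(sigma2) * rng.standard_normal(n)

P = np.linalg.inv(X.T @ Vinv @ X)              # (X^T V^-1 X)^-1
beta_hat = P @ X.T @ Vinv @ y                  # WLS parameter estimates
resid = y - X @ beta_hat
sigma2_hat = resid @ Vinv @ resid / (n - p)    # unbiased variance estimate

C = np.zeros((p, q)); C[2, 0] = 1.0; C[3, 1] = 1.0   # F-contrast: beta_3 = beta_4 = 0
gamma_hat = C.T @ beta_hat
F = gamma_hat @ np.linalg.inv(sigma2_hat * (C.T @ P @ C)) @ gamma_hat / q
print(F)                                       # to be compared against F(q, n-p)
```

The resulting value would then be referred to the $\mathrm{F}(q, n-p)$ distribution to obtain a p-value.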
**Proof:**

1) We know that [the estimated regression coefficients in linear regression follow a multivariate normal distribution](/P/mlr-wlsdist):

$$ \label{eq:b-est-dist}
\hat{\beta} \sim \mathcal{N}\left( \beta, \, \sigma^2 (X^\mathrm{T} V^{-1} X)^{-1} \right) \; .
$$

Thus, the [estimated contrast vector](/D/tcon) $\hat{\gamma} = C^\mathrm{T} \hat{\beta}$, as a linear transformation of $\hat{\beta}$, is also [distributed according to a multivariate normal distribution](/P/mvn-ltt):

$$ \label{eq:g-est-dist-cond}
\hat{\gamma} \sim \mathcal{N}\left( C^\mathrm{T} \beta, \, \sigma^2 C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C \right) \; .
$$

Replacing the noise variance $\sigma^2$ by the noise precision $\tau = 1/\sigma^2$, we can also write this as a [conditional distribution](/D/dist-cond):

$$ \label{eq:g-est-tau-dist-cond}
\hat{\gamma} \vert \tau \sim \mathcal{N}\left( C^\mathrm{T} \beta, (\tau Q)^{-1} \right) \quad \text{with} \quad Q = \left( C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C \right)^{-1} \; .
$$
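
The covariance in \eqref{eq:g-est-dist-cond} can be verified by simulation; a hedged numpy sketch (design matrix, contrast and parameter values are arbitrary choices for illustration):

```python
import numpy as np

# Monte Carlo check: the sample covariance of gamma_hat = C^T beta_hat should
# approach sigma^2 C^T (X^T V^-1 X)^-1 C; all concrete values are assumptions.
rng = np.random.default_rng(0)
n, p, sigma2 = 50, 3, 2.0
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
V = np.eye(n)
Vinv = np.linalg.inv(V)
P = np.linalg.inv(X.T @ Vinv @ X)
C = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # p x q contrast matrix
beta = np.array([0.5, -1.0, 2.0])

M = C.T @ P @ X.T @ Vinv                   # linear map from y to gamma_hat
Y = (X @ beta)[:, None] + np.sqrt(sigma2) * rng.standard_normal((n, 20000))
emp_cov = np.cov(M @ Y)                    # q x q sample covariance over draws
theo_cov = sigma2 * C.T @ P @ C
print(np.max(np.abs(emp_cov - theo_cov)))  # small Monte Carlo error
```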

2) We also know that the [residual sum of squares](/D/rss), divided by the [true error variance](/D/mlr)

$$ \label{eq:mlr-rss}
\frac{1}{\sigma^2} \sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \frac{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}}{\sigma^2} = \frac{1}{\sigma^2} (y-X\hat{\beta})^\mathrm{T} V^{-1} (y-X\hat{\beta})
$$

[follows a chi-squared distribution](/P/mlr-rssdist):

$$ \label{eq:mlr-rss-dist}
\frac{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}}{\sigma^2} = \tau \, \hat{\varepsilon}^\mathrm{T} \hat{\varepsilon} \sim \chi^2(n-p) \; .
$$

The [chi-squared distribution is related to the gamma distribution](/P/gam-chi2) in the following way:

$$ \label{eq:gam-chi2}
X \sim \chi^2(k) \quad \Rightarrow \quad cX \sim \mathrm{Gam}\left( \frac{k}{2}, \frac{1}{2c} \right) \; .
$$

Thus, applying \eqref{eq:gam-chi2} to \eqref{eq:mlr-rss-dist} with $c = 1/(\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon})$, we obtain the [marginal distribution](/D/dist-marg) of $\tau$ as:

$$ \label{eq:tau-dist}
\frac{1}{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}} \left( \tau \, \hat{\varepsilon}^\mathrm{T} \hat{\varepsilon} \right) = \tau \sim \mathrm{Gam}\left( \frac{n-p}{2}, \frac{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}}{2} \right) \; .
$$
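
The scaling relation \eqref{eq:gam-chi2} is easy to check by moment matching; a small sketch (the values of $k$ and $c$ are arbitrary; $\mathrm{Gam}(a,b)$ is in the shape/rate parametrization, with mean $a/b$ and variance $a/b^2$):

```python
import numpy as np

# Check eq. (gam-chi2): if X ~ chi2(k), then cX ~ Gam(k/2, 1/(2c)) (shape/rate).
# Moments of Gam(a, b): mean a/b = c*k, variance a/b^2 = 2*k*c^2.
rng = np.random.default_rng(42)
k, c, N = 7, 0.25, 200000
x = c * rng.chisquare(k, size=N)
a, b = k / 2, 1 / (2 * c)
print(x.mean(), a / b)       # both approx c*k = 1.75
print(x.var(), a / b**2)     # both approx 2*k*c^2 = 0.875
```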

3) Note that, following from \eqref{eq:g-est-tau-dist-cond} and \eqref{eq:tau-dist}, the [joint distribution](/D/dist-joint) of $\hat{\gamma}$ and $\tau$ is, [by definition, a normal-gamma distribution](/D/ng):

$$ \label{eq:g-est-tau-dist-joint}
\hat{\gamma}, \tau \sim \mathrm{NG}\left( C^\mathrm{T} \beta, Q, \frac{n-p}{2}, \frac{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}}{2} \right) \; .
$$

The [marginal distribution of a normal-gamma distribution, with respect to the normal random variable, is a multivariate t-distribution](/P/ng-marg):

$$ \label{eq:ng-mvt}
X, Y \sim \mathrm{NG}(\mu, \Lambda, a, b) \quad \Rightarrow \quad X \sim \mathrm{t}\left( \mu, \left( \frac{a}{b} \Lambda \right)^{-1}, 2a \right) \; .
$$

Thus, the [marginal distribution](/D/dist-marg) of $\hat{\gamma}$ is:

$$ \label{eq:g-est-dist-marg}
\hat{\gamma} \sim \mathrm{t}\left( C^\mathrm{T} \beta, \left( \frac{n-p}{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}} Q \right)^{-1}, n-p \right) \; .
$$
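
The marginalization step \eqref{eq:ng-mvt} can likewise be checked by ancestral sampling from the normal-gamma distribution; a scalar sketch (all parameter values assumed; the variance of $\mathrm{t}(\mu, \Sigma, \nu)$ is $\frac{\nu}{\nu-2} \Sigma$, which for $\Sigma = b/(a\lambda)$ and $\nu = 2a$ reduces to $b/((a-1)\lambda)$):

```python
import numpy as np

# Sketch (scalar case, arbitrary parameters): draw tau ~ Gam(a, b), then
# x | tau ~ N(mu, 1/(tau*lam)); the marginal of x should match t(mu, b/(a*lam), 2a),
# whose variance is (2a/(2a-2)) * b/(a*lam) = b/((a-1)*lam).
rng = np.random.default_rng(7)
mu, lam, a, b, N = 1.0, 2.0, 5.0, 3.0, 200000
tau = rng.gamma(shape=a, scale=1.0 / b, size=N)        # rate-b gamma draws
x = mu + rng.standard_normal(N) / np.sqrt(tau * lam)   # conditional normal draws
print(x.mean())   # approx mu = 1.0
print(x.var())    # approx b / ((a - 1) * lam) = 0.375
```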
4) Because of the following [relationship between the multivariate t-distribution and the F-distribution](/P/mvt-f) for a $q$-dimensional random vector

$$ \label{eq:mvt-f}
X \sim \mathrm{t}(\mu, \Sigma, \nu) \quad \Rightarrow \quad (X-\mu)^\mathrm{T} \, \Sigma^{-1} (X-\mu)/q \sim \mathrm{F}(q, \nu) \; ,
$$

the following quantity [is, by definition, F-distributed](/D/f)

$$ \label{eq:mlr-f-s1}
F = \left( \hat{\gamma} - C^\mathrm{T} \beta \right)^\mathrm{T} \left( \frac{n-p}{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}} Q \right) \left( \hat{\gamma} - C^\mathrm{T} \beta \right) / q
$$

and under the [null hypothesis](/D/h0) \eqref{eq:mlr-f-h0}, where $C^\mathrm{T} \beta = \gamma = 0$, it can be evaluated as:

$$ \label{eq:mlr-f-s2}
\begin{split}
F &\overset{\eqref{eq:mlr-f-s1}}{=} \left( \hat{\gamma} - C^\mathrm{T} \beta \right)^\mathrm{T} \left( \frac{n-p}{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}} Q \right) \left( \hat{\gamma} - C^\mathrm{T} \beta \right) / q \\
&\overset{\eqref{eq:mlr-f-h0}}{=} \hat{\gamma}^\mathrm{T} \left( \frac{n-p}{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}} Q \right) \hat{\gamma} / q \\
&\overset{\eqref{eq:fcon}}{=} \hat{\beta}^\mathrm{T} C \left( \frac{n-p}{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}} Q \right) C^\mathrm{T} \hat{\beta} / q \\
&\overset{\eqref{eq:g-est-tau-dist-cond}}{=} \hat{\beta}^\mathrm{T} C \left( \frac{n-p}{\hat{\varepsilon}^\mathrm{T} \hat{\varepsilon}} \left( C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C \right)^{-1} \right) C^\mathrm{T} \hat{\beta} / q \\
&\overset{\eqref{eq:mlr-rss}}{=} \hat{\beta}^\mathrm{T} C \left( \frac{n-p}{(y-X\hat{\beta})^\mathrm{T} V^{-1} (y-X\hat{\beta})} \left( C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C \right)^{-1} \right) C^\mathrm{T} \hat{\beta} / q \\
&\overset{\eqref{eq:mlr-est}}{=} \hat{\beta}^\mathrm{T} C \left( \frac{1}{\hat{\sigma}^2} \left( C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C \right)^{-1} \right) C^\mathrm{T} \hat{\beta} / q \\
&= \hat{\beta}^\mathrm{T} C \left( \hat{\sigma}^2 C^\mathrm{T} (X^\mathrm{T} V^{-1} X)^{-1} C \right)^{-1} C^\mathrm{T} \hat{\beta} / q \; .
\end{split}
$$
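
Putting the pieces together, the distributional claim \eqref{eq:mlr-f-dist} can be checked end to end by simulating under the null hypothesis; a hedged sketch with $V = I_n$ (all concrete values are illustrative assumptions):

```python
import numpy as np

# End-to-end check under H0: the simulated F-statistics should have mean close
# to E[F(q, n-p)] = (n-p)/((n-p)-2); setup values are illustrative assumptions.
rng = np.random.default_rng(3)
n, p, q = 30, 3, 2
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
P = np.linalg.inv(X.T @ X)                     # V = I_n here
C = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
beta0 = np.array([1.0, 0.0, 0.0])              # C^T beta = 0, so H0 holds
Qc = np.linalg.inv(C.T @ P @ C)                # the matrix Q from the proof

Fs = []
for _ in range(20000):
    y = X @ beta0 + rng.standard_normal(n)
    b = P @ X.T @ y
    s2 = (y - X @ b) @ (y - X @ b) / (n - p)
    g = C.T @ b
    Fs.append(g @ Qc @ g / (s2 * q))
Fs = np.array(Fs)
print(Fs.mean())   # approx (n - p) / (n - p - 2) = 1.08
```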

P/mlr-ind.md

Lines changed: 92 additions & 0 deletions
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-12-13 16:18:00

title: "Independence of estimated parameters and residuals in multiple linear regression"
chapter: "Statistical Models"
section: "Univariate normal data"
topic: "Multiple linear regression"
theorem: "Independence of estimated parameters and residuals"

sources:
  - authors: "jld"
    year: 2018
    title: "Understanding t-test for linear regression"
    in: "StackExchange CrossValidated"
    pages: "retrieved on 2022-12-13"
    url: "https://stats.stackexchange.com/a/344008"

proof_id: "P393"
shortcut: "mlr-ind"
username: "JoramSoch"
---

**Theorem:** Assume a [linear regression model](/D/mlr) with correlated observations

$$ \label{eq:mlr}
y = X\beta + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \sigma^2 V)
$$

and consider estimation using [weighted least squares](/P/mlr-wls). Then, the [estimated parameters and the vector of residuals](/P/mlr-wlsdist) are independent of each other:

$$ \label{eq:mlr-ind}
\begin{split}
\hat{\beta} &= (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1} y \quad \text{and} \\
\hat{\varepsilon} &= y - X \hat{\beta} \quad \text{ind.}
\end{split}
$$

**Proof:** Equation \eqref{eq:mlr} [implies the following distribution](/P/mlr-wlsdist) of the [random vector](/D/rvec) $y$:

$$ \label{eq:y-dist}
\begin{split}
y &\sim \mathcal{N}\left( X \beta, \sigma^2 V \right) \\
&\sim \mathcal{N}\left( X \beta, \Sigma \right) \\
\text{with} \quad \Sigma &= \sigma^2 V \; .
\end{split}
$$

Note that the [estimated parameters and residuals can be written as projections of the same random vector](/P/mlr-mat) $y$:

$$ \label{eq:b-proj}
\begin{split}
\hat{\beta} &= (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1} y \\
&= A y \\
\text{with} \quad A &= (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1}
\end{split}
$$

$$ \label{eq:e-proj}
\begin{split}
\hat{\varepsilon} &= y - X \hat{\beta} \\
&= (I_n - X (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1}) y \\
&= B y \\
\text{with} \quad B &= I_n - X (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1} \; .
\end{split}
$$

Two projections $AZ$ and $BZ$ of the same [multivariate normal](/D/mvn) [random vector](/D/rvec) $Z \sim \mathcal{N}(\mu, \Sigma)$ [are independent if and only if the following condition holds](/P/mvn-ind):

$$ \label{eq:mvn-ind}
A \Sigma B^\mathrm{T} = 0 \; .
$$

Combining \eqref{eq:y-dist}, \eqref{eq:b-proj} and \eqref{eq:e-proj}, we check whether this condition is fulfilled in the present case:

$$ \label{eq:mlr-ind-qed}
\begin{split}
A \Sigma B^\mathrm{T} &= (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1} (\sigma^2 V) (I_n - X (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1})^\mathrm{T} \\
&= \sigma^2 \left[ (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1} V - (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} V^{-1} V V^{-1} X (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} \right] \\
&= \sigma^2 \left[ (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} - (X^\mathrm{T} V^{-1} X)^{-1} X^\mathrm{T} \right] \\
&= \sigma^2 \cdot 0_{pn} \\
&= 0 \; .
\end{split}
$$

This demonstrates that $\hat{\beta}$ and $\hat{\varepsilon}$ -- and likewise, all [pairs of terms separately derived](/P/mlr-t) from $\hat{\beta}$ and $\hat{\varepsilon}$ -- are [statistically independent](/D/ind).
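
The key condition $A \Sigma B^\mathrm{T} = 0$ can also be confirmed numerically for a random design and a non-trivial $V$; a minimal sketch (dimensions and the construction of $V$ are arbitrary assumptions):

```python
import numpy as np

# Numeric check: A Sigma B^T = 0 for the WLS projection matrices, implying
# independence of beta_hat = A y and eps_hat = B y; setup values are assumed.
rng = np.random.default_rng(11)
n, p, sigma2 = 20, 3, 1.7
X = rng.standard_normal((n, p))
L = rng.standard_normal((n, n))
V = L @ L.T + n * np.eye(n)                 # an arbitrary positive definite V
Vinv = np.linalg.inv(V)
A = np.linalg.inv(X.T @ Vinv @ X) @ X.T @ Vinv
B = np.eye(n) - X @ A                       # residual-forming matrix
Sigma = sigma2 * V
print(np.max(np.abs(A @ Sigma @ B.T)))      # zero up to machine precision
```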
