
Commit 5f8d821

added 6 proofs
1 parent 21fa290 commit 5f8d821

6 files changed

Lines changed: 513 additions & 0 deletions

File tree

P/cfm-exist.md

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2021-10-21 17:43:00

title: "Existence of the corresponding forward model"
chapter: "Statistical Models"
section: "Multivariate normal data"
topic: "Inverse general linear model"
theorem: "Proof of existence"

sources:
- authors: "Haufe S, Meinecke F, Görgen K, Dähne S, Haynes JD, Blankertz B, Bießmann F"
  year: 2014
  title: "On the interpretation of weight vectors of linear models in multivariate neuroimaging"
  in: "NeuroImage"
  pages: "vol. 87, pp. 96–110, Appendix B"
  url: "https://www.sciencedirect.com/science/article/pii/S1053811913010914"
  doi: "10.1016/j.neuroimage.2013.10.067"

proof_id: "P270"
shortcut: "cfm-exist"
username: "JoramSoch"
---


**Theorem:** Let there be observations $Y \in \mathbb{R}^{n \times v}$ and $X \in \mathbb{R}^{n \times p}$ and consider a weight matrix $W \in \mathbb{R}^{v \times p}$ predicting $X$ from $Y$:

$$ \label{eq:bda}
\hat{X} = Y W \; .
$$

Then, there exists a [corresponding forward model](/D/cfm).


**Proof:** The [corresponding forward model](/D/cfm) is defined as

$$ \label{eq:cfm}
Y = \hat{X} A^\mathrm{T} + E \quad \text{with} \quad \hat{X}^\mathrm{T} E = 0
$$

and the [parameters of the corresponding forward model](/P/cfm-para) are equal to

$$ \label{eq:cfm-para}
A = \Sigma_y W \Sigma_x^{-1} \quad \text{where} \quad \Sigma_x = \hat{X}^\mathrm{T} \hat{X} \quad \text{and} \quad \Sigma_y = Y^\mathrm{T} Y \; .
$$

<br>
1) Because the columns of $\hat{X}$ are assumed to be linearly independent [by definition of the corresponding forward model](/D/cfm), the matrix $\Sigma_x = \hat{X}^\mathrm{T} \hat{X}$ is invertible, such that $A$ in \eqref{eq:cfm-para} is well-defined.

<br>
2) Moreover, the solution for the matrix $A$ satisfies the [constraint of the corresponding forward model](/D/cfm) that predicted $X$ and errors $E$ are uncorrelated, which can be shown as follows:

$$ \label{eq:X-E-0}
\begin{split}
\hat{X}^\mathrm{T} E &\overset{\eqref{eq:cfm}}{=} \hat{X}^\mathrm{T} \left( Y - \hat{X} A^\mathrm{T} \right) \\
&\overset{\eqref{eq:cfm-para}}{=} \hat{X}^\mathrm{T} \left( Y - \hat{X} \, \Sigma_x^{-1} W^\mathrm{T} \Sigma_y \right) \\
&= \hat{X}^\mathrm{T} Y - \hat{X}^\mathrm{T} \hat{X} \, \Sigma_x^{-1} W^\mathrm{T} \Sigma_y \\
&\overset{\eqref{eq:cfm-para}}{=} \hat{X}^\mathrm{T} Y - \hat{X}^\mathrm{T} \hat{X} \left( \hat{X}^\mathrm{T} \hat{X} \right)^{-1} W^\mathrm{T} \left( Y^\mathrm{T} Y \right) \\
&\overset{\eqref{eq:bda}}{=} (Y W)^\mathrm{T} Y - W^\mathrm{T} \left( Y^\mathrm{T} Y \right) \\
&= W^\mathrm{T} Y^\mathrm{T} Y - W^\mathrm{T} Y^\mathrm{T} Y \\
&= 0 \; .
\end{split}
$$

This completes the proof.
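As a quick numerical illustration (not part of the proof, all names ad hoc), one can check with NumPy that the parameter matrix $A = \Sigma_y W \Sigma_x^{-1}$ makes the forward-model errors orthogonal to the predictions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, v, p = 50, 8, 3
Y = rng.standard_normal((n, v))    # observations
W = rng.standard_normal((v, p))    # weight matrix predicting X from Y
X_hat = Y @ W                      # predictions

Sigma_x = X_hat.T @ X_hat
Sigma_y = Y.T @ Y
A = Sigma_y @ W @ np.linalg.inv(Sigma_x)   # parameters of the forward model

E = Y - X_hat @ A.T                        # forward-model errors
print(np.max(np.abs(X_hat.T @ E)))         # ≈ 0, up to floating-point error
```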

P/cfm-para.md

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2021-10-21 17:20:00

title: "Parameters of the corresponding forward model"
chapter: "Statistical Models"
section: "Multivariate normal data"
topic: "Inverse general linear model"
theorem: "Derivation of parameters"

sources:
- authors: "Haufe S, Meinecke F, Görgen K, Dähne S, Haynes JD, Blankertz B, Bießmann F"
  year: 2014
  title: "On the interpretation of weight vectors of linear models in multivariate neuroimaging"
  in: "NeuroImage"
  pages: "vol. 87, pp. 96–110, Theorem 1"
  url: "https://www.sciencedirect.com/science/article/pii/S1053811913010914"
  doi: "10.1016/j.neuroimage.2013.10.067"

proof_id: "P269"
shortcut: "cfm-para"
username: "JoramSoch"
---


**Theorem:** Let there be observations $Y \in \mathbb{R}^{n \times v}$ and $X \in \mathbb{R}^{n \times p}$ and consider a weight matrix $W \in \mathbb{R}^{v \times p}$ predicting $X$ from $Y$:

$$ \label{eq:bda}
\hat{X} = Y W \; .
$$

Then, the parameter matrix of the [corresponding forward model](/D/cfm) is equal to

$$ \label{eq:cfm-para}
A = \Sigma_y W \Sigma_x^{-1}
$$

with the [sample covariance](/D/cov-samp) matrices

$$ \label{eq:Sx-Sy}
\begin{split}
\Sigma_x &= \hat{X}^\mathrm{T} \hat{X} \\
\Sigma_y &= Y^\mathrm{T} Y \; .
\end{split}
$$


**Proof:** The [corresponding forward model](/D/cfm) is given by

$$ \label{eq:cfm}
Y = \hat{X} A^\mathrm{T} + E \; ,
$$

subject to the constraint that predicted $X$ and errors $E$ are uncorrelated:

$$ \label{eq:cfm-con}
\hat{X}^\mathrm{T} E = 0 \; .
$$

With that, we can directly derive the parameter matrix $A$:

$$ \label{eq:cfm-para-qed}
\begin{split}
Y &\overset{\eqref{eq:cfm}}{=} \hat{X} A^\mathrm{T} + E \\
\hat{X} A^\mathrm{T} &= Y - E \\
\hat{X}^\mathrm{T} \hat{X} A^\mathrm{T} &= \hat{X}^\mathrm{T} (Y - E) \\
\hat{X}^\mathrm{T} \hat{X} A^\mathrm{T} &= \hat{X}^\mathrm{T} Y - \hat{X}^\mathrm{T} E \\
\hat{X}^\mathrm{T} \hat{X} A^\mathrm{T} &\overset{\eqref{eq:cfm-con}}{=} \hat{X}^\mathrm{T} Y \\
\hat{X}^\mathrm{T} \hat{X} A^\mathrm{T} &\overset{\eqref{eq:bda}}{=} W^\mathrm{T} Y^\mathrm{T} Y \\
\Sigma_x A^\mathrm{T} &\overset{\eqref{eq:Sx-Sy}}{=} W^\mathrm{T} \Sigma_y \\
A^\mathrm{T} &= \Sigma_x^{-1} W^\mathrm{T} \Sigma_y \\
A &= \Sigma_y W \Sigma_x^{-1} \; .
\end{split}
$$
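A brief numerical sketch (illustrative only, with ad hoc names): since the constraint $\hat{X}^\mathrm{T} E = 0$ is exactly the ordinary least squares normal equation, the closed form $A = \Sigma_y W \Sigma_x^{-1}$ should coincide with the OLS regression of $Y$ on $\hat{X}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, v, p = 40, 6, 2
Y = rng.standard_normal((n, v))
W = rng.standard_normal((v, p))
X_hat = Y @ W                                               # predictions

A_closed = (Y.T @ Y) @ W @ np.linalg.inv(X_hat.T @ X_hat)   # closed form
A_ols = np.linalg.lstsq(X_hat, Y, rcond=None)[0].T          # regress Y on X_hat

print(np.allclose(A_closed, A_ols))                         # True
```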

P/iglm-blue.md

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2021-10-21 16:46:00

title: "Best linear unbiased estimator for the inverse general linear model"
chapter: "Statistical Models"
section: "Multivariate normal data"
topic: "Inverse general linear model"
theorem: "Best linear unbiased estimator"

sources:
- authors: "Soch J, Allefeld C, Haynes JD"
  year: 2020
  title: "Inverse transformed encoding models – a solution to the problem of correlated trial-by-trial parameter estimates in fMRI decoding"
  in: "NeuroImage"
  pages: "vol. 209, art. 116449, Appendix C, Theorem 5"
  url: "https://www.sciencedirect.com/science/article/pii/S1053811919310407"
  doi: "10.1016/j.neuroimage.2019.116449"

proof_id: "P268"
shortcut: "iglm-blue"
username: "JoramSoch"
---


**Theorem:** Let there be a [general linear model](/D/glm) of $Y \in \mathbb{R}^{n \times v}$

$$ \label{eq:glm}
Y = X B + E, \; E \sim \mathcal{MN}(0, V, \Sigma)
$$

[implying the inverse general linear model](/P/iglm-dist) of $X \in \mathbb{R}^{n \times p}$

$$ \label{eq:iglm}
X = Y W + N, \; N \sim \mathcal{MN}(0, V, \Sigma_x)
$$

where

$$ \label{eq:BW-Sx}
B \, W = I_p \quad \text{and} \quad \Sigma_x = W^\mathrm{T} \Sigma W \; .
$$

Then, the [weighted least squares solution](/P/glm-wls) for $W$ is the [best linear unbiased estimator](/D/blue) of $W$.


**Proof:** The [linear transformation theorem for the matrix-normal distribution](/P/matn-ltt) states:

$$ \label{eq:matn-ltt}
X \sim \mathcal{MN}(M, U, V) \quad \Rightarrow \quad Y = AXB + C \sim \mathcal{MN}(AMB+C, AUA^\mathrm{T}, B^\mathrm{T}VB) \; .
$$

The [weighted least squares parameter estimates](/P/glm-wls) for \eqref{eq:iglm} are given by

$$ \label{eq:iglm-wls}
\hat{W} = (Y^\mathrm{T} V^{-1} Y)^{-1} Y^\mathrm{T} V^{-1} X \; .
$$

The [best linear unbiased estimator](/D/blue) $\hat{\theta}$ of a certain quantity $\theta$ estimated from [measured data](/D/data) $y$ is 1) an estimator resulting from a linear operation $f(y)$, 2) whose expected value is equal to $\theta$ and 3) which has, among all estimators satisfying 1) and 2), the minimum [variance](/D/var).

<br>
1) First, $\hat{W}$ is a linear estimator, because it is of the form $\tilde{W} = M X$ where $M$ is an arbitrary $v \times n$ matrix.

<br>
2) Second, $\hat{W}$ is an unbiased estimator if $\left\langle \hat{W} \right\rangle = W$. By applying \eqref{eq:matn-ltt} to \eqref{eq:iglm}, the distribution of $\tilde{W}$ is

$$ \label{eq:W-hat-dist}
\tilde{W} = M X \sim \mathcal{MN}(M Y W, M V M^\mathrm{T}, \Sigma_x) \; ,
$$

such that unbiasedness requires $M Y = I_v$. This is fulfilled by any matrix $M = (Y^\mathrm{T} V^{-1} Y)^{-1} Y^\mathrm{T} V^{-1} + D$ where $D$ is a $v \times n$ matrix satisfying $D Y = 0$.

<br>
3) Third, the [best linear unbiased estimator](/D/blue) is the one with minimum [variance](/D/var), i.e. the one that minimizes the expected Frobenius norm

$$ \label{eq:Var-W}
\mathrm{Var}\left( \tilde{W} \right) = \left\langle \mathrm{tr}\left[ (\tilde{W} - W)^\mathrm{T} (\tilde{W} - W) \right] \right\rangle \; .
$$

Using the [matrix-normal distribution](/D/matn) of $\tilde{W}$ from \eqref{eq:W-hat-dist}

$$ \label{eq:W-hat-W-dist}
\left( \tilde{W} - W \right) \sim \mathcal{MN}(0, M V M^\mathrm{T}, \Sigma_x)
$$

and the property of the [Wishart distribution](/D/wish)

$$ \label{eq:E-XX}
X \sim \mathcal{MN}(0, U, V) \quad \Rightarrow \quad \left\langle X X^\mathrm{T} \right\rangle = \mathrm{tr}(V) \, U \; ,
$$

this [variance](/D/var) can be evaluated as a function of $M$:

$$ \label{eq:Var-M}
\mathrm{Var}\left[ \tilde{W}(M) \right] = \mathrm{tr}(\Sigma_x) \; \mathrm{tr}(M V M^\mathrm{T}) \; .
$$

As a function of $D$ and using $D Y = 0$, it becomes:

$$ \label{eq:Var-D}
\begin{split}
\mathrm{Var}\left[ \tilde{W}(D) \right] &= \mathrm{tr}(\Sigma_x) \; \mathrm{tr}\!\left[ \left( (Y^\mathrm{T} V^{-1} Y)^{-1} Y^\mathrm{T} V^{-1} + D \right) V \left( (Y^\mathrm{T} V^{-1} Y)^{-1} Y^\mathrm{T} V^{-1} + D \right)^\mathrm{T} \right] \\
&= \mathrm{tr}(\Sigma_x) \; \mathrm{tr}\!\left[ (Y^\mathrm{T} V^{-1} Y)^{-1} \, Y^\mathrm{T} V^{-1} V V^{-1} Y \; (Y^\mathrm{T} V^{-1} Y)^{-1} + \right. \\
&\hphantom{=\mathrm{tr}(\Sigma_x) \; \mathrm{tr}\!\left[\right.} \left. \, (Y^\mathrm{T} V^{-1} Y)^{-1} Y^\mathrm{T} V^{-1} V D^\mathrm{T} + D V V^{-1} Y (Y^\mathrm{T} V^{-1} Y)^{-1} + D V D^\mathrm{T} \right] \\
&= \mathrm{tr}(\Sigma_x) \left[ \mathrm{tr}\!\left( (Y^\mathrm{T} V^{-1} Y)^{-1} \right) + \mathrm{tr}\!\left( D V D^\mathrm{T} \right) \right] \; .
\end{split}
$$

Since $D V D^\mathrm{T}$ is a positive-semidefinite matrix, all its eigenvalues are non-negative. Because the trace of a square matrix is the sum of its eigenvalues, the minimum variance is achieved by $D = 0$, thus producing $\hat{W}$ as in \eqref{eq:iglm-wls}.
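The unbiasedness condition from step 2) can be illustrated numerically (a sketch only, with ad hoc names and white noise for simplicity): for the weighted least squares matrix $M$, the identity $M Y = I_v$ holds exactly for any data realization:

```python
import numpy as np

rng = np.random.default_rng(2)
n, v, p = 30, 4, 2
X = rng.standard_normal((n, p))
B = rng.standard_normal((p, v))
Y = X @ B + 0.1 * rng.standard_normal((n, v))   # GLM data
V = np.eye(n)                                   # temporal covariance (white noise)

Vinv = np.linalg.inv(V)
M = np.linalg.inv(Y.T @ Vinv @ Y) @ Y.T @ Vinv  # WLS matrix
W_hat = M @ X                                   # WLS estimate of W

print(np.allclose(M @ Y, np.eye(v)))            # True: M Y = I_v
```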

P/iglm-dist.md

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2021-10-21 16:03:00

title: "Distribution of the inverse general linear model"
chapter: "Statistical Models"
section: "Multivariate normal data"
topic: "Inverse general linear model"
theorem: "Derivation of the distribution"

sources:
- authors: "Soch J, Allefeld C, Haynes JD"
  year: 2020
  title: "Inverse transformed encoding models – a solution to the problem of correlated trial-by-trial parameter estimates in fMRI decoding"
  in: "NeuroImage"
  pages: "vol. 209, art. 116449, Appendix C, Theorem 4"
  url: "https://www.sciencedirect.com/science/article/pii/S1053811919310407"
  doi: "10.1016/j.neuroimage.2019.116449"

proof_id: "P267"
shortcut: "iglm-dist"
username: "JoramSoch"
---


**Theorem:** Let there be a [general linear model](/D/glm) of $Y \in \mathbb{R}^{n \times v}$

$$ \label{eq:glm}
Y = X B + E, \; E \sim \mathcal{MN}(0, V, \Sigma) \; .
$$

Then, the [inverse general linear model](/D/iglm) of $X \in \mathbb{R}^{n \times p}$ is given by

$$ \label{eq:iglm}
X = Y W + N, \; N \sim \mathcal{MN}(0, V, \Sigma_x)
$$

where $W \in \mathbb{R}^{v \times p}$ is a matrix, such that $B \, W = I_p$, and the covariance across columns is $\Sigma_x = W^\mathrm{T} \Sigma W$.


**Proof:** The [linear transformation theorem for the matrix-normal distribution](/P/matn-ltt) states:

$$ \label{eq:matn-ltt}
X \sim \mathcal{MN}(M, U, V) \quad \Rightarrow \quad Y = AXB + C \sim \mathcal{MN}(AMB+C, AUA^\mathrm{T}, B^\mathrm{T}VB) \; .
$$

The matrix $W$ exists if the rows of $B \in \mathbb{R}^{p \times v}$ are linearly independent, such that $\mathrm{rk}(B) = p$. Then, right-multiplying the model \eqref{eq:glm} with $W$ and applying \eqref{eq:matn-ltt} yields

$$ \label{eq:iglm-s1}
Y W = X B W + E W, \; E W \sim \mathcal{MN}(0, V, W^\mathrm{T} \Sigma W) \; .
$$

Applying $B \, W = I_p$ and rearranging, we have

$$ \label{eq:iglm-s2}
X = Y W - E W, \; E W \sim \mathcal{MN}(0, V, W^\mathrm{T} \Sigma W) \; .
$$

Substituting $N = - E W$, which by \eqref{eq:matn-ltt} has the same distribution as $E W$, we get

$$ \label{eq:iglm-s3}
X = Y W + N, \; N \sim \mathcal{MN}(0, V, W^\mathrm{T} \Sigma W)
$$

which is equivalent to \eqref{eq:iglm}.
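The algebraic steps of this proof can be illustrated numerically (a sketch only, with ad hoc names): taking $W$ as the Moore–Penrose right inverse of $B$, both $B W = I_p$ and the rearrangement $X = Y W - E W$ hold exactly for any realization of the model:

```python
import numpy as np

rng = np.random.default_rng(3)
n, v, p = 20, 5, 3
X = rng.standard_normal((n, p))
B = rng.standard_normal((p, v))      # p <= v, rows linearly independent a.s.
E = rng.standard_normal((n, v))
Y = X @ B + E                        # GLM data

W = np.linalg.pinv(B)                # right inverse: B W = I_p
print(np.allclose(B @ W, np.eye(p))) # True
print(np.allclose(X, Y @ W - E @ W)) # True
```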
