---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2021-10-21 16:46:00

title: "Best linear unbiased estimator for the inverse general linear model"
chapter: "Statistical Models"
section: "Multivariate normal data"
topic: "Inverse general linear model"
theorem: "Best linear unbiased estimator"

sources:
  - authors: "Soch J, Allefeld C, Haynes JD"
    year: 2020
    title: "Inverse transformed encoding models – a solution to the problem of correlated trial-by-trial parameter estimates in fMRI decoding"
    in: "NeuroImage"
    pages: "vol. 209, art. 116449, Appendix C, Theorem 5"
    url: "https://www.sciencedirect.com/science/article/pii/S1053811919310407"
    doi: "10.1016/j.neuroimage.2019.116449"

proof_id: "P268"
shortcut: "iglm-blue"
username: "JoramSoch"
---


**Theorem:** Let there be a [general linear model](/D/glm) of $Y \in \mathbb{R}^{n \times v}$

$$ \label{eq:glm}
Y = X B + E, \; E \sim \mathcal{MN}(0, V, \Sigma)
$$

[implying the inverse general linear model](/P/iglm-dist) of $X \in \mathbb{R}^{n \times p}$

$$ \label{eq:iglm}
X = Y W + N, \; N \sim \mathcal{MN}(0, V, \Sigma_x) \; ,
$$

where

$$ \label{eq:BW-Sx}
B \, W = I_p \quad \text{and} \quad \Sigma_x = W^\mathrm{T} \Sigma W \; .
$$

Then, the [weighted least squares solution](/P/glm-wls) for $W$ is the [best linear unbiased estimator](/D/blue) of $W$.
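
To make the construction \eqref{eq:BW-Sx} concrete, here is a minimal numerical sketch; the dimensions, the identity covariance and the particular right inverse $W = B^\mathrm{T} (B B^\mathrm{T})^{-1}$ are illustrative assumptions, not part of the theorem:

```python
# Minimal sketch of the GLM-to-iGLM construction (all settings assumed).
import numpy as np

rng = np.random.default_rng(0)
v, p = 10, 3                         # e.g. voxels, regressors (toy sizes)

B = rng.standard_normal((p, v))      # forward-model coefficients
W = B.T @ np.linalg.inv(B @ B.T)     # one valid right inverse: B W = I_p
assert np.allclose(B @ W, np.eye(p))

Sigma = np.eye(v)                    # column covariance of E (assumed)
Sigma_x = W.T @ Sigma @ W            # induced column covariance of N, p x p
```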


**Proof:** The [linear transformation theorem for the matrix-normal distribution](/P/matn-ltt) states:

$$ \label{eq:matn-ltt}
X \sim \mathcal{MN}(M, U, V) \quad \Rightarrow \quad Y = AXB + C \sim \mathcal{MN}(AMB+C, AUA^\mathrm{T}, B^\mathrm{T}VB) \; .
$$

The [weighted least squares parameter estimates](/P/glm-wls) for \eqref{eq:iglm} are given by

$$ \label{eq:iglm-wls}
\hat{W} = (Y^\mathrm{T} V^{-1} Y)^{-1} Y^\mathrm{T} V^{-1} X \; .
$$
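
As a plausibility check, \eqref{eq:iglm-wls} can be evaluated on simulated data; the sizes, the choice $V = I_n$ and the noise level below are assumptions of this sketch:

```python
# Hedged sketch: WLS estimation of W in X = Y W + N, assuming V = I_n.
import numpy as np

rng = np.random.default_rng(1)
n, v, p = 200, 10, 3
W = rng.standard_normal((v, p))            # ground-truth weights (toy values)
V = np.eye(n)                              # assumed row covariance
Vi = np.linalg.inv(V)

Y = rng.standard_normal((n, v))            # plays the design role in the iGLM
N = 0.1 * rng.standard_normal((n, p))      # noise with Sigma_x = 0.01 * I_p
X = Y @ W + N                              # data from the inverse GLM

W_hat = np.linalg.inv(Y.T @ Vi @ Y) @ Y.T @ Vi @ X
print(np.abs(W_hat - W).max())             # small, shrinks with the noise
```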

The [best linear unbiased estimator](/D/blue) $\hat{\theta}$ of a certain quantity $\theta$ estimated from [measured data](/D/data) $y$ is 1) an estimator resulting from a linear operation $f(y)$, 2) whose expected value is equal to $\theta$ and 3) which has, among those satisfying 1) and 2), the minimum [variance](/D/var).
| 65 | + |
| 66 | +<br> |
| 67 | +1) First, $\hat{W}$ is a linear estimator, because it is of the form $\tilde{W} = M \hat{X}$ where $M$ is an arbitrary $v \times n$ matrix. |

<br>
2) Second, $\hat{W}$ is an unbiased estimator if $\left\langle \hat{W} \right\rangle = W$. By applying \eqref{eq:matn-ltt} to \eqref{eq:iglm}, the distribution of $\tilde{W}$ is

$$ \label{eq:W-hat-dist}
\tilde{W} = M X \sim \mathcal{MN}(M Y W, M V M^\mathrm{T}, \Sigma_x) \; ,
$$

such that $\left\langle \tilde{W} \right\rangle = M Y W$ and unbiasedness requires that $M Y = I_v$. This is fulfilled by any matrix $M = (Y^\mathrm{T} V^{-1} Y)^{-1} Y^\mathrm{T} V^{-1} + D$ where $D$ is a $v \times n$ matrix which satisfies $D Y = 0$.
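
Such matrices $D$ can be generated numerically by right-multiplying an arbitrary matrix with the residual-forming projector of $Y$; this particular construction is an assumption for illustration, since any $D$ with $D Y = 0$ qualifies:

```python
# Hedged sketch: the family of unbiased linear estimators M = M_wls + D.
import numpy as np

rng = np.random.default_rng(2)
n, v = 200, 10
V = np.eye(n)                              # assumed row covariance
Vi = np.linalg.inv(V)
Y = rng.standard_normal((n, v))

M_wls = np.linalg.inv(Y.T @ Vi @ Y) @ Y.T @ Vi
P = np.eye(n) - Y @ np.linalg.inv(Y.T @ Y) @ Y.T   # projector with P Y = 0
D = rng.standard_normal((v, n)) @ P                # hence D Y = 0
M = M_wls + D

assert np.allclose(D @ Y, 0)
assert np.allclose(M @ Y, np.eye(v))       # unbiasedness condition M Y = I_v
```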

<br>
3) Third, the [best linear unbiased estimator](/D/blue) is the one with minimum [variance](/D/var), i.e. the one that minimizes the expected Frobenius norm

$$ \label{eq:Var-W}
\mathrm{Var}\left( \tilde{W} \right) = \left\langle \mathrm{tr}\left[ (\tilde{W} - W)^\mathrm{T} (\tilde{W} - W) \right] \right\rangle \; .
$$

Using the [matrix-normal distribution](/D/matn) of $\tilde{W}$ from \eqref{eq:W-hat-dist}

$$ \label{eq:W-hat-W-dist}
\left( \tilde{W} - W \right) \sim \mathcal{MN}(0, M V M^\mathrm{T}, \Sigma_x)
$$

and the property of the [Wishart distribution](/D/wish)

$$ \label{eq:E-XX}
X \sim \mathcal{MN}(0, U, V) \quad \Rightarrow \quad \left\langle X X^\mathrm{T} \right\rangle = \mathrm{tr}(V) \, U \; ,
$$
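
This property can be checked by Monte Carlo simulation; sampling $X \sim \mathcal{MN}(0, U, V)$ as $X = L_U Z L_V^\mathrm{T}$ with Cholesky factors $L_U, L_V$ and standard normal $Z$, as well as the toy dimensions, are assumptions of this sketch:

```python
# Hedged Monte Carlo check of <X X^T> = tr(V) U for X ~ MN(0, U, V).
import numpy as np

rng = np.random.default_rng(3)
r, c = 4, 5                                # toy row and column dimensions
A = rng.standard_normal((r, r)); U = A @ A.T + np.eye(r)
C = rng.standard_normal((c, c)); V = C @ C.T + np.eye(c)
LU, LV = np.linalg.cholesky(U), np.linalg.cholesky(V)

m = 50_000
S = np.zeros((r, r))
for _ in range(m):
    X = LU @ rng.standard_normal((r, c)) @ LV.T    # X ~ MN(0, U, V)
    S += X @ X.T
print(np.abs(S / m - np.trace(V) * U).max())       # approaches 0 as m grows
```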

this [variance](/D/var) can be evaluated as a function of $M$:

$$ \label{eq:Var-M}
\mathrm{Var}\left[ \tilde{W}(M) \right] = \mathrm{tr}(\Sigma_x) \; \mathrm{tr}(M V M^\mathrm{T}) \; .
$$

As a function of $D$ and using $D Y = 0$, it becomes:

$$ \label{eq:Var-D}
\begin{split}
\mathrm{Var}\left[ \tilde{W}(D) \right] &= \mathrm{tr}(\Sigma_x) \; \mathrm{tr}\!\left[ \left( (Y^\mathrm{T} V^{-1} Y)^{-1} Y^\mathrm{T} V^{-1} + D \right) V \left( (Y^\mathrm{T} V^{-1} Y)^{-1} Y^\mathrm{T} V^{-1} + D \right)^\mathrm{T} \right] \\
&= \mathrm{tr}(\Sigma_x) \; \mathrm{tr}\!\left[ (Y^\mathrm{T} V^{-1} Y)^{-1} \, Y^\mathrm{T} V^{-1} V V^{-1} Y \; (Y^\mathrm{T} V^{-1} Y)^{-1} + \right. \\
&\hphantom{= \mathrm{tr}(\Sigma_x) \; \mathrm{tr}\!\left[\right.} \left. \, (Y^\mathrm{T} V^{-1} Y)^{-1} Y^\mathrm{T} V^{-1} V D^\mathrm{T} + D V V^{-1} Y (Y^\mathrm{T} V^{-1} Y)^{-1} + D V D^\mathrm{T} \right] \\
&= \mathrm{tr}(\Sigma_x) \left[ \mathrm{tr}\!\left( (Y^\mathrm{T} V^{-1} Y)^{-1} \right) + \mathrm{tr}\!\left( D V D^\mathrm{T} \right) \right] \; .
\end{split}
$$

Since $D V D^\mathrm{T}$ is a positive-semidefinite matrix, all its eigenvalues are non-negative. Because the trace of a square matrix is the sum of its eigenvalues, the minimum variance is achieved by $D = 0$, thus producing $\hat{W}$ as in \eqref{eq:iglm-wls}.
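
Evaluating the closed form \eqref{eq:Var-D} for a few random admissible $D$ illustrates this numerically; the toy dimensions and identity covariances are assumptions of this sketch:

```python
# Hedged check that the variance in eq. (Var-D) is minimized at D = 0.
import numpy as np

rng = np.random.default_rng(4)
n, v, p = 200, 10, 3
V = np.eye(n)                              # assumed row covariance
Vi = np.linalg.inv(V)
Y = rng.standard_normal((n, v))
Sigma_x = np.eye(p)                        # assumed column covariance of N

var_min = np.trace(Sigma_x) * np.trace(np.linalg.inv(Y.T @ Vi @ Y))
P = np.eye(n) - Y @ np.linalg.inv(Y.T @ Y) @ Y.T   # projector with P Y = 0
for _ in range(5):
    D = rng.standard_normal((v, n)) @ P            # random D with D Y = 0
    extra = np.trace(Sigma_x) * np.trace(D @ V @ D.T)
    assert extra >= -1e-12                 # tr(D V D^T) >= 0: D = 0 is optimal
print("Var at D = 0:", var_min)
```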