| 1 | +--- |
| 2 | +layout: proof |
| 3 | +mathjax: true |
| 4 | + |
| 5 | +author: "Joram Soch" |
| 6 | +affiliation: "BCCN Berlin" |
| 7 | +e_mail: "joram.soch@bccn-berlin.de" |
| 8 | +date: 2020-09-03 08:37:00 |
| 9 | + |
| 10 | +title: "Posterior distribution for multivariate Bayesian linear regression" |
| 11 | +chapter: "Statistical Models" |
| 12 | +section: "Multivariate normal data" |
| 13 | +topic: "Multivariate Bayesian linear regression" |
| 14 | +theorem: "Posterior distribution" |
| 15 | + |
| 16 | +sources: |
| 17 | + - authors: "Wikipedia" |
| 18 | + year: 2020 |
| 19 | + title: "Bayesian multivariate linear regression" |
| 20 | + in: "Wikipedia, the free encyclopedia" |
| 21 | + pages: "retrieved on 2020-09-03" |
| 22 | + url: "https://en.wikipedia.org/wiki/Bayesian_multivariate_linear_regression#Posterior_distribution" |
| 23 | + |
| 24 | +proof_id: "P160" |
| 25 | +shortcut: "mblr-post" |
| 26 | +username: "JoramSoch" |
| 27 | +--- |
| 28 | + |
| 29 | + |
| 30 | +**Theorem:** Let |
| 31 | + |
| 32 | +$$ \label{eq:GLM} |
| 33 | +Y = X B + E, \; E \sim \mathcal{MN}(0, V, \Sigma) |
| 34 | +$$ |
| 35 | + |
| 36 | +be a [general linear model](/D/glm) with measured $n \times v$ data matrix $Y$, known $n \times p$ design matrix $X$, known $n \times n$ [covariance structure](/D/matn) $V$ as well as unknown $p \times v$ regression coefficients $B$ and unknown $v \times v$ [noise covariance](/D/matn) $\Sigma$. Moreover, assume a [normal-Wishart prior distribution](/P/mblr-prior) over the model parameters $B$ and $T = \Sigma^{-1}$: |
| 37 | + |
| 38 | +$$ \label{eq:GLM-NW-prior} |
| 39 | +p(B,T) = \mathcal{MN}(B; M_0, \Lambda_0^{-1}, T^{-1}) \cdot \mathcal{W}(T; P_0^{-1}, \nu_0) \; . |
| 40 | +$$ |
| 41 | + |
| 42 | +Then, the [posterior distribution](/D/post) is also a [normal-Wishart distribution](/D/nw) |
| 43 | + |
| 44 | +$$ \label{eq:GLM-NW-post} |
| 45 | +p(B,T|Y) = \mathcal{MN}(B; M_n, \Lambda_n^{-1}, T^{-1}) \cdot \mathcal{W}(T; P_n^{-1}, \nu_n) |
| 46 | +$$ |
| 47 | + |
| 48 | +and, writing $P = V^{-1}$, the [posterior hyperparameters](/D/post) are given by |
| 49 | + |
| 50 | +$$ \label{eq:GLM-NW-post-par} |
| 51 | +\begin{split} |
| 52 | +M_n &= \Lambda_n^{-1} (X^\mathrm{T} P Y + \Lambda_0 M_0) \\ |
| 53 | +\Lambda_n &= X^\mathrm{T} P X + \Lambda_0 \\ |
| 54 | +P_n &= P_0 + Y^\mathrm{T} P Y + M_0^\mathrm{T} \Lambda_0 M_0 - M_n^\mathrm{T} \Lambda_n M_n \\ |
| 55 | +\nu_n &= \nu_0 + n \; . |
| 56 | +\end{split} |
| 57 | +$$ |
| 58 | + |
| 59 | + |
| 60 | +**Proof:** According to [Bayes' theorem](/P/bayes-th), the [posterior distribution](/D/post) is given by |
| 61 | + |
| 62 | +$$ \label{eq:GLM-NG-BT} |
| 63 | +p(B,T|Y) = \frac{p(Y|B,T) \, p(B,T)}{p(Y)} \; . |
| 64 | +$$ |
| 65 | + |
| 66 | +Since $p(Y)$ is just a normalization factor, the [posterior is proportional](/P/post-jl) to the numerator: |
| 67 | + |
| 68 | +$$ \label{eq:GLM-NG-post-JL} |
| 69 | +p(B,T|Y) \propto p(Y|B,T) \, p(B,T) = p(Y,B,T) \; . |
| 70 | +$$ |
| 71 | + |
| 72 | +Equation \eqref{eq:GLM} implies the following [likelihood function](/D/lf) |
| 73 | + |
| 74 | +$$ \label{eq:GLM-LF-Class} |
| 75 | +p(Y|B,\Sigma) = \mathcal{MN}(Y; X B, V, \Sigma) = \sqrt{\frac{1}{(2 \pi)^{nv} |\Sigma|^n |V|^v}} \, \exp\left[ -\frac{1}{2} \mathrm{tr}\left( \Sigma^{-1} (Y-XB)^\mathrm{T} V^{-1} (Y-XB) \right) \right] |
| 76 | +$$ |
| 77 | + |
| 78 | +which, for mathematical convenience, can also be parametrized as |
| 79 | + |
| 80 | +$$ \label{eq:GLM-LF-Bayes} |
| 81 | +p(Y|B,T) = \mathcal{MN}(Y; X B, P^{-1}, T^{-1}) = \sqrt{\frac{|T|^n |P|^v}{(2 \pi)^{nv}}} \, \exp\left[ -\frac{1}{2} \mathrm{tr}\left( T (Y-XB)^\mathrm{T} P (Y-XB) \right) \right] |
| 82 | +$$ |
| 83 | + |
| 84 | +using the $v \times v$ [precision matrix](/D/precmat) $T = \Sigma^{-1}$ and the $n \times n$ [precision matrix](/D/precmat) $P = V^{-1}$. |
| 85 | + |
| 86 | +<br> |
| 87 | +Combining the [likelihood function](/D/lf) \eqref{eq:GLM-LF-Bayes} with the [prior distribution](/D/prior) \eqref{eq:GLM-NW-prior}, the [joint likelihood](/D/jl) of the model is given by |
| 88 | + |
| 89 | +$$ \label{eq:GLM-NW-JL-s1} |
| 90 | +\begin{split} |
| 91 | +p(Y,B,T) = \; & p(Y|B,T) \, p(B,T) \\ |
| 92 | += \; & \sqrt{\frac{|T|^n |P|^v}{(2 \pi)^{nv}}} \, \exp\left[ -\frac{1}{2} \mathrm{tr}\left( T (Y-XB)^\mathrm{T} P (Y-XB) \right) \right] \cdot \\ |
| 93 | +& \sqrt{\frac{|T|^p |\Lambda_0|^v}{(2 \pi)^{pv}}} \, \exp\left[ -\frac{1}{2} \mathrm{tr}\left( T (B-M_0)^\mathrm{T} \Lambda_0 (B-M_0) \right) \right] \cdot \\ |
| 94 | +& \frac{1}{\Gamma_v \left( \frac{\nu_0}{2} \right)} \sqrt{\frac{|P_0|^{\nu_0}}{2^{\nu_0 v}}} |T|^{(\nu_0-v-1)/2} \exp\left[ -\frac{1}{2} \mathrm{tr}\left( P_0 T \right) \right] \; . |
| 95 | +\end{split} |
| 96 | +$$ |
| 97 | + |
| 98 | +Collecting the normalization factors and merging the two trace terms into one exponent gives: |
| 99 | + |
| 100 | +$$ \label{eq:GLM-NW-JL-s2} |
| 101 | +\begin{split} |
| 102 | +p(Y,B,T) = \; & \sqrt{\frac{|T|^n |P|^v}{(2 \pi)^{nv}}} \sqrt{\frac{|T|^p |\Lambda_0|^v}{(2 \pi)^{pv}}} \sqrt{\frac{|P_0|^{\nu_0}}{2^{\nu_0 v}}} \frac{1}{\Gamma_v \left( \frac{\nu_0}{2} \right)} \cdot |T|^{(\nu_0-v-1)/2} \exp\left[ -\frac{1}{2} \mathrm{tr}\left( P_0 T \right) \right] \cdot \\ |
| 103 | +& \exp\left[ -\frac{1}{2} \mathrm{tr}\left( T \left[ (Y-XB)^\mathrm{T} P (Y-XB) + (B-M_0)^\mathrm{T} \Lambda_0 (B-M_0) \right] \right) \right] \; . |
| 104 | +\end{split} |
| 105 | +$$ |
| 106 | + |
| 107 | +Expanding the products in the exponent gives: |
| 108 | + |
| 109 | +$$ \label{eq:GLM-NW-JL-s3} |
| 110 | +\begin{split} |
| 111 | +p(Y,B,T) = \; & \sqrt{\frac{|T|^n |P|^v}{(2 \pi)^{nv}}} \sqrt{\frac{|T|^p |\Lambda_0|^v}{(2 \pi)^{pv}}} \sqrt{\frac{|P_0|^{\nu_0}}{2^{\nu_0 v}}} \frac{1}{\Gamma_v \left( \frac{\nu_0}{2} \right)} \cdot |T|^{(\nu_0-v-1)/2} \exp\left[ -\frac{1}{2} \mathrm{tr}\left( P_0 T \right) \right] \cdot \\ |
| 112 | +& \exp\left[ -\frac{1}{2} \mathrm{tr}\left( T \left[ Y^\mathrm{T} P Y - Y^\mathrm{T} P X B - B^\mathrm{T} X^\mathrm{T} P Y + B^\mathrm{T} X^\mathrm{T} P X B + \right. \right. \right. \\ |
| 113 | +& \hphantom{\exp\left[ -\frac{1}{2} \mathrm{tr}\left( T \left[ \right. \right. \right. \!\!\!} \; \left. \left. \left. B^\mathrm{T} \Lambda_0 B - B^\mathrm{T} \Lambda_0 M_0 - M_0^\mathrm{T} \Lambda_0 B + M_0^\mathrm{T} \Lambda_0 M_0 \right] \right) \right] \; . |
| 114 | +\end{split} |
| 115 | +$$ |
| 116 | + |
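| 117 | +Completing the square over $B$, i.e. using $B^\mathrm{T} \Lambda_n B - B^\mathrm{T} \Lambda_n M_n - M_n^\mathrm{T} \Lambda_n B = (B-M_n)^\mathrm{T} \Lambda_n (B-M_n) - M_n^\mathrm{T} \Lambda_n M_n$ with $\Lambda_n M_n = X^\mathrm{T} P Y + \Lambda_0 M_0$, we finally have |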
| 118 | + |
| 119 | +$$ \label{eq:GLM-NW-JL-s4} |
| 120 | +\begin{split} |
| 121 | +p(Y,B,T) = \; & \sqrt{\frac{|T|^n |P|^v}{(2 \pi)^{nv}}} \sqrt{\frac{|T|^p |\Lambda_0|^v}{(2 \pi)^{pv}}} \sqrt{\frac{|P_0|^{\nu_0}}{2^{\nu_0 v}}} \frac{1}{\Gamma_v \left( \frac{\nu_0}{2} \right)} \cdot |T|^{(\nu_0-v-1)/2} \exp\left[ -\frac{1}{2} \mathrm{tr}\left( P_0 T \right) \right] \cdot \\ |
| 122 | +& \exp\left[ -\frac{1}{2} \mathrm{tr}\left( T \left[ (B-M_n)^\mathrm{T} \Lambda_n (B-M_n) + (Y^\mathrm{T} P Y + M_0^\mathrm{T} \Lambda_0 M_0 - M_n^\mathrm{T} \Lambda_n M_n) \right] \right) \right] |
| 123 | +\end{split} |
| 124 | +$$ |
| 125 | + |
| 126 | +with the [posterior hyperparameters](/D/post) |
| 127 | + |
| 128 | +$$ \label{eq:GLM-NW-post-B-par} |
| 129 | +\begin{split} |
| 130 | +M_n &= \Lambda_n^{-1} (X^\mathrm{T} P Y + \Lambda_0 M_0) \\ |
| 131 | +\Lambda_n &= X^\mathrm{T} P X + \Lambda_0 \; . |
| 132 | +\end{split} |
| 133 | +$$ |
| 134 | + |
| 135 | +Ergo, dropping all factors that do not depend on $B$ or $T$ and combining the powers of $|T|$, the joint likelihood is proportional to |
| 136 | + |
| 137 | +$$ \label{eq:GLM-NW-JL-s5} |
| 138 | +p(Y,B,T) \propto |T|^{p/2} \cdot \exp\left[ -\frac{1}{2} \mathrm{tr}\left( T \left[ (B-M_n)^\mathrm{T} \Lambda_n (B-M_n) \right] \right) \right] \cdot |T|^{(\nu_n-v-1)/2} \cdot \exp\left[ -\frac{1}{2} \mathrm{tr}\left( P_n T \right) \right] |
| 139 | +$$ |
| 140 | + |
| 141 | +with the [posterior hyperparameters](/D/post) |
| 142 | + |
| 143 | +$$ \label{eq:GLM-NW-post-T-par} |
| 144 | +\begin{split} |
| 145 | +P_n &= P_0 + Y^\mathrm{T} P Y + M_0^\mathrm{T} \Lambda_0 M_0 - M_n^\mathrm{T} \Lambda_n M_n \\ |
| 146 | +\nu_n &= \nu_0 + n \; . |
| 147 | +\end{split} |
| 148 | +$$ |
| 149 | + |
| 150 | +From the first two factors in \eqref{eq:GLM-NW-JL-s5}, we can isolate the posterior distribution over $B$ given $T$: |
| 151 | + |
| 152 | +$$ \label{eq:GLM-NW-post-B} |
| 153 | +p(B|T,Y) = \mathcal{MN}(B; M_n, \Lambda_n^{-1}, T^{-1}) \; . |
| 154 | +$$ |
| 155 | + |
| 156 | +From the remaining two factors, we can isolate the posterior distribution over $T$: |
| 157 | + |
| 158 | +$$ \label{eq:GLM-NW-post-T} |
| 159 | +p(T|Y) = \mathcal{W}(T; P_n^{-1}, \nu_n) \; . |
| 160 | +$$ |
| 161 | + |
| 162 | +Together, \eqref{eq:GLM-NW-post-B} and \eqref{eq:GLM-NW-post-T} constitute the [joint](/D/prob-joint) [posterior distribution](/D/post) of $B$ and $T$. |
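
The posterior hyperparameters derived above can be evaluated numerically. The following is only a minimal sketch, not part of the proof: the function name `mblr_posterior`, the simulated data and the prior settings ($M_0 = 0$, $\Lambda_0 = I_p$, $P_0 = I_v$, $\nu_0 = v + 1$, $V = I_n$) are illustrative assumptions chosen for the example.

```python
import numpy as np

def mblr_posterior(Y, X, P, M0, L0, P0, nu0):
    """Posterior hyperparameters for multivariate Bayesian linear regression.

    Hypothetical helper (name and signature are assumptions), implementing
        L_n  = X' P X + L_0
        M_n  = L_n^{-1} (X' P Y + L_0 M_0)
        P_n  = P_0 + Y' P Y + M_0' L_0 M_0 - M_n' L_n M_n
        nu_n = nu_0 + n
    where P = V^{-1} is the n x n precision matrix across observations.
    """
    n = Y.shape[0]
    Ln = X.T @ P @ X + L0
    # Solve the linear system instead of forming L_n^{-1} explicitly.
    Mn = np.linalg.solve(Ln, X.T @ P @ Y + L0 @ M0)
    Pn = P0 + Y.T @ P @ Y + M0.T @ L0 @ M0 - Mn.T @ Ln @ Mn
    nun = nu0 + n
    return Mn, Ln, Pn, nun

# Simulated toy data (all values below are assumptions for illustration).
rng = np.random.default_rng(0)
n, p, v = 50, 3, 2
X = rng.standard_normal((n, p))           # known design matrix
B_true = rng.standard_normal((p, v))      # coefficients used to simulate Y
Y = X @ B_true + 0.1 * rng.standard_normal((n, v))

P = np.eye(n)             # precision across observations, i.e. V = I_n
M0 = np.zeros((p, v))     # prior mean of B
L0 = np.eye(p)            # prior precision Lambda_0
P0 = np.eye(v)            # prior inverse scale matrix of the Wishart
nu0 = v + 1               # prior degrees of freedom

Mn, Ln, Pn, nun = mblr_posterior(Y, X, P, M0, L0, P0, nu0)
print("M_n shape:", Mn.shape)
print("nu_n:", nun)
```

Expanding the quadratic forms shows that $P_n$ can equivalently be written as $P_0 + (Y - X M_n)^\mathrm{T} P (Y - X M_n) + (M_n - M_0)^\mathrm{T} \Lambda_0 (M_n - M_0)$, which makes its positive definiteness explicit and may serve as a numerical cross-check of the sketch.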