added 4 proofs

JoramSoch · web-flow · commit 10b2e8afb091 · 2020-09-03T09:22:50.000+02:00
diff --git a/P/cov-ind.md b/P/cov-ind.md
@@ -0,0 +1,57 @@
+---
+layout: proof
+mathjax: true
+
+author: "Joram Soch"
+affiliation: "BCCN Berlin"
+e_mail: "joram.soch@bccn-berlin.de"
+date: 2020-09-03 06:05:00
+
+title: "Covariance of independent random variables"
+chapter: "General Theorems"
+section: "Probability theory"
+topic: "Covariance"
+theorem: "Covariance under independence"
+
+sources:
+  - authors: "Wikipedia"
+    year: 2020
+    title: "Covariance"
+    in: "Wikipedia, the free encyclopedia"
+    pages: "retrieved on 2020-09-03"
+    url: "https://en.wikipedia.org/wiki/Covariance#Uncorrelatedness_and_independence"
+
+proof_id: "P158"
+shortcut: "cov-ind"
+username: "JoramSoch"
+---
+
+
+**Theorem:** Let $X$ and $Y$ be [independent](/D/ind) [random variables](/D/rvar). Then, the [covariance](/D/cov) of $X$ and $Y$ is zero:
+
+$$ \label{eq:cov-ind}
+X, Y \; \text{independent} \quad \Rightarrow \quad \mathrm{Cov}(X,Y) = 0 \; .
+$$
+
+
+**Proof:** The [covariance can be expressed in terms of expected values](/P/cov-mean) as
+
+$$ \label{eq:cov-mean}
+\mathrm{Cov}(X,Y) = \mathrm{E}(X\,Y) - \mathrm{E}(X) \, \mathrm{E}(Y) \; .
+$$
+
+For independent random variables, [the expected value of the product is equal to the product of the expected values](/P/mean-mult):
+
+$$ \label{eq:mean-mult}
+\mathrm{E}(X\,Y) = \mathrm{E}(X) \, \mathrm{E}(Y) \; .
+$$
+
+Taking \eqref{eq:cov-mean} and \eqref{eq:mean-mult} together, we have
+
+$$ \label{eq:cov-ind-qed}
+\begin{split}
+\mathrm{Cov}(X,Y) &\overset{\eqref{eq:cov-mean}}{=} \mathrm{E}(X\,Y) - \mathrm{E}(X) \, \mathrm{E}(Y) \\
+&\overset{\eqref{eq:mean-mult}}{=} \mathrm{E}(X) \, \mathrm{E}(Y) - \mathrm{E}(X) \, \mathrm{E}(Y) \\
+&= 0 \; .
+\end{split}
+$$
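+
+As a minimal worked instance of the theorem (an illustration under the assumption that $X$ and $Y$ are independent Bernoulli variables with success probability $1/2$; the label below is introduced here for illustration only), both sides of \eqref{eq:mean-mult} can be computed directly:
+
+$$ \label{eq:cov-ind-ex}
+\mathrm{E}(X\,Y) = \mathrm{Pr}(X=1) \, \mathrm{Pr}(Y=1) = \frac{1}{4} = \mathrm{E}(X) \, \mathrm{E}(Y) \quad \Rightarrow \quad \mathrm{Cov}(X,Y) = \frac{1}{4} - \frac{1}{4} = 0 \; .
+$$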
diff --git a/P/mblr-lme.md b/P/mblr-lme.md
@@ -0,0 +1,130 @@
+---
+layout: proof
+mathjax: true
+
+author: "Joram Soch"
+affiliation: "BCCN Berlin"
+e_mail: "joram.soch@bccn-berlin.de"
+date: 2020-09-03 09:23:00
+
+title: "Log model evidence for multivariate Bayesian linear regression"
+chapter: "Statistical Models"
+section: "Multivariate normal data"
+topic: "Multivariate Bayesian linear regression"
+theorem: "Log model evidence"
+
+sources:
+
+proof_id: "P161"
+shortcut: "mblr-lme"
+username: "JoramSoch"
+---
+
+
+**Theorem:** Let
+
+$$ \label{eq:GLM}
+Y = X B + E, \; E \sim \mathcal{MN}(0, V, \Sigma)
+$$
+
+be a [general linear model](/D/glm) with measured $n \times v$ data matrix $Y$, known $n \times p$ design matrix $X$, known $n \times n$ [covariance structure](/D/matn) $V$ as well as unknown $p \times v$ regression coefficients $B$ and unknown $v \times v$ [noise covariance](/D/matn) $\Sigma$. Moreover, assume a [normal-Wishart prior distribution](/P/mblr-prior) over the model parameters $B$ and $T = \Sigma^{-1}$:
+
+$$ \label{eq:GLM-NW-prior}
+p(B,T) = \mathcal{MN}(B; M_0, \Lambda_0^{-1}, T^{-1}) \cdot \mathcal{W}(T; P_0^{-1}, \nu_0) \; .
+$$
+
+Then, the [log model evidence](/D/lme) for this model is
+
+\begin{equation} \label{eq:GLM-NW-LME}
+\begin{split}
+\log p(Y|m) = & \frac{v}{2} \log |P| - \frac{nv}{2} \log (2 \pi)  + \frac{v}{2} \log |\Lambda_0| - \frac{v}{2} \log |\Lambda_n| + \\
+& \frac{\nu_0}{2} \log\left| \frac{1}{2} P_0 \right| - \frac{\nu_n}{2} \log\left| \frac{1}{2} P_n \right| + \log \Gamma_v \left( \frac{\nu_n}{2} \right) - \log \Gamma_v \left( \frac{\nu_0}{2} \right)
+\end{split}
+\end{equation}
+
+where $P = V^{-1}$ and the [posterior hyperparameters](/D/post) are given by
+
+\begin{equation} \label{eq:GLM-NW-post-par}
+\begin{split}
+M_n &= \Lambda_n^{-1} (X^\mathrm{T} P Y + \Lambda_0 M_0) \\
+\Lambda_n &= X^\mathrm{T} P X + \Lambda_0 \\
+P_n &= P_0 + Y^\mathrm{T} P Y + M_0^\mathrm{T} \Lambda_0 M_0 - M_n^\mathrm{T} \Lambda_n M_n \\
+\nu_n &= \nu_0 + n \; .
+\end{split}
+\end{equation}
+
+
+**Proof:** According to the [law of marginal probability](/D/prob-marg), the [model evidence](/D/ml) for this model is:
+
+$$ \label{eq:GLM-NW-ME-s1}
+p(Y|m) = \iint p(Y|B,T) \, p(B,T) \, \mathrm{d}B \, \mathrm{d}T \; .
+$$
+
+According to the [law of conditional probability](/D/prob-cond), the integrand is equivalent to the [joint likelihood](/D/jl):
+
+$$ \label{eq:GLM-NW-ME-s2}
+p(Y|m) = \iint p(Y,B,T) \, \mathrm{d}B \, \mathrm{d}T \; .
+$$
+
+Equation \eqref{eq:GLM} implies the following [likelihood function](/D/lf)
+
+$$ \label{eq:GLM-LF-Class}
+p(Y|B,\Sigma) = \mathcal{MN}(Y; X B, V, \Sigma) = \sqrt{\frac{1}{(2 \pi)^{nv} |\Sigma|^n |V|^v}} \, \exp\left[ -\frac{1}{2} \mathrm{tr}\left( \Sigma^{-1} (Y-XB)^\mathrm{T} V^{-1} (Y-XB) \right) \right]
+$$
+
+which, for mathematical convenience, can also be parametrized as
+
+$$ \label{eq:GLM-LF-Bayes}
+p(Y|B,T) = \mathcal{MN}(Y; X B, P, T^{-1}) = \sqrt{\frac{|T|^n |P|^v}{(2 \pi)^{nv}}} \, \exp\left[ -\frac{1}{2} \mathrm{tr}\left( T (Y-XB)^\mathrm{T} P (Y-XB) \right) \right]
+$$
+
+using the $v \times v$ [precision matrix](/D/precmat) $T = \Sigma^{-1}$ and the $n \times n$ [precision matrix](/D/precmat) $P = V^{-1}$.
+
+<br>
+When [deriving the posterior distribution](/P/mblr-post) $p(B,T|Y)$, the joint likelihood $p(Y,B,T)$ is obtained as
+
+\begin{equation} \label{eq:GLM-NW-LME-s1}
+\begin{split}
+p(Y,B,T) = \; & \sqrt{\frac{|T|^n |P|^v}{(2 \pi)^{nv}}} \sqrt{\frac{|T|^p |\Lambda_0|^v}{(2 \pi)^{pv}}} \sqrt{\frac{|P_0|^{\nu_0}}{2^{\nu_0 v}}} \frac{1}{\Gamma_v \left( \frac{\nu_0}{2} \right)} \cdot |T|^{(\nu_0-v-1)/2} \exp\left[ -\frac{1}{2} \mathrm{tr}\left( P_0 T \right) \right] \cdot \\
+& \exp\left[ -\frac{1}{2} \mathrm{tr}\left( T \left[ (B-M_n)^\mathrm{T} \Lambda_n (B-M_n) + (Y^\mathrm{T} P Y + M_0^\mathrm{T} \Lambda_0 M_0 - M_n^\mathrm{T} \Lambda_n M_n) \right] \right) \right] \; .
+\end{split}
+\end{equation}
+
+Using the [probability density function of the matrix-normal distribution](/P/matn-pdf), we can rewrite this as
+
+\begin{equation} \label{eq:GLM-NW-LME-s2}
+\begin{split}
+p(Y,B,T) = \; & \sqrt{\frac{|T|^n |P|^v}{(2 \pi)^{nv}}} \sqrt{\frac{|T|^p |\Lambda_0|^v}{(2 \pi)^{pv}}} \sqrt{\frac{(2 \pi)^{pv}}{|T|^p |\Lambda_n|^v}} \sqrt{\frac{|P_0|^{\nu_0}}{2^{\nu_0 v}}} \frac{1}{\Gamma_v \left( \frac{\nu_0}{2} \right)} \cdot |T|^{(\nu_0-v-1)/2} \exp\left[ -\frac{1}{2} \mathrm{tr}\left( P_0 T \right) \right] \cdot \\
+& \mathcal{MN}(B; M_n, \Lambda_n^{-1}, T^{-1}) \cdot \exp\left[ -\frac{1}{2} \mathrm{tr}\left( T \left[ Y^\mathrm{T} P Y + M_0^\mathrm{T} \Lambda_0 M_0 - M_n^\mathrm{T} \Lambda_n M_n \right] \right) \right] \; .
+\end{split}
+\end{equation}
+
+Now, $B$ only appears in the matrix-normal density and can therefore be integrated out:
+
+\begin{equation} \label{eq:GLM-NW-LME-s3}
+\begin{split}
+\int p(Y,B,T) \, \mathrm{d}B = \; & \sqrt{\frac{|T|^n |P|^v}{(2 \pi)^{nv}}} \sqrt{\frac{|\Lambda_0|^v}{|\Lambda_n|^v}} \sqrt{\frac{|P_0|^{\nu_0}}{2^{\nu_0 v}}} \frac{1}{\Gamma_v \left( \frac{\nu_0}{2} \right)} \cdot |T|^{(\nu_0-v-1)/2} \cdot \\
+& \exp\left[ -\frac{1}{2} \mathrm{tr}\left( T \left[ P_0 + Y^\mathrm{T} P Y + M_0^\mathrm{T} \Lambda_0 M_0 - M_n^\mathrm{T} \Lambda_n M_n \right] \right) \right] \; .
+\end{split}
+\end{equation}
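+
+As a sketch of the step from \eqref{eq:GLM-NW-LME-s2} to \eqref{eq:GLM-NW-LME-s3} (the label below is introduced here for illustration only): the matrix-normal density integrates to one, the two exponential factors that do not depend on $B$ combine into a single trace, and the normalization constants involving $|T|^p$ and $(2 \pi)^{pv}$ cancel:
+
+\begin{equation} \label{eq:GLM-NW-LME-s3-aux}
+\begin{split}
+\int \mathcal{MN}(B; M_n, \Lambda_n^{-1}, T^{-1}) \, \mathrm{d}B &= 1 \\
+\sqrt{\frac{|T|^p |\Lambda_0|^v}{(2 \pi)^{pv}}} \sqrt{\frac{(2 \pi)^{pv}}{|T|^p |\Lambda_n|^v}} &= \sqrt{\frac{|\Lambda_0|^v}{|\Lambda_n|^v}} \; .
+\end{split}
+\end{equation}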
+
+Using the [probability density function of the Wishart distribution](/P/wish-pdf), we can rewrite this as
+
+$$ \label{eq:GLM-NW-LME-s4}
+\int p(Y,B,T) \, \mathrm{d}B = \sqrt{\frac{|P|^v}{(2 \pi)^{nv}}} \sqrt{\frac{|\Lambda_0|^v}{|\Lambda_n|^v}} \sqrt{\frac{|P_0|^{\nu_0}}{2^{\nu_0 v}}} \sqrt{\frac{2^{\nu_n v}}{|P_n|^{\nu_n}}} \, \frac{\Gamma_v \left( \frac{\nu_n}{2} \right)}{\Gamma_v \left( \frac{\nu_0}{2} \right)} \cdot \mathcal{W}(T; P_n^{-1}, \nu_n) \; .
+$$
+
+Finally, $T$ only appears in the Wishart density and can also be integrated out:
+
+$$ \label{eq:GLM-NW-LME-s5}
+\iint p(Y,B,T) \, \mathrm{d}B \, \mathrm{d}T = \sqrt{\frac{|P|^v}{(2 \pi)^{nv}}} \sqrt{\frac{|\Lambda_0|^v}{|\Lambda_n|^v}} \sqrt{\frac{\left| \frac{1}{2} P_0 \right|^{\nu_0}}{\left| \frac{1}{2} P_n \right|^{\nu_n}}} \, \frac{\Gamma_v \left( \frac{\nu_n}{2} \right)}{\Gamma_v \left( \frac{\nu_0}{2} \right)} = p(Y|m) \; .
+$$
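+
+As a brief sketch of this last step (the label below is introduced here for illustration only): the Wishart density integrates to one, and since $P_0$ and $P_n$ are $v \times v$ matrices, the powers of two can be absorbed into the determinants via $|c A| = c^v |A|$:
+
+$$ \label{eq:GLM-NW-LME-s5-aux}
+\sqrt{\frac{|P_0|^{\nu_0}}{2^{\nu_0 v}}} \sqrt{\frac{2^{\nu_n v}}{|P_n|^{\nu_n}}} = \sqrt{\frac{\left| \frac{1}{2} P_0 \right|^{\nu_0}}{\left| \frac{1}{2} P_n \right|^{\nu_n}}} \; .
+$$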
+
+Thus, the [log model evidence](/D/lme) of this model is given by
+
+\begin{equation} \label{eq:GLM-NW-LME-s6}
+\begin{split}
+\log p(Y|m) = & \frac{v}{2} \log |P| - \frac{nv}{2} \log (2 \pi)  + \frac{v}{2} \log |\Lambda_0| - \frac{v}{2} \log |\Lambda_n| + \\
+& \frac{\nu_0}{2} \log\left| \frac{1}{2} P_0 \right| - \frac{\nu_n}{2} \log\left| \frac{1}{2} P_n \right| + \log \Gamma_v \left( \frac{\nu_n}{2} \right) - \log \Gamma_v \left( \frac{\nu_0}{2} \right) \; .
+\end{split}
+\end{equation}
diff --git a/P/mblr-post.md b/P/mblr-post.md
@@ -0,0 +1,162 @@
+---
+layout: proof
+mathjax: true
+
+author: "Joram Soch"
+affiliation: "BCCN Berlin"
+e_mail: "joram.soch@bccn-berlin.de"
+date: 2020-09-03 08:37:00
+
+title: "Posterior distribution for multivariate Bayesian linear regression"
+chapter: "Statistical Models"
+section: "Multivariate normal data"
+topic: "Multivariate Bayesian linear regression"
+theorem: "Posterior distribution"
+
+sources:
+  - authors: "Wikipedia"
+    year: 2020
+    title: "Bayesian multivariate linear regression"
+    in: "Wikipedia, the free encyclopedia"
+    pages: "retrieved on 2020-09-03"
+    url: "https://en.wikipedia.org/wiki/Bayesian_multivariate_linear_regression#Posterior_distribution"
+
+proof_id: "P160"
+shortcut: "mblr-post"
+username: "JoramSoch"
+---
+
+
+**Theorem:** Let
+
+$$ \label{eq:GLM}
+Y = X B + E, \; E \sim \mathcal{MN}(0, V, \Sigma)
+$$
+
+be a [general linear model](/D/glm) with measured $n \times v$ data matrix $Y$, known $n \times p$ design matrix $X$, known $n \times n$ [covariance structure](/D/matn) $V$ as well as unknown $p \times v$ regression coefficients $B$ and unknown $v \times v$ [noise covariance](/D/matn) $\Sigma$. Moreover, assume a [normal-Wishart prior distribution](/P/mblr-prior) over the model parameters $B$ and $T = \Sigma^{-1}$:
+
+$$ \label{eq:GLM-NW-prior}
+p(B,T) = \mathcal{MN}(B; M_0, \Lambda_0^{-1}, T^{-1}) \cdot \mathcal{W}(T; P_0^{-1}, \nu_0) \; .
+$$
+
+Then, the [posterior distribution](/D/post) is also a [normal-Wishart distribution](/D/nw)
+
+$$ \label{eq:GLM-NW-post}
+p(B,T|Y) = \mathcal{MN}(B; M_n, \Lambda_n^{-1}, T^{-1}) \cdot \mathcal{W}(T; P_n^{-1}, \nu_n)
+$$
+
+and, with $P = V^{-1}$, the [posterior hyperparameters](/D/post) are given by
+
+$$ \label{eq:GLM-NW-post-par}
+\begin{split}
+M_n &= \Lambda_n^{-1} (X^\mathrm{T} P Y + \Lambda_0 M_0) \\
+\Lambda_n &= X^\mathrm{T} P X + \Lambda_0 \\
+P_n &= P_0 + Y^\mathrm{T} P Y + M_0^\mathrm{T} \Lambda_0 M_0 - M_n^\mathrm{T} \Lambda_n M_n \\
+\nu_n &= \nu_0 + n \; .
+\end{split}
+$$
+
+
+**Proof:** According to [Bayes' theorem](/P/bayes-th), the [posterior distribution](/D/post) is given by
+
+$$ \label{eq:GLM-NG-BT}
+p(B,T|Y) = \frac{p(Y|B,T) \, p(B,T)}{p(Y)} \; .
+$$
+
+Since $p(Y)$ is just a normalization factor, the [posterior is proportional](/P/post-jl) to the numerator:
+
+$$ \label{eq:GLM-NG-post-JL}
+p(B,T|Y) \propto p(Y|B,T) \, p(B,T) = p(Y,B,T) \; .
+$$
+
+Equation \eqref{eq:GLM} implies the following [likelihood function](/D/lf)
+
+$$ \label{eq:GLM-LF-Class}
+p(Y|B,\Sigma) = \mathcal{MN}(Y; X B, V, \Sigma) = \sqrt{\frac{1}{(2 \pi)^{nv} |\Sigma|^n |V|^v}} \, \exp\left[ -\frac{1}{2} \mathrm{tr}\left( \Sigma^{-1} (Y-XB)^\mathrm{T} V^{-1} (Y-XB) \right) \right]
+$$
+
+which, for mathematical convenience, can also be parametrized as
+
+$$ \label{eq:GLM-LF-Bayes}
+p(Y|B,T) = \mathcal{MN}(Y; X B, P, T^{-1}) = \sqrt{\frac{|T|^n |P|^v}{(2 \pi)^{nv}}} \, \exp\left[ -\frac{1}{2} \mathrm{tr}\left( T (Y-XB)^\mathrm{T} P (Y-XB) \right) \right]
+$$
+
+using the $v \times v$ [precision matrix](/D/precmat) $T = \Sigma^{-1}$ and the $n \times n$ [precision matrix](/D/precmat) $P = V^{-1}$.
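+
+As a short check of this reparametrization (a sketch, with a label introduced here for illustration only), the determinant of an inverse matrix is the inverse of the determinant, such that
+
+$$ \label{eq:GLM-LF-det}
+\frac{1}{|\Sigma|^n |V|^v} = |\Sigma^{-1}|^n \, |V^{-1}|^v = |T|^n \, |P|^v \; .
+$$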
+
+<br>
+Combining the [likelihood function](/D/lf) \eqref{eq:GLM-LF-Bayes} with the [prior distribution](/D/prior) \eqref{eq:GLM-NW-prior}, the [joint likelihood](/D/jl) of the model is given by
+
+$$ \label{eq:GLM-NW-JL-s1}
+\begin{split}
+p(Y,B,T) = \; & p(Y|B,T) \, p(B,T) \\
+= \; & \sqrt{\frac{|T|^n |P|^v}{(2 \pi)^{nv}}} \, \exp\left[ -\frac{1}{2} \mathrm{tr}\left( T (Y-XB)^\mathrm{T} P (Y-XB) \right) \right] \cdot \\
+& \sqrt{\frac{|T|^p |\Lambda_0|^v}{(2 \pi)^{pv}}} \, \exp\left[ -\frac{1}{2} \mathrm{tr}\left( T (B-M_0)^\mathrm{T} \Lambda_0 (B-M_0) \right) \right] \cdot \\
+& \frac{1}{\Gamma_v \left( \frac{\nu_0}{2} \right)} \sqrt{\frac{|P_0|^{\nu_0}}{2^{\nu_0 v}}} |T|^{(\nu_0-v-1)/2} \exp\left[ -\frac{1}{2} \mathrm{tr}\left( P_0 T \right) \right] \; .
+\end{split}
+$$
+
+Collecting identical variables gives:
+
+$$ \label{eq:GLM-NW-JL-s2}
+\begin{split}
+p(Y,B,T) = \; & \sqrt{\frac{|T|^n |P|^v}{(2 \pi)^{nv}}} \sqrt{\frac{|T|^p |\Lambda_0|^v}{(2 \pi)^{pv}}} \sqrt{\frac{|P_0|^{\nu_0}}{2^{\nu_0 v}}} \frac{1}{\Gamma_v \left( \frac{\nu_0}{2} \right)} \cdot |T|^{(\nu_0-v-1)/2} \exp\left[ -\frac{1}{2} \mathrm{tr}\left( P_0 T \right) \right] \cdot \\
+& \exp\left[ -\frac{1}{2} \mathrm{tr}\left( T \left[ (Y-XB)^\mathrm{T} P (Y-XB) + (B-M_0)^\mathrm{T} \Lambda_0 (B-M_0) \right] \right) \right] \; .
+\end{split}
+$$
+
+Expanding the products in the exponent gives:
+
+$$ \label{eq:GLM-NW-JL-s3}
+\begin{split}
+p(Y,B,T) = \; & \sqrt{\frac{|T|^n |P|^v}{(2 \pi)^{nv}}} \sqrt{\frac{|T|^p |\Lambda_0|^v}{(2 \pi)^{pv}}} \sqrt{\frac{|P_0|^{\nu_0}}{2^{\nu_0 v}}} \frac{1}{\Gamma_v \left( \frac{\nu_0}{2} \right)} \cdot |T|^{(\nu_0-v-1)/2} \exp\left[ -\frac{1}{2} \mathrm{tr}\left( P_0 T \right) \right] \cdot \\
+& \exp\left[ -\frac{1}{2} \mathrm{tr}\left( T \left[ Y^\mathrm{T} P Y - Y^\mathrm{T} P X B - B^\mathrm{T} X^\mathrm{T} P Y + B^\mathrm{T} X^\mathrm{T} P X B + \right. \right. \right. \\
+& \hphantom{\exp\left[ -\frac{1}{2} \mathrm{tr}\left( T \left[ \right. \right. \right. \!\!\!} \; \left. \left. \left. B^\mathrm{T} \Lambda_0 B - B^\mathrm{T} \Lambda_0 M_0 - M_0^\mathrm{T} \Lambda_0 B + M_0^\mathrm{T} \Lambda_0 M_0 \right] \right) \right] \; .
+\end{split}
+$$
+
+Completing the square over $B$, we finally have
+
+$$ \label{eq:GLM-NW-JL-s4}
+\begin{split}
+p(Y,B,T) = \; & \sqrt{\frac{|T|^n |P|^v}{(2 \pi)^{nv}}} \sqrt{\frac{|T|^p |\Lambda_0|^v}{(2 \pi)^{pv}}} \sqrt{\frac{|P_0|^{\nu_0}}{2^{\nu_0 v}}} \frac{1}{\Gamma_v \left( \frac{\nu_0}{2} \right)} \cdot |T|^{(\nu_0-v-1)/2} \exp\left[ -\frac{1}{2} \mathrm{tr}\left( P_0 T \right) \right] \cdot \\
+& \exp\left[ -\frac{1}{2} \mathrm{tr}\left( T \left[ (B-M_n)^\mathrm{T} \Lambda_n (B-M_n) + (Y^\mathrm{T} P Y + M_0^\mathrm{T} \Lambda_0 M_0 - M_n^\mathrm{T} \Lambda_n M_n) \right] \right) \right] \; .
+\end{split}
+$$
+
+with the [posterior hyperparameters](/D/post)
+
+$$ \label{eq:GLM-NW-post-B-par}
+\begin{split}
+M_n &= \Lambda_n^{-1} (X^\mathrm{T} P Y + \Lambda_0 M_0) \\
+\Lambda_n &= X^\mathrm{T} P X + \Lambda_0 \; .
+\end{split}
+$$
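+
+As a sketch of the completion of the square (the label below is introduced here for illustration only), assume that $P$, $\Lambda_0$ and $\Lambda_n$ are symmetric and note that $\Lambda_n M_n = X^\mathrm{T} P Y + \Lambda_0 M_0$, such that the terms involving $B$ can be written as
+
+$$ \label{eq:GLM-NW-JL-sq}
+\begin{split}
+& B^\mathrm{T} (X^\mathrm{T} P X + \Lambda_0) B - B^\mathrm{T} (X^\mathrm{T} P Y + \Lambda_0 M_0) - (Y^\mathrm{T} P X + M_0^\mathrm{T} \Lambda_0) B \\
+= \; & B^\mathrm{T} \Lambda_n B - B^\mathrm{T} \Lambda_n M_n - M_n^\mathrm{T} \Lambda_n B \\
+= \; & (B-M_n)^\mathrm{T} \Lambda_n (B-M_n) - M_n^\mathrm{T} \Lambda_n M_n \; .
+\end{split}
+$$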
+
+Ergo, the joint likelihood is proportional to
+
+$$ \label{eq:GLM-NW-JL-s5}
+p(Y,B,T) \propto |T|^{p/2} \cdot \exp\left[ -\frac{1}{2} \mathrm{tr}\left( T \left[ (B-M_n)^\mathrm{T} \Lambda_n (B-M_n) \right] \right) \right] \cdot |T|^{(\nu_n-v-1)/2} \cdot \exp\left[ -\frac{1}{2} \mathrm{tr}\left( P_n T \right) \right]
+$$
+
+with the [posterior hyperparameters](/D/post)
+
+$$ \label{eq:GLM-NW-post-T-par}
+\begin{split}
+P_n &= P_0 + Y^\mathrm{T} P Y + M_0^\mathrm{T} \Lambda_0 M_0 - M_n^\mathrm{T} \Lambda_n M_n \\
+\nu_n &= \nu_0 + n \; .
+\end{split}
+$$
+
+From the first two factors in \eqref{eq:GLM-NW-JL-s5}, we can isolate the posterior distribution over $B$ given $T$:
+
+$$ \label{eq:GLM-NW-post-B}
+p(B|T,Y) = \mathcal{MN}(B; M_n, \Lambda_n^{-1}, T^{-1}) \; .
+$$
+
+From the remaining term, we can isolate the posterior distribution over $T$:
+
+$$ \label{eq:GLM-NW-post-T}
+p(T|Y) = \mathcal{W}(T; P_n^{-1}, \nu_n) \; .
+$$
+
+Together, \eqref{eq:GLM-NW-post-B} and \eqref{eq:GLM-NW-post-T} constitute the [joint](/D/prob-joint) [posterior distribution](/D/post) of $B$ and $T$.
diff --git a/P/mblr-prior.md b/P/mblr-prior.md