| 1 | +--- |
| 2 | +layout: proof |
| 3 | +mathjax: true |
| 4 | + |
| 5 | +author: "Joram Soch" |
| 6 | +affiliation: "BCCN Berlin" |
| 7 | +e_mail: "joram.soch@bccn-berlin.de" |
| 8 | +date: 2024-11-01 11:51:06 |
| 9 | + |
| 10 | +title: "Mutual information of the bivariate normal distribution" |
| 11 | +chapter: "Probability Distributions" |
| 12 | +section: "Multivariate continuous distributions" |
| 13 | +topic: "Bivariate normal distribution" |
| 14 | +theorem: "Mutual information" |
| 15 | + |
| 16 | +sources: |
| 17 | + - authors: "Krafft, Peter" |
| 18 | + year: 2013 |
| 19 | + title: "Correlation and Mutual Information" |
| 20 | + in: "Princeton University Department of Computer Science: Laboratory for Intelligent Probabilistic Systems" |
| 21 | + pages: "February 13, 2013" |
| 22 | + url: "https://lips.cs.princeton.edu/correlation-and-mutual-information/" |
| 23 | + |
| 24 | +proof_id: "P476" |
| 25 | +shortcut: "bvn-mi" |
| 26 | +username: "JoramSoch" |
| 27 | +--- |
| 28 | + |
| 29 | + |
| 30 | +**Theorem:** Let $X$ and $Y$ follow a [bivariate normal distribution](/D/bvn): |
| 31 | + |
| 32 | +$$ \label{eq:bvn} |
| 33 | +\left[ \begin{matrix} X \\ Y \end{matrix} \right] \sim |
| 34 | +\mathcal{N}\left( \left[ \begin{matrix} \mu_1 \\ \mu_2 \end{matrix} \right], \left[ \begin{matrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{matrix} \right] \right) \; . |
| 35 | +$$ |
| 36 | + |
| 37 | +Then, the [mutual information](/D/mi) of $X$ and $Y$ is |
| 38 | + |
| 39 | +$$ \label{eq:bvn-lincomb} |
| 40 | +\mathrm{I}(X,Y) = -\frac{1}{2} \ln (1-\rho^2) |
| 41 | +$$ |
| 42 | + |
| 43 | +where $\rho = \sigma_{12} / (\sigma_1 \sigma_2)$ is the [correlation](/D/corr) of $X$ and $Y$, with $\lvert \rho \rvert < 1$. |
| 44 | + |
| 45 | + |
| 46 | +**Proof:** [Mutual information can be written in terms of marginal and joint differential entropy](/P/cmi-mjde): |
| 47 | + |
| 48 | +$$ \label{eq:cmi-mjde} |
| 49 | +\mathrm{I}(X,Y) = \mathrm{h}(X) + \mathrm{h}(Y) - \mathrm{h}(X,Y) \; . |
| 50 | +$$ |
| 51 | + |
| 52 | +The marginal distributions of the multivariate normal distribution are also multivariate normal, i.e. |
| 53 | + |
| 54 | +$$ \label{eq:mvn-marg} |
| 55 | +\left[ \begin{matrix} X_1 \\ X_2 \end{matrix} \right] \sim |
| 56 | +\mathcal{N}\left( \left[ \begin{matrix} \mu_1 \\ \mu_2 \end{matrix} \right], \left[ \begin{matrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{matrix} \right] \right) |
| 57 | +\quad \Rightarrow \quad |
| 58 | +X_1 \sim \mathcal{N}\left( \mu_1, \Sigma_{11} \right) \; , |
| 59 | +$$ |
| 60 | + |
| 61 | +such that the [marginals](/D/marg) of the [bivariate normal distribution](/D/bvn) are [univariate normal distributions](/D/norm): |
| 62 | + |
| 63 | +$$ \label{eq:bvn-marg} |
| 64 | +\left[ \begin{matrix} X \\ Y \end{matrix} \right] \sim |
| 65 | +\mathcal{N}\left( \left[ \begin{matrix} \mu_1 \\ \mu_2 \end{matrix} \right], \left[ \begin{matrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{matrix} \right] \right) |
| 66 | +\quad \Rightarrow \quad |
| 67 | +X \sim \mathcal{N}\left( \mu_1, \sigma_1^2 \right) |
| 68 | +\quad \text{and} \quad |
| 69 | +Y \sim \mathcal{N}\left( \mu_2, \sigma_2^2 \right) \; . |
| 70 | +$$ |
| 71 | + |
| 72 | +The [differential entropy of the univariate normal distribution](/P/norm-dent) is |
| 73 | + |
| 74 | +$$ \label{eq:norm-dent} |
| 75 | +\mathrm{h}(X) = \frac{1}{2} \ln\left( 2 \pi \sigma^2 e \right) |
| 76 | +$$ |
| 77 | + |
| 78 | +and the [differential entropy of the multivariate normal distribution](/P/mvn-dent) is |
| 79 | + |
| 80 | +$$ \label{eq:mvn-dent} |
| 81 | +\mathrm{h}(x) = \frac{n}{2} \ln(2\pi) + \frac{1}{2} \ln|\Sigma| + \frac{1}{2} n |
| 82 | +$$ |
| 83 | + |
| 84 | +where $n$ is the dimension of the random vector $x$ and $\lvert \Sigma \rvert$ is the determinant of its [covariance matrix](/D/covmat) $\Sigma$. A two-dimensional [covariance matrix can be rewritten in terms of correlations](/P/covmat-corrmat) as follows: |
| 85 | + |
| 86 | +$$ \label{eq:Sigma} |
| 87 | +\begin{split} |
| 88 | +\Sigma |
| 89 | +&= \left[ \begin{matrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{matrix} \right] \left[ \begin{matrix} 1 & \rho \\ \rho & 1 \end{matrix} \right] \left[ \begin{matrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{matrix} \right] \\ |
| 90 | +&= \left[ \begin{matrix} \sigma_1^2 & \rho \, \sigma_1 \sigma_2 \\ \rho \, \sigma_1 \sigma_2 & \sigma_2^2 \end{matrix} \right] \; . |
| 91 | +\end{split} |
| 92 | +$$ |
| 93 | + |
| 94 | +Combining \eqref{eq:cmi-mjde} with \eqref{eq:bvn-marg}, \eqref{eq:norm-dent}, \eqref{eq:mvn-dent} and \eqref{eq:Sigma}, and setting $n = 2$, we get: |
| 95 | + |
| 96 | +$$ \label{eq:bvn-mi} |
| 97 | +\begin{split} |
| 98 | +\mathrm{I}(X,Y) |
| 99 | +&\overset{\eqref{eq:cmi-mjde}}{=} \mathrm{h}(X) + \mathrm{h}(Y) - \mathrm{h}(X,Y) \\ |
| 100 | +&\overset{\eqref{eq:bvn-marg}}{=} \mathrm{h}\left[ \mathcal{N}\left( \mu_1, \sigma_1^2 \right) \right] + \mathrm{h}\left[ \mathcal{N}\left( \mu_2, \sigma_2^2 \right) \right] - \mathrm{h}\left[ \mathcal{N}\left( \mu, \Sigma \right) \right] \\ |
| 101 | +&\overset{\eqref{eq:Sigma}}{=} \left[ \frac{1}{2} \ln\left( 2 \pi \sigma_1^2 e \right) \right] + \left[ \frac{1}{2} \ln\left( 2 \pi \sigma_2^2 e \right) \right] - \left[ \frac{2}{2} \ln(2\pi) + \frac{1}{2} \ln \left| \left[ \begin{matrix} \sigma_1^2 & \rho \, \sigma_1 \sigma_2 \\ \rho \, \sigma_1 \sigma_2 & \sigma_2^2 \end{matrix} \right] \right| + \frac{1}{2} \cdot 2 \right] \\ |
| 102 | +&= \left( \frac{2}{2} \ln(2\pi) + \frac{2}{2} \ln(e) - \ln(2\pi) - 1 \right) + \left( \frac{1}{2} \ln\left( \sigma_1^2 \right) + \frac{1}{2} \ln\left( \sigma_2^2 \right) - \frac{1}{2} \ln \left| \left[ \begin{matrix} \sigma_1^2 & \rho \, \sigma_1 \sigma_2 \\ \rho \, \sigma_1 \sigma_2 & \sigma_2^2 \end{matrix} \right] \right| \right) \\ |
| 103 | +&= \frac{1}{2} \left[ \ln\left( \sigma_1^2 \right) + \ln\left( \sigma_2^2 \right) - \ln\left( \sigma_1^2 \sigma_2^2 - (\rho \, \sigma_1 \sigma_2)^2 \right) \right] \\ |
| 104 | +&= \frac{1}{2} \ln \left[ \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 \sigma_2^2 - (\rho \, \sigma_1 \sigma_2)^2} \right] \\ |
| 105 | +&= \frac{1}{2} \ln \left[ \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 \sigma_2^2 (1-\rho^2)} \right] \\ |
| 106 | +&= \frac{1}{2} \ln \left[ \frac{1}{1-\rho^2} \right] \\ |
| 107 | +&= -\frac{1}{2} \ln (1-\rho^2) \; . |
| 108 | +\end{split} |
| 109 | +$$ |
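
The derivation above can be checked numerically. The following is a minimal sketch, not part of the original proof: it evaluates $\mathrm{h}(X) + \mathrm{h}(Y) - \mathrm{h}(X,Y)$ from the two entropy formulas and compares the result with the closed form $-\frac{1}{2} \ln (1-\rho^2)$. The function names and the parameter values (`sigma1 = 1.5`, `sigma2 = 0.7`, `rho = 0.6`) are illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np

def bvn_mutual_information(rho):
    """Closed-form mutual information (in nats) for correlation rho, |rho| < 1."""
    return -0.5 * np.log(1.0 - rho**2)

def bvn_mi_from_entropies(sigma1, sigma2, rho):
    """Mutual information as h(X) + h(Y) - h(X,Y), using the normal entropy formulas."""
    # Covariance matrix written in terms of standard deviations and correlation
    Sigma = np.array([[sigma1**2,             rho * sigma1 * sigma2],
                      [rho * sigma1 * sigma2, sigma2**2            ]])
    n = 2  # dimension of the bivariate normal distribution
    # Differential entropies of the two univariate normal marginals
    h_x = 0.5 * np.log(2 * np.pi * np.e * sigma1**2)
    h_y = 0.5 * np.log(2 * np.pi * np.e * sigma2**2)
    # Differential entropy of the joint (multivariate normal) distribution
    h_xy = 0.5 * n * np.log(2 * np.pi) + 0.5 * np.log(np.linalg.det(Sigma)) + 0.5 * n
    return h_x + h_y - h_xy

# Hypothetical example values, chosen only for illustration
sigma1, sigma2, rho = 1.5, 0.7, 0.6
mi_closed = bvn_mutual_information(rho)
mi_entropy = bvn_mi_from_entropies(sigma1, sigma2, rho)
print(mi_closed, mi_entropy)
```

Both values agree up to floating-point error (about 0.223 nats for $\rho = 0.6$) and do not depend on $\sigma_1$ or $\sigma_2$, as the cancellation in the derivation suggests.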