|
| 1 | +--- |
| 2 | +layout: proof |
| 3 | +mathjax: true |
| 4 | + |
| 5 | +author: "Joram Soch" |
| 6 | +affiliation: "BCCN Berlin" |
| 7 | +e_mail: "joram.soch@bccn-berlin.de" |
| 8 | +date: 2020-11-19 07:08:00 |
| 9 | + |
| 10 | +title: "Kullback-Leibler divergence for the normal distribution" |
| 11 | +chapter: "Probability Distributions" |
| 12 | +section: "Univariate continuous distributions" |
| 13 | +topic: "Normal distribution" |
| 14 | +theorem: "Kullback-Leibler divergence" |
| 15 | + |
| 16 | +sources: |
| 17 | + |
| 18 | +proof_id: "P193" |
| 19 | +shortcut: "norm-kl" |
| 20 | +username: "JoramSoch" |
| 21 | +--- |
| 22 | + |
| 23 | + |
| 24 | +**Theorem:** Let $X$ be a [random variable](/D/rvar). Assume two [normal distributions](/D/norm) $P$ and $Q$ specifying the probability distribution of $X$ as |
| 25 | + |
| 26 | +$$ \label{eq:norms} |
| 27 | +\begin{split} |
| 28 | +P: \; X &\sim \mathcal{N}(\mu_1, \sigma_1^2) \\ |
| 29 | +Q: \; X &\sim \mathcal{N}(\mu_2, \sigma_2^2) \; . \\ |
| 30 | +\end{split} |
| 31 | +$$ |
| 32 | + |
| 33 | +Then, the [Kullback-Leibler divergence](/D/kl) of $P$ from $Q$ is given by |
| 34 | + |
| 35 | +$$ \label{eq:norm-KL} |
| 36 | +\mathrm{KL}[P\,||\,Q] = \frac{1}{2} \left[ \frac{(\mu_2 - \mu_1)^2}{\sigma_2^2} + \frac{\sigma_1^2}{\sigma_2^2} - \ln \frac{\sigma_1^2}{\sigma_2^2} - 1 \right] \; . |
| 37 | +$$ |
| 38 | + |
| 39 | + |
| 40 | +**Proof:** The [KL divergence for a continuous random variable](/D/kl) is given by |
| 41 | + |
| 42 | +$$ \label{eq:KL-cont} |
| 43 | +\mathrm{KL}[P\,||\,Q] = \int_{\mathcal{X}} p(x) \, \ln \frac{p(x)}{q(x)} \, \mathrm{d}x |
| 44 | +$$ |
| 45 | + |
| 46 | +which, applied to the [normal distributions](/D/norm) in \eqref{eq:norms}, yields |
| 47 | + |
| 48 | +$$ \label{eq:norm-KL-s1} |
| 49 | +\begin{split} |
| 50 | +\mathrm{KL}[P\,||\,Q] &= \int_{-\infty}^{+\infty} \mathcal{N}(x; \mu_1, \sigma_1^2) \, \ln \frac{\mathcal{N}(x; \mu_1, \sigma_1^2)}{\mathcal{N}(x; \mu_2, \sigma_2^2)} \, \mathrm{d}x \\ |
| 51 | +&= \left\langle \ln \frac{\mathcal{N}(x; \mu_1, \sigma_1^2)}{\mathcal{N}(x; \mu_2, \sigma_2^2)} \right\rangle_{p(x)} \; . |
| 52 | +\end{split} |
| 53 | +$$ |
| 54 | + |
| 55 | +Using the [probability density function of the normal distribution](/P/norm-pdf), this becomes: |
| 56 | + |
| 57 | +$$ \label{eq:norm-KL-s2} |
| 58 | +\begin{split} |
| 59 | +\mathrm{KL}[P\,||\,Q] &= \left\langle \ln \frac{ \frac{1}{\sqrt{2 \pi} \sigma_1} \cdot \exp \left[ -\frac{1}{2} \left( \frac{x-\mu_1}{\sigma_1} \right)^2 \right] }{ \frac{1}{\sqrt{2 \pi} \sigma_2} \cdot \exp \left[ -\frac{1}{2} \left( \frac{x-\mu_2}{\sigma_2} \right)^2 \right] } \right\rangle_{p(x)} \\ |
| 60 | +&= \left\langle \ln \left( \sqrt{\frac{\sigma_2^2}{\sigma_1^2}} \cdot \exp\left[ -\frac{1}{2} \left( \frac{x-\mu_1}{\sigma_1} \right)^2 + \frac{1}{2} \left( \frac{x-\mu_2}{\sigma_2} \right)^2 \right] \right) \right\rangle_{p(x)} \\ |
| 61 | +&= \left\langle \frac{1}{2} \ln \frac{\sigma_2^2}{\sigma_1^2} -\frac{1}{2} \left( \frac{x-\mu_1}{\sigma_1} \right)^2 + \frac{1}{2} \left( \frac{x-\mu_2}{\sigma_2} \right)^2 \right\rangle_{p(x)} \\ |
| 62 | +&= \frac{1}{2} \left\langle - \left( \frac{x-\mu_1}{\sigma_1} \right)^2 + \left( \frac{x-\mu_2}{\sigma_2} \right)^2 - \ln \frac{\sigma_1^2}{\sigma_2^2} \right\rangle_{p(x)} \\ |
| 63 | +&= \frac{1}{2} \left\langle - \frac{(x-\mu_1)^2}{\sigma_1^2} + \frac{x^2 - 2 \mu_2 x + \mu_2^2}{\sigma_2^2} - \ln \frac{\sigma_1^2}{\sigma_2^2} \right\rangle_{p(x)} \; . |
| 64 | +\end{split} |
| 65 | +$$ |
| 66 | + |
| 67 | +Because the [expected value](/D/mean) is a linear operator, the expectation can be applied to each term separately, and the logarithmic term, which does not depend on $x$, is left unchanged: |
| 68 | + |
| 69 | +$$ \label{eq:norm-KL-s3} |
| 70 | +\begin{split} |
| 71 | +\mathrm{KL}[P\,||\,Q] &= \frac{1}{2} \left[ - \frac{\left\langle (x-\mu_1)^2 \right\rangle}{\sigma_1^2} + \frac{\left\langle x^2 - 2 \mu_2 x + \mu_2^2 \right\rangle}{\sigma_2^2} - \left\langle \ln \frac{\sigma_1^2}{\sigma_2^2} \right\rangle \right] \\ |
| 72 | +&= \frac{1}{2} \left[ - \frac{\left\langle (x-\mu_1)^2 \right\rangle}{\sigma_1^2} + \frac{\left\langle x^2 \right\rangle - \left\langle 2 \mu_2 x \right\rangle + \left\langle \mu_2^2 \right\rangle}{\sigma_2^2} - \ln \frac{\sigma_1^2}{\sigma_2^2} \right] \; . |
| 73 | +\end{split} |
| 74 | +$$ |
| 75 | + |
| 76 | +The first expectation corresponds to the [variance](/D/var) |
| 77 | + |
| 78 | +$$ \label{eq:var} |
| 79 | +\left\langle (X-\mu)^2 \right\rangle = \mathrm{E}[(X-\mathrm{E}(X))^2] = \mathrm{Var}(X) |
| 80 | +$$ |
| 81 | + |
| 82 | +and the [variance of a normally distributed random variable](/P/norm-var) is |
| 83 | + |
| 84 | +$$ \label{eq:norm-var} |
| 85 | +X \sim \mathcal{N}(\mu, \sigma^2) \quad \Rightarrow \quad \mathrm{Var}(X) = \sigma^2 \; . |
| 86 | +$$ |
| 87 | + |
| 88 | +Additionally applying the [raw moments of the normal distribution](/P/norm-mgf) |
| 89 | + |
| 90 | +$$ \label{eq:norm-mom-raw} |
| 91 | +X \sim \mathcal{N}(\mu, \sigma^2) \quad \Rightarrow \quad \left\langle x \right\rangle = \mu \quad \text{and} \quad \left\langle x^2 \right\rangle = \mu^2 + \sigma^2 \; , |
| 92 | +$$ |
| 93 | + |
| 94 | +the Kullback-Leibler divergence in \eqref{eq:norm-KL-s3} becomes |
| 95 | + |
| 96 | +$$ \label{eq:norm-KL-s4} |
| 97 | +\begin{split} |
| 98 | +\mathrm{KL}[P\,||\,Q] &= \frac{1}{2} \left[ - \frac{\sigma_1^2}{\sigma_1^2} + \frac{\mu_1^2 + \sigma_1^2 - 2 \mu_2 \mu_1 + \mu_2^2}{\sigma_2^2} - \ln \frac{\sigma_1^2}{\sigma_2^2} \right] \\ |
| 99 | +&= \frac{1}{2} \left[ \frac{\mu_1^2 - 2 \mu_1 \mu_2 + \mu_2^2}{\sigma_2^2} + \frac{\sigma_1^2}{\sigma_2^2} - \ln \frac{\sigma_1^2}{\sigma_2^2} - 1 \right] \\ |
| 100 | +&= \frac{1}{2} \left[ \frac{(\mu_1 - \mu_2)^2}{\sigma_2^2} + \frac{\sigma_1^2}{\sigma_2^2} - \ln \frac{\sigma_1^2}{\sigma_2^2} - 1 \right] |
| 101 | +\end{split} |
| 102 | +$$ |
| 103 | + |
| 104 | +which is equivalent to \eqref{eq:norm-KL}. |
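
As a sanity check on the result, the closed form in \eqref{eq:norm-KL} can be compared against a direct numerical evaluation of the integral in \eqref{eq:KL-cont}. The following is a minimal sketch, not part of the proof: the function names `kl_normal`, `log_normal_pdf` and `kl_numeric`, the integration range of $\pm 12 \sigma_1$ around $\mu_1$ and the number of trapezoid steps are all assumptions chosen for illustration.

```python
import math


def kl_normal(mu1, var1, mu2, var2):
    """Closed-form KL[P||Q] for P = N(mu1, var1) and Q = N(mu2, var2)."""
    return 0.5 * ((mu2 - mu1) ** 2 / var2 + var1 / var2
                  - math.log(var1 / var2) - 1.0)


def log_normal_pdf(x, mu, var):
    """Log-density of N(mu, var) at x (var is the variance, not the s.d.)."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mu) ** 2 / var


def kl_numeric(mu1, var1, mu2, var2, width=12.0, steps=200_000):
    """Trapezoidal approximation of the integral of p(x) * ln(p(x)/q(x)).

    The range mu1 +/- width * sigma1 is an illustrative choice; the mass of
    p(x) outside it is negligible for width = 12.
    """
    sd1 = math.sqrt(var1)
    lo, hi = mu1 - width * sd1, mu1 + width * sd1
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        lp = log_normal_pdf(x, mu1, var1)
        lq = log_normal_pdf(x, mu2, var2)
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end points
        total += weight * math.exp(lp) * (lp - lq)
    return total * h


if __name__ == "__main__":
    # Example parameters (arbitrary): P = N(0, 1), Q = N(1, 4)
    print(kl_normal(0.0, 1.0, 1.0, 4.0))
    print(kl_numeric(0.0, 1.0, 1.0, 4.0))
```

For the example parameters, the closed form evaluates to $\frac{1}{2} \left[ \frac{1}{4} + \frac{1}{4} + \ln 4 - 1 \right] \approx 0.4431$, and the numerical integral should agree with it up to discretization error. Note that the divergence is zero when both distributions coincide and that it is not symmetric in $P$ and $Q$.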