| 1 | +--- |
| 2 | +layout: proof |
| 3 | +mathjax: true |
| 4 | + |
| 5 | +author: "Joram Soch" |
| 6 | +affiliation: "BCCN Berlin" |
| 7 | +e_mail: "joram.soch@bccn-berlin.de" |
| 8 | +date: 2021-12-02 15:33:00 |
| 9 | + |
| 10 | +title: "Kullback-Leibler divergence for the Wishart distribution" |
| 11 | +chapter: "Probability Distributions" |
| 12 | +section: "Matrix-variate continuous distributions" |
| 13 | +topic: "Wishart distribution" |
| 14 | +theorem: "Kullback-Leibler divergence" |
| 15 | + |
| 16 | +sources: |
| 17 | + - authors: "Penny, William D." |
| 18 | + year: 2001 |
| 19 | + title: "KL-Divergences of Normal, Gamma, Dirichlet and Wishart densities" |
| 20 | + in: "University College, London" |
| 21 | + pages: "pp. 2-3, eqs. 13/15" |
| 22 | + url: "https://www.fil.ion.ucl.ac.uk/~wpenny/publications/densities.ps" |
| 23 | + - authors: "Wikipedia" |
| 24 | + year: 2021 |
| 25 | + title: "Wishart distribution" |
| 26 | + in: "Wikipedia, the free encyclopedia" |
| 27 | + pages: "retrieved on 2021-12-02" |
| 28 | + url: "https://en.wikipedia.org/wiki/Wishart_distribution#KL-divergence" |
| 29 | + |
| 30 | +proof_id: "P295" |
| 31 | +shortcut: "wish-kl" |
| 32 | +username: "JoramSoch" |
| 33 | +--- |
| 34 | + |
| 35 | + |
| 36 | +**Theorem:** Let $S$ be a $p \times p$ [random matrix](/D/rmat). Assume two [Wishart distributions](/D/wish) $P$ and $Q$ specifying the probability distribution of $S$ as |
| 37 | + |
| 38 | +$$ \label{eq:wishs} |
| 39 | +\begin{split} |
| 40 | +P: \; S &\sim \mathcal{W}(V_1, n_1) \\ |
| 41 | +Q: \; S &\sim \mathcal{W}(V_2, n_2) \; . |
| 42 | +\end{split} |
| 43 | +$$ |
| 44 | + |
| 45 | +Then, the [Kullback-Leibler divergence](/D/kl) of $P$ from $Q$ is given by |
| 46 | + |
| 47 | +$$ \label{eq:wish-KL} |
| 48 | +\mathrm{KL}[P\,||\,Q] = \frac{1}{2} \left[ n_2 \left( \ln |V_2| - \ln |V_1| \right) + n_1 \mathrm{tr}(V_2^{-1} V_1) + 2 \ln \frac{\Gamma_p\left(\frac{n_2}{2}\right)}{\Gamma_p\left(\frac{n_1}{2}\right)} + (n_1-n_2) \psi_p\left(\frac{n_1}{2}\right) - n_1 p \right] |
| 49 | +$$ |
| 50 | + |
| 51 | +where $\Gamma_p(x)$ is the multivariate gamma function |
| 52 | + |
| 53 | +$$ \label{eq:mult-gam-fct} |
| 54 | +\Gamma_p(x) = \pi^{p(p-1)/4} \, \prod_{j=1}^p \Gamma\left(x - \frac{j-1}{2}\right) |
| 55 | +$$ |
| 56 | + |
| 57 | +and $\psi_p(x)$ is the multivariate digamma function |
| 58 | + |
| 59 | +$$ \label{eq:mult-psi-fct} |
| 60 | +\psi_p(x) = \frac{\mathrm{d}\ln \Gamma_p(x)}{\mathrm{d}x} = \sum_{j=1}^p \psi\left(x - \frac{j-1}{2}\right) \; . |
| 61 | +$$ |
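
As a numerical illustration of the two definitions above, the following minimal sketch evaluates $\ln \Gamma_p(x)$ and $\psi_p(x)$ from their product and sum forms. It assumes NumPy and SciPy are available; the helper names `ln_multigamma` and `multidigamma` and the test values of `x` and `p` are hypothetical choices, not part of the source.

```python
import numpy as np
from scipy.special import gammaln, digamma, multigammaln

def ln_multigamma(x, p):
    """ln Gamma_p(x) = p(p-1)/4 * ln(pi) + sum_{j=1}^p ln Gamma(x - (j-1)/2)."""
    j = np.arange(1, p + 1)
    return p * (p - 1) / 4 * np.log(np.pi) + np.sum(gammaln(x - (j - 1) / 2))

def multidigamma(x, p):
    """psi_p(x) = sum_{j=1}^p psi(x - (j-1)/2), the derivative of ln Gamma_p(x)."""
    j = np.arange(1, p + 1)
    return np.sum(digamma(x - (j - 1) / 2))

# Illustrative values (assumed); x must exceed (p-1)/2 for Gamma_p(x) to be defined.
x, p = 3.5, 3

# The explicit sum should agree with SciPy's built-in log multivariate gamma.
agrees_with_scipy = np.isclose(ln_multigamma(x, p), multigammaln(x, p))

# A central finite difference of ln Gamma_p should approximate psi_p.
h = 1e-5
finite_diff = (ln_multigamma(x + h, p) - ln_multigamma(x - h, p)) / (2 * h)
```

Working with $\ln \Gamma_p$ rather than $\Gamma_p$ itself avoids overflow for large arguments, which is why the sketch never forms the product directly.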
| 62 | + |
| 63 | + |
| 64 | +**Proof:** The [KL divergence for a continuous random variable](/D/kl) is given by |
| 65 | + |
| 66 | +$$ \label{eq:KL-cont} |
| 67 | +\mathrm{KL}[P\,||\,Q] = \int_{\mathcal{X}} p(x) \, \ln \frac{p(x)}{q(x)} \, \mathrm{d}x |
| 68 | +$$ |
| 69 | + |
| 70 | +which, applied to the [Wishart distributions](/D/wish) in \eqref{eq:wishs}, yields |
| 71 | + |
| 72 | +$$ \label{eq:wish-KL-s1} |
| 73 | +\begin{split} |
| 74 | +\mathrm{KL}[P\,||\,Q] &= \int_{\mathcal{S}^p} \mathcal{W}(S; V_1, n_1) \, \ln \frac{\mathcal{W}(S; V_1, n_1)}{\mathcal{W}(S; V_2, n_2)} \, \mathrm{d}S \\ |
| 75 | +&= \left\langle \ln \frac{\mathcal{W}(S; V_1, n_1)}{\mathcal{W}(S; V_2, n_2)} \right\rangle_{p(S)} |
| 76 | +\end{split} |
| 77 | +$$ |
| 78 | + |
| 79 | +where $\mathcal{S}^p$ is the set of all positive-definite symmetric $p \times p$ matrices. |
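
Because the KL divergence is an expectation of the log density ratio under $P$, it can be approximated by sampling. The sketch below is one way to do this, assuming SciPy's `scipy.stats.wishart`; the function name `kl_wishart_mc`, the sample size, the seed and the example parameters are all hypothetical choices for illustration.

```python
import numpy as np
from scipy.stats import wishart

def kl_wishart_mc(V1, n1, V2, n2, n_samples=20000, seed=0):
    """Monte Carlo estimate of KL[P||Q] = < ln W(S;V1,n1) - ln W(S;V2,n2) >_{p(S)}."""
    P = wishart(df=n1, scale=V1)
    Q = wishart(df=n2, scale=V2)
    # Draw S ~ P; rvs returns an array of shape (n_samples, p, p).
    S = P.rvs(size=n_samples, random_state=seed)
    # SciPy's Wishart logpdf expects the sample index on the last axis: (p, p, n_samples).
    S = np.moveaxis(S, 0, -1)
    return np.mean(P.logpdf(S) - Q.logpdf(S))

# Example parameters (assumed); SciPy requires degrees of freedom n > p - 1.
V1 = np.array([[1.0, 0.3], [0.3, 1.0]])
V2 = 2.0 * np.eye(2)

kl_pq = kl_wishart_mc(V1, 5, V2, 7)   # sampling estimate for two different distributions
kl_pp = kl_wishart_mc(V1, 5, V1, 5)   # identical distributions: log ratio vanishes
```

Such an estimate carries sampling noise, so it is only a sanity check and not a substitute for the closed-form result.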
| 80 | + |
| 81 | +Using the [probability density function of the Wishart distribution](/P/wish-pdf), this becomes: |
| 82 | + |
| 83 | +$$ \label{eq:wish-KL-s2} |
| 84 | +\begin{split} |
| 85 | +\mathrm{KL}[P\,||\,Q] &= \left\langle \ln \frac{\frac{1}{\sqrt{2^{n_1 p} |V_1|^{n_1}} \Gamma_p \left( \frac{n_1}{2} \right)} \cdot |S|^{(n_1-p-1)/2} \cdot \exp\left[ -\frac{1}{2} \mathrm{tr}\left( V_1^{-1} S \right) \right]}{\frac{1}{\sqrt{2^{n_2 p} |V_2|^{n_2}} \Gamma_p \left( \frac{n_2}{2} \right)} \cdot |S|^{(n_2-p-1)/2} \cdot \exp\left[ -\frac{1}{2} \mathrm{tr}\left( V_2^{-1} S \right) \right]} \right\rangle_{p(S)} \\ |
| 86 | +&= \left\langle \ln \left( \sqrt{2^{(n_2-n_1)p} \cdot \frac{|V_2|^{n_2}}{|V_1|^{n_1}}} \cdot \frac{\Gamma_p\left( \frac{n_2}{2} \right)}{\Gamma_p\left( \frac{n_1}{2} \right)} \cdot |S|^{(n_1-n_2)/2} \cdot \exp\left[ -\frac{1}{2} \mathrm{tr}\left( V_1^{-1} S \right) +\frac{1}{2} \mathrm{tr}\left( V_2^{-1} S \right) \right] \right) \right\rangle_{p(S)} \\ |
| 87 | +&= \left\langle \frac{(n_2-n_1)p}{2} \ln 2 + \frac{n_2}{2} \ln |V_2| - \frac{n_1}{2} \ln |V_1| + \ln \frac{\Gamma_p\left( \frac{n_2}{2} \right)}{\Gamma_p\left( \frac{n_1}{2} \right)} \right. \\ |
| 88 | +&+ \left. \quad \frac{n_1-n_2}{2} \ln |S| - \frac{1}{2} \mathrm{tr}\left( V_1^{-1} S \right) + \frac{1}{2} \mathrm{tr}\left( V_2^{-1} S \right) \right\rangle_{p(S)} \\ |
| 89 | +&= \frac{(n_2-n_1)p}{2} \ln 2 + \frac{n_2}{2} \ln |V_2| - \frac{n_1}{2} \ln |V_1| + \ln \frac{\Gamma_p\left( \frac{n_2}{2} \right)}{\Gamma_p\left( \frac{n_1}{2} \right)} \\ |
| 90 | +&+ \frac{n_1-n_2}{2} \left\langle \ln |S| \right\rangle_{p(S)} - \frac{1}{2} \left\langle \mathrm{tr}\left( V_1^{-1} S \right) \right\rangle_{p(S)} + \frac{1}{2} \left\langle \mathrm{tr}\left( V_2^{-1} S \right) \right\rangle_{p(S)} \; . |
| 91 | +\end{split} |
| 92 | +$$ |
| 93 | + |
| 94 | +Using the [expected value of a Wishart random matrix](/P/wish-mean) |
| 95 | + |
| 96 | +$$ \label{eq:wish-mean} |
| 97 | +S \sim \mathcal{W}(V,n) \quad \Rightarrow \quad \left\langle S \right\rangle = n V \; , |
| 98 | +$$ |
| 99 | + |
| 100 | +such that the [expected value of the matrix trace](/P/mean-tr) becomes |
| 101 | + |
| 102 | +$$ \label{eq:wish-trmean} |
| 103 | +\left\langle \mathrm{tr}(AS) \right\rangle = \mathrm{tr}\left( \left\langle AS \right\rangle \right) = \mathrm{tr}\left( A \left\langle S \right\rangle \right) = \mathrm{tr}\left( A \cdot (nV) \right) = n \cdot \mathrm{tr}(AV) \; , |
| 104 | +$$ |
| 105 | + |
| 106 | +and the [expected value of a Wishart log-determinant](/P/wish-logdetmean) |
| 107 | + |
| 108 | +$$ \label{eq:wish-logdetmean} |
| 109 | +S \sim \mathcal{W}(V,n) \quad \Rightarrow \quad \left\langle \ln |S| \right\rangle = \psi_p\left(\frac{n}{2}\right) + p \cdot \ln 2 + \ln |V| \; , |
| 110 | +$$ |
| 111 | + |
| 112 | +the Kullback-Leibler divergence from \eqref{eq:wish-KL-s2} becomes: |
| 113 | + |
| 114 | +$$ \label{eq:wish-KL-s3} |
| 115 | +\begin{split} |
| 116 | +\mathrm{KL}[P\,||\,Q] &= \frac{(n_2-n_1)p}{2} \ln 2 + \frac{n_2}{2} \ln |V_2| - \frac{n_1}{2} \ln |V_1| + \ln \frac{\Gamma_p\left( \frac{n_2}{2} \right)}{\Gamma_p\left( \frac{n_1}{2} \right)} \\ |
| 117 | +&+ \frac{n_1-n_2}{2} \left[ \psi_p\left(\frac{n_1}{2}\right) + p \cdot \ln 2 + \ln |V_1| \right] - \frac{n_1}{2} \mathrm{tr}\left( V_1^{-1} V_1 \right) + \frac{n_1}{2} \mathrm{tr}\left( V_2^{-1} V_1 \right) \\ |
| 118 | +&= \frac{n_2}{2} \left( \ln |V_2| - \ln |V_1| \right) + \ln \frac{\Gamma_p\left( \frac{n_2}{2} \right)}{\Gamma_p\left( \frac{n_1}{2} \right)} + \frac{n_1-n_2}{2} \psi_p\left(\frac{n_1}{2}\right) - \frac{n_1}{2} \mathrm{tr}\left( I_p \right) + \frac{n_1}{2} \mathrm{tr}\left( V_2^{-1} V_1 \right) \\ |
| 119 | +& = \frac{1}{2} \left[ n_2 \left( \ln |V_2| - \ln |V_1| \right) + n_1 \mathrm{tr}(V_2^{-1} V_1) + 2 \ln \frac{\Gamma_p\left(\frac{n_2}{2}\right)}{\Gamma_p\left(\frac{n_1}{2}\right)} + (n_1-n_2) \psi_p\left(\frac{n_1}{2}\right) - n_1 p \right] \; . |
| 120 | +\end{split} |
| 121 | +$$ |