Merge pull request #140 from JoramSoch/master

JoramSoch · web-flow · commit 629a6dd0926c · 2021-12-02T21:57:48.000+01:00
added 3 proofs
diff --git a/I/ToC.md b/I/ToC.md
@@ -439,19 +439,22 @@ title: "Table of Contents"
    4.4. Dirichlet distribution <br>
    &emsp;&ensp; 4.4.1. *[Definition](/D/dir)* <br>
    &emsp;&ensp; 4.4.2. **[Probability density function](/P/dir-pdf)** <br>
-   &emsp;&ensp; 4.4.3. **[Exceedance probabilities](/P/dir-ep)** <br>
+   &emsp;&ensp; 4.4.3. **[Kullback-Leibler divergence](/P/dir-kl)** <br>
+   &emsp;&ensp; 4.4.4. **[Exceedance probabilities](/P/dir-ep)** <br>
 
 5. Matrix-variate continuous distributions
 
    5.1. Matrix-normal distribution <br>
    &emsp;&ensp; 5.1.1. *[Definition](/D/matn)* <br>
    &emsp;&ensp; 5.1.2. **[Probability density function](/P/matn-pdf)** <br>
    &emsp;&ensp; 5.1.3. **[Equivalence to multivariate normal distribution](/P/matn-mvn)** <br>
-   &emsp;&ensp; 5.1.4. **[Transposition](/P/matn-trans)** <br>
-   &emsp;&ensp; 5.1.5. **[Linear transformation](/P/matn-ltt)** <br>
+   &emsp;&ensp; 5.1.4. **[Kullback-Leibler divergence](/P/matn-kl)** <br>
+   &emsp;&ensp; 5.1.5. **[Transposition](/P/matn-trans)** <br>
+   &emsp;&ensp; 5.1.6. **[Linear transformation](/P/matn-ltt)** <br>
    
    5.2. Wishart distribution <br>
    &emsp;&ensp; 5.2.1. *[Definition](/D/wish)* <br>
+   &emsp;&ensp; 5.2.2. **[Kullback-Leibler divergence](/P/wish-kl)** <br>
 
 
 <br>
diff --git a/P/dir-kl.md b/P/dir-kl.md
@@ -0,0 +1,84 @@
+---
+layout: proof
+mathjax: true
+
+author: "Joram Soch"
+affiliation: "BCCN Berlin"
+e_mail: "joram.soch@bccn-berlin.de"
+date: 2021-12-02 14:28:00
+
+title: "Kullback-Leibler divergence for the Dirichlet distribution"
+chapter: "Probability Distributions"
+section: "Multivariate continuous distributions"
+topic: "Dirichlet distribution"
+theorem: "Kullback-Leibler divergence"
+
+sources:
+  - authors: "Penny, William D."
+    year: 2001
+    title: "KL-Divergences of Normal, Gamma, Dirichlet and Wishart densities"
+    in: "University College, London"
+    pages: "p. 2, eqs. 8-9"
+    url: "https://www.fil.ion.ucl.ac.uk/~wpenny/publications/densities.ps"
+
+proof_id: "P294"
+shortcut: "dir-kl"
+username: "JoramSoch"
+---
+
+
+**Theorem:** Let $x$ be a $k \times 1$ [random vector](/D/rvec). Assume two [Dirichlet distributions](/D/dir) $P$ and $Q$ specifying the probability distribution of $x$ as
+
+$$ \label{eq:dirs}
+\begin{split}
+P: \; x &\sim \mathrm{Dir}(\alpha_1) \\
+Q: \; x &\sim \mathrm{Dir}(\alpha_2) \; .
+\end{split}
+$$
+
+Then, the [Kullback-Leibler divergence](/D/kl) of $P$ from $Q$ is given by
+
+$$ \label{eq:dir-KL}
+\mathrm{KL}[P\,||\,Q] = \ln \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_{1i}\right)}{\Gamma\left(\sum_{i=1}^{k} \alpha_{2i}\right)} + \sum_{i=1}^{k} \ln \frac{\Gamma(\alpha_{2i})}{\Gamma(\alpha_{1i})} + \sum_{i=1}^{k} \left( \alpha_{1i} - \alpha_{2i} \right) \left[ \psi(\alpha_{1i}) - \psi\left(\sum_{j=1}^{k} \alpha_{1j}\right) \right] \; .
+$$
+
+
+**Proof:** The [KL divergence for a continuous random variable](/D/kl) is given by 
+
+$$ \label{eq:KL-cont}
+\mathrm{KL}[P\,||\,Q] = \int_{\mathcal{X}} p(x) \, \ln \frac{p(x)}{q(x)} \, \mathrm{d}x
+$$
+
+which, applied to the [Dirichlet distributions](/D/dir) in \eqref{eq:dirs}, yields
+
+$$ \label{eq:dir-KL-s1}
+\begin{split}
+\mathrm{KL}[P\,||\,Q] &= \int_{\mathcal{X}^k} \mathrm{Dir}(x; \alpha_1) \, \ln \frac{\mathrm{Dir}(x; \alpha_1)}{\mathrm{Dir}(x; \alpha_2)} \, \mathrm{d}x \\
+&= \left\langle \ln \frac{\mathrm{Dir}(x; \alpha_1)}{\mathrm{Dir}(x; \alpha_2)} \right\rangle_{p(x)}
+\end{split}
+$$
+
+where $\mathcal{X}^k$ is the set $\left\lbrace x \in \mathbb{R}^k \; \vert \; \sum_{i=1}^{k} x_i = 1, \; 0 \leq x_i \leq 1, \; i = 1,\ldots,k \right\rbrace$.
+
+Using the [probability density function of the Dirichlet distribution](/P/dir-pdf), this becomes:
+
+$$ \label{eq:dir-KL-s2}
+\begin{split}
+\mathrm{KL}[P\,||\,Q] &= \left\langle \ln \frac{ \frac{\Gamma\left( \sum_{i=1}^k \alpha_{1i} \right)}{\prod_{i=1}^k \Gamma(\alpha_{1i})} \, \prod_{i=1}^k {x_i}^{\alpha_{1i}-1} }{ \frac{\Gamma\left( \sum_{i=1}^k \alpha_{2i} \right)}{\prod_{i=1}^k \Gamma(\alpha_{2i})} \, \prod_{i=1}^k {x_i}^{\alpha_{2i}-1} } \right\rangle_{p(x)} \\
+&= \left\langle \ln \left( \frac{\Gamma\left( \sum_{i=1}^k \alpha_{1i} \right)}{\Gamma\left( \sum_{i=1}^k \alpha_{2i} \right)} \cdot \frac{\prod_{i=1}^k \Gamma(\alpha_{2i})}{\prod_{i=1}^k \Gamma(\alpha_{1i})} \cdot \prod_{i=1}^k {x_i}^{\alpha_{1i}-\alpha_{2i}} \right) \right\rangle_{p(x)} \\
+&= \left\langle \ln \frac{\Gamma\left( \sum_{i=1}^k \alpha_{1i} \right)}{\Gamma\left( \sum_{i=1}^k \alpha_{2i} \right)} + \sum_{i=1}^k \ln \frac{\Gamma(\alpha_{2i})}{\Gamma(\alpha_{1i})} + \sum_{i=1}^k (\alpha_{1i}-\alpha_{2i}) \cdot \ln (x_i) \right\rangle_{p(x)} \\
+&= \ln \frac{\Gamma\left( \sum_{i=1}^k \alpha_{1i} \right)}{\Gamma\left( \sum_{i=1}^k \alpha_{2i} \right)} + \sum_{i=1}^k \ln \frac{\Gamma(\alpha_{2i})}{\Gamma(\alpha_{1i})} + \sum_{i=1}^k (\alpha_{1i}-\alpha_{2i}) \cdot \left\langle \ln x_i \right\rangle_{p(x)} \; .
+\end{split}
+$$
+
+Using the [expected value of a logarithmized Dirichlet variate](/P/dir-logmean)
+
+$$ \label{eq:dir-logmean}
+x \sim \mathrm{Dir}(\alpha) \quad \Rightarrow \quad \left\langle \ln x_i \right\rangle = \psi(\alpha_i) - \psi\left(\sum_{j=1}^{k} \alpha_j\right) \; ,
+$$
+
+the Kullback-Leibler divergence from \eqref{eq:dir-KL-s2} becomes:
+
+$$ \label{eq:dir-KL-s3}
+\mathrm{KL}[P\,||\,Q] = \ln \frac{\Gamma\left( \sum_{i=1}^k \alpha_{1i} \right)}{\Gamma\left( \sum_{i=1}^k \alpha_{2i} \right)} + \sum_{i=1}^k \ln \frac{\Gamma(\alpha_{2i})}{\Gamma(\alpha_{1i})} + \sum_{i=1}^k (\alpha_{1i}-\alpha_{2i}) \cdot \left[ \psi(\alpha_{1i}) - \psi\left(\sum_{j=1}^{k} \alpha_{1j}\right) \right] \; .
+$$
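
The closed form in \eqref{eq:dir-KL} can be sanity-checked numerically. The following is a minimal sketch, not part of the proof: the function names `digamma` and `dirichlet_kl` are hypothetical, and the digamma function is approximated by its recurrence relation plus a truncated asymptotic series, because the Python standard library does not provide it.

```python
import math

def digamma(x):
    """Approximate psi(x) for x > 0 (assumption: recurrence + asymptotic series)."""
    result = 0.0
    # Shift the argument upwards using psi(x) = psi(x + 1) - 1/x.
    while x < 10.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result

def dirichlet_kl(alpha1, alpha2):
    """KL[P||Q] for P = Dir(alpha1), Q = Dir(alpha2), following the theorem."""
    a0, b0 = sum(alpha1), sum(alpha2)
    kl = math.lgamma(a0) - math.lgamma(b0)
    for a, b in zip(alpha1, alpha2):
        kl += math.lgamma(b) - math.lgamma(a)
        kl += (a - b) * (digamma(a) - digamma(a0))
    return kl

# For k = 2 the Dirichlet distribution is a beta distribution:
# P = Dir(1, 1) is uniform on [0, 1] and Q = Dir(2, 1) has density 2 x_1,
# so the divergence can be integrated by hand and equals 1 - ln 2.
kl_example = dirichlet_kl([1.0, 1.0], [2.0, 1.0])
print(kl_example, 1.0 - math.log(2.0))
```

The hand-computed case uses $\int_0^1 \ln \frac{1}{2x} \, \mathrm{d}x = 1 - \ln 2$, which the three terms of the theorem reproduce as $-\ln 2 + 0 + 1$.
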
diff --git a/P/matn-kl.md b/P/matn-kl.md
@@ -0,0 +1,106 @@
+---
+layout: proof
+mathjax: true
+
+author: "Joram Soch"
+affiliation: "BCCN Berlin"
+e_mail: "joram.soch@bccn-berlin.de"
+date: 2021-12-02 20:22:00
+
+title: "Kullback-Leibler divergence for the matrix-normal distribution"
+chapter: "Probability Distributions"
+section: "Matrix-variate continuous distributions"
+topic: "Matrix-normal distribution"
+theorem: "Kullback-Leibler divergence"
+
+sources:
+
+proof_id: "P296"
+shortcut: "matn-kl"
+username: "JoramSoch"
+---
+
+
+**Theorem:** Let $X$ be an $n \times p$ [random matrix](/D/rmat). Assume two [matrix-normal distributions](/D/matn) $P$ and $Q$ specifying the probability distribution of $X$ as
+
+$$ \label{eq:matns}
+\begin{split}
+P: \; X &\sim \mathcal{MN}(M_1, U_1, V_1) \\
+Q: \; X &\sim \mathcal{MN}(M_2, U_2, V_2) \; .
+\end{split}
+$$
+
+Then, the [Kullback-Leibler divergence](/D/kl) of $P$ from $Q$ is given by
+
+$$ \label{eq:matn-KL}
+\begin{split}
+\mathrm{KL}[P\,||\,Q] &= \frac{1}{2} \left[ \mathrm{vec}(M_2 - M_1)^\mathrm{T} \mathrm{vec}\left(U_2^{-1} (M_2 - M_1) V_2^{-1}\right) \right. \\
+&+ \left. \mathrm{tr}\left( (V_2^{-1}V_1) \otimes (U_2^{-1}U_1) \right) - n \ln \frac{|V_1|}{|V_2|} - p \ln \frac{|U_1|}{|U_2|} - n p \right] \; .
+\end{split}
+$$
+
+
+**Proof:** The [matrix-normal distribution is equivalent to the multivariate normal distribution](/P/matn-mvn),
+
+$$ \label{eq:matn-mvn}
+X \sim \mathcal{MN}(M, U, V) \quad \Leftrightarrow \quad \mathrm{vec}(X) \sim \mathcal{N}(\mathrm{vec}(M), V \otimes U) \; ,
+$$
+
+and the [Kullback-Leibler divergence for the multivariate normal distribution](/P/mvn-kl) is
+
+$$ \label{eq:mvn-KL}
+\mathrm{KL}[P\,||\,Q] = \frac{1}{2} \left[ (\mu_2 - \mu_1)^\mathrm{T} \Sigma_2^{-1} (\mu_2 - \mu_1) + \mathrm{tr}(\Sigma_2^{-1} \Sigma_1) - \ln \frac{|\Sigma_1|}{|\Sigma_2|} - n \right]
+$$
+
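+where $P$ and $Q$ are the multivariate normal distributions $\mathcal{N}(\mu_1, \Sigma_1)$ and $\mathcal{N}(\mu_2, \Sigma_2)$ of an $n \times 1$ [random vector](/D/rvec).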
+
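+Thus, we can plug the distribution parameters from \eqref{eq:matns} into the KL divergence in \eqref{eq:mvn-KL} using the relationship given by \eqref{eq:matn-mvn}. Since $\mathrm{vec}(X)$ is an $np \times 1$ vector, the dimension $n$ in \eqref{eq:mvn-KL} is replaced by $np$: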
+
+$$ \label{eq:matn-KL-s1}
+\begin{split}
+\mathrm{KL}[P\,||\,Q] &= \frac{1}{2} \left[ (\mathrm{vec}(M_2) - \mathrm{vec}(M_1))^\mathrm{T} (V_2 \otimes U_2)^{-1} (\mathrm{vec}(M_2) - \mathrm{vec}(M_1)) \right. \\
+&+ \left. \mathrm{tr}\left( (V_2 \otimes U_2)^{-1} (V_1 \otimes U_1) \right) - \ln \frac{|V_1 \otimes U_1|}{|V_2 \otimes U_2|} - n p \right] \; .
+\end{split}
+$$
+
+Using the vectorization operator and Kronecker product properties
+
+$$ \label{eq:vec-add}
+\mathrm{vec}(A) + \mathrm{vec}(B) = \mathrm{vec}(A+B)
+$$
+
+$$ \label{eq:kron-inv}
+(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}
+$$
+
+$$ \label{eq:kron-prod}
+(A \otimes B) (C \otimes D) = (AC) \otimes (BD)
+$$
+
+$$ \label{eq:kron-det}
+|A \otimes B| = |A|^m \, |B|^n \quad \text{where} \quad A \in \mathbb{R}^{n \times n} \quad \text{and} \quad B \in \mathbb{R}^{m \times m} \; ,
+$$
+
+the Kullback-Leibler divergence from \eqref{eq:matn-KL-s1} becomes:
+
+$$ \label{eq:matn-KL-s2}
+\begin{split}
+\mathrm{KL}[P\,||\,Q] &= \frac{1}{2} \left[ \mathrm{vec}(M_2 - M_1)^\mathrm{T} \, (V_2^{-1} \otimes U_2^{-1}) \, \mathrm{vec}(M_2 - M_1) \right. \\
+&+ \left. \mathrm{tr}\left( (V_2^{-1}V_1) \otimes (U_2^{-1}U_1) \right) - n \ln \frac{|V_1|}{|V_2|} - p \ln \frac{|U_1|}{|U_2|} - n p \right] \; .
+\end{split}
+$$
+
+Using the relationship between Kronecker product and vectorization operator
+
+$$ \label{eq:kron-vec}
+(C^\mathrm{T} \otimes A) \, \mathrm{vec}(B) = \mathrm{vec}(ABC) \; ,
+$$
+
+we finally have:
+
+$$ \label{eq:matn-KL-s3}
+\begin{split}
+\mathrm{KL}[P\,||\,Q] &= \frac{1}{2} \left[ \mathrm{vec}(M_2 - M_1)^\mathrm{T} \mathrm{vec}\left(U_2^{-1} (M_2 - M_1) V_2^{-1}\right) \right. \\
+&+ \left. \mathrm{tr}\left( (V_2^{-1}V_1) \otimes (U_2^{-1}U_1) \right) - n \ln \frac{|V_1|}{|V_2|} - p \ln \frac{|U_1|}{|U_2|} - n p \right] \; .
+\end{split}
+$$
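
The equivalence used in this proof can be checked numerically by evaluating \eqref{eq:matn-KL} directly and comparing it with \eqref{eq:mvn-KL} applied to $\mathrm{vec}(X)$ with covariance $V \otimes U$. The sketch below is an illustration under assumptions: all helper names are hypothetical, the example matrices are arbitrary, and the trace term is evaluated via the standard identity $\mathrm{tr}(A \otimes B) = \mathrm{tr}(A) \, \mathrm{tr}(B)$, which does not appear in the proof itself.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def kron(A, B):
    """Kronecker product of A and B."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def vec(M):
    """Stack the columns of M into one list (column-major order)."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def inv_and_det(A):
    """Inverse and determinant by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    M = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    det = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            det = -det
        d = M[c][c]
        det *= d
        M[c] = [x / d for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M], det

def matn_kl(M1, U1, V1, M2, U2, V2):
    """KL[P||Q] for two matrix-normal distributions, following the theorem."""
    n, p = len(M1), len(M1[0])
    U2i, dU2 = inv_and_det(U2)
    V2i, dV2 = inv_and_det(V2)
    dU1, dV1 = inv_and_det(U1)[1], inv_and_det(V1)[1]
    D = [[b - a for a, b in zip(r1, r2)] for r1, r2 in zip(M1, M2)]
    quad = sum(x * y for x, y in zip(vec(D), vec(matmul(matmul(U2i, D), V2i))))
    tr = trace(matmul(V2i, V1)) * trace(matmul(U2i, U1))  # tr(A (x) B) = tr(A) tr(B)
    return 0.5 * (quad + tr - n * math.log(dV1 / dV2) - p * math.log(dU1 / dU2) - n * p)

def mvn_kl(mu1, S1, mu2, S2):
    """KL[P||Q] for two multivariate normal distributions."""
    k = len(mu1)
    S2i, dS2 = inv_and_det(S2)
    dS1 = inv_and_det(S1)[1]
    d = [b - a for a, b in zip(mu1, mu2)]
    quad = sum(d[i] * S2i[i][j] * d[j] for i in range(k) for j in range(k))
    return 0.5 * (quad + trace(matmul(S2i, S1)) - math.log(dS1 / dS2) - k)

# Arbitrary example with n = 2 rows and p = 3 columns.
M1 = [[0.0, 1.0, 2.0], [1.0, 0.0, -1.0]]
M2 = [[1.0, 1.0, 0.0], [0.0, 2.0, 1.0]]
U1 = [[2.0, 0.5], [0.5, 1.0]]
U2 = [[1.0, 0.3], [0.3, 2.0]]
V1 = [[1.0, 0.2, 0.0], [0.2, 2.0, 0.3], [0.0, 0.3, 1.5]]
V2 = [[2.0, 0.1, 0.4], [0.1, 1.0, 0.0], [0.4, 0.0, 3.0]]

kl_direct = matn_kl(M1, U1, V1, M2, U2, V2)
kl_via_mvn = mvn_kl(vec(M1), kron(V1, U1), vec(M2), kron(V2, U2))
print(kl_direct, kl_via_mvn)  # the two values should agree up to rounding
```

The direct form only inverts the $n \times n$ and $p \times p$ covariance matrices, whereas the vectorized form works with $np \times np$ matrices, which is the practical reason to prefer \eqref{eq:matn-KL}.
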
diff --git a/P/wish-kl.md b/P/wish-kl.md
@@ -0,0 +1,121 @@
+---
+layout: proof
+mathjax: true
+
+author: "Joram Soch"
+affiliation: "BCCN Berlin"
+e_mail: "joram.soch@bccn-berlin.de"
+date: 2021-12-02 15:33:00
+
+title: "Kullback-Leibler divergence for the Wishart distribution"
+chapter: "Probability Distributions"
+section: "Matrix-variate continuous distributions"
+topic: "Wishart distribution"
+theorem: "Kullback-Leibler divergence"
+
+sources:
+  - authors: "Penny, William D."
+    year: 2001
+    title: "KL-Divergences of Normal, Gamma, Dirichlet and Wishart densities"
+    in: "University College, London"
+    pages: "pp. 2-3, eqs. 13/15"
+    url: "https://www.fil.ion.ucl.ac.uk/~wpenny/publications/densities.ps"
+  - authors: "Wikipedia"
+    year: 2021
+    title: "Wishart distribution"
+    in: "Wikipedia, the free encyclopedia"
+    pages: "retrieved on 2021-12-02"
+    url: "https://en.wikipedia.org/wiki/Wishart_distribution#KL-divergence"
+
+proof_id: "P295"
+shortcut: "wish-kl"
+username: "JoramSoch"
+---
+
+
+**Theorem:** Let $S$ be a $p \times p$ [random matrix](/D/rmat). Assume two [Wishart distributions](/D/wish) $P$ and $Q$ specifying the probability distribution of $S$ as
+
+$$ \label{eq:wishs}
+\begin{split}
+P: \; S &\sim \mathcal{W}(V_1, n_1) \\
+Q: \; S &\sim \mathcal{W}(V_2, n_2) \; .
+\end{split}
+$$
+
+Then, the [Kullback-Leibler divergence](/D/kl) of $P$ from $Q$ is given by
+
+$$ \label{eq:wish-KL}
+\mathrm{KL}[P\,||\,Q] = \frac{1}{2} \left[ n_2 \left( \ln |V_2| - \ln |V_1| \right) + n_1 \mathrm{tr}(V_2^{-1} V_1) + 2 \ln \frac{\Gamma_p\left(\frac{n_2}{2}\right)}{\Gamma_p\left(\frac{n_1}{2}\right)} + (n_1-n_2) \psi_p\left(\frac{n_1}{2}\right) - n_1 p \right]
+$$
+
+where $\Gamma_p(x)$ is the multivariate gamma function
+
+$$ \label{eq:mult-gam-fct}
+\Gamma_p(x) = \pi^{p(p-1)/4} \, \prod_{j=1}^p \Gamma\left(x - \frac{j-1}{2}\right)
+$$
+
+and $\psi_p(x)$ is the multivariate digamma function
+
+$$ \label{eq:mult-digam-fct}
+\psi_p(x) = \frac{\mathrm{d}\ln \Gamma_p(x)}{\mathrm{d}x} = \sum_{j=1}^p \psi\left(x - \frac{j-1}{2}\right) \; .
+$$
+
+
+**Proof:** The [KL divergence for a continuous random variable](/D/kl) is given by 
+
+$$ \label{eq:KL-cont}
+\mathrm{KL}[P\,||\,Q] = \int_{\mathcal{X}} p(x) \, \ln \frac{p(x)}{q(x)} \, \mathrm{d}x
+$$
+
+which, applied to the [Wishart distributions](/D/wish) in \eqref{eq:wishs}, yields
+
+$$ \label{eq:wish-KL-s1}
+\begin{split}
+\mathrm{KL}[P\,||\,Q] &= \int_{\mathcal{S}^p} \mathcal{W}(S; V_1, n_1) \, \ln \frac{\mathcal{W}(S; V_1, n_1)}{\mathcal{W}(S; V_2, n_2)} \, \mathrm{d}S \\
+&= \left\langle \ln \frac{\mathcal{W}(S; V_1, n_1)}{\mathcal{W}(S; V_2, n_2)} \right\rangle_{p(S)}
+\end{split}
+$$
+
+where $\mathcal{S}^p$ is the set of all positive-definite symmetric $p \times p$ matrices.
+
+Using the [probability density function of the Wishart distribution](/P/wish-pdf), this becomes:
+
+$$ \label{eq:wish-KL-s2}
+\begin{split}
+\mathrm{KL}[P\,||\,Q] &= \left\langle \ln \frac{\frac{1}{\sqrt{2^{n_1 p} |V_1|^{n_1}} \Gamma_p \left( \frac{n_1}{2} \right)} \cdot |S|^{(n_1-p-1)/2} \cdot \exp\left[ -\frac{1}{2} \mathrm{tr}\left( V_1^{-1} S \right) \right]}{\frac{1}{\sqrt{2^{n_2 p} |V_2|^{n_2}} \Gamma_p \left( \frac{n_2}{2} \right)} \cdot |S|^{(n_2-p-1)/2} \cdot \exp\left[ -\frac{1}{2} \mathrm{tr}\left( V_2^{-1} S \right) \right]} \right\rangle_{p(S)} \\
+&= \left\langle \ln \left( \sqrt{2^{(n_2-n_1)p} \cdot \frac{|V_2|^{n_2}}{|V_1|^{n_1}}} \cdot \frac{\Gamma_p\left( \frac{n_2}{2} \right)}{\Gamma_p\left( \frac{n_1}{2} \right)} \cdot |S|^{(n_1-n_2)/2} \cdot \exp\left[ -\frac{1}{2} \mathrm{tr}\left( V_1^{-1} S \right) +\frac{1}{2} \mathrm{tr}\left( V_2^{-1} S \right) \right] \right) \right\rangle_{p(S)} \\
+&= \left\langle \frac{(n_2-n_1)p}{2} \ln 2 + \frac{n_2}{2} \ln |V_2| - \frac{n_1}{2} \ln |V_1| + \ln \frac{\Gamma_p\left( \frac{n_2}{2} \right)}{\Gamma_p\left( \frac{n_1}{2} \right)} \right. \\
+&+ \left. \quad \frac{n_1-n_2}{2} \ln |S| - \frac{1}{2} \mathrm{tr}\left( V_1^{-1} S \right) + \frac{1}{2} \mathrm{tr}\left( V_2^{-1} S \right) \right\rangle_{p(S)} \\
+&= \frac{(n_2-n_1)p}{2} \ln 2 + \frac{n_2}{2} \ln |V_2| - \frac{n_1}{2} \ln |V_1| + \ln \frac{\Gamma_p\left( \frac{n_2}{2} \right)}{\Gamma_p\left( \frac{n_1}{2} \right)} \\
+&+ \frac{n_1-n_2}{2} \left\langle \ln |S| \right\rangle_{p(S)} - \frac{1}{2} \left\langle \mathrm{tr}\left( V_1^{-1} S \right) \right\rangle_{p(S)} + \frac{1}{2} \left\langle \mathrm{tr}\left( V_2^{-1} S \right) \right\rangle_{p(S)} \; .
+\end{split}
+$$
+
+Using the [expected value of a Wishart random matrix](/P/wish-mean)
+
+$$ \label{eq:wish-mean}
+S \sim \mathcal{W}(V,n) \quad \Rightarrow \quad \left\langle S \right\rangle = n V \; ,
+$$
+
+such that the [expected value of the matrix trace](/P/mean-tr) becomes
+
+$$ \label{eq:wish-trmean}
+\left\langle \mathrm{tr}(AS) \right\rangle = \mathrm{tr}\left( \left\langle AS \right\rangle \right) = \mathrm{tr}\left( A \left\langle S \right\rangle \right) = \mathrm{tr}\left( A \cdot (nV) \right) = n \cdot \mathrm{tr}(AV) \; ,
+$$
+
+and the [expected value of a Wishart log-determinant](/P/wish-logdetmean)
+
+$$ \label{eq:wish-logdetmean}
+S \sim \mathcal{W}(V,n) \quad \Rightarrow \quad \left\langle \ln |S| \right\rangle = \psi_p\left(\frac{n}{2}\right) + p \cdot \ln 2 + \ln |V| \; ,
+$$
+
+the Kullback-Leibler divergence from \eqref{eq:wish-KL-s2} becomes:
+
+$$ \label{eq:wish-KL-s3}
+\begin{split}
+\mathrm{KL}[P\,||\,Q] &= \frac{(n_2-n_1)p}{2} \ln 2 + \frac{n_2}{2} \ln |V_2| - \frac{n_1}{2} \ln |V_1| + \ln \frac{\Gamma_p\left( \frac{n_2}{2} \right)}{\Gamma_p\left( \frac{n_1}{2} \right)} \\
+&+ \frac{n_1-n_2}{2} \left[ \psi_p\left(\frac{n_1}{2}\right) + p \cdot \ln 2 + \ln |V_1| \right] - \frac{n_1}{2} \mathrm{tr}\left( V_1^{-1} V_1 \right) + \frac{n_1}{2} \mathrm{tr}\left( V_2^{-1} V_1 \right) \\
+&= \frac{n_2}{2} \left( \ln |V_2| - \ln |V_1| \right) + \ln \frac{\Gamma_p\left( \frac{n_2}{2} \right)}{\Gamma_p\left( \frac{n_1}{2} \right)} + \frac{n_1-n_2}{2} \psi_p\left(\frac{n_1}{2}\right) - \frac{n_1}{2} \mathrm{tr}\left( I_p \right) + \frac{n_1}{2} \mathrm{tr}\left( V_2^{-1} V_1 \right) \\
+& = \frac{1}{2} \left[ n_2 \left( \ln |V_2| - \ln |V_1| \right) + n_1 \mathrm{tr}(V_2^{-1} V_1) + 2 \ln \frac{\Gamma_p\left(\frac{n_2}{2}\right)}{\Gamma_p\left(\frac{n_1}{2}\right)} + (n_1-n_2) \psi_p\left(\frac{n_1}{2}\right) - n_1 p \right] \; .
+\end{split}
+$$
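
The result in \eqref{eq:wish-KL} can be sanity-checked against cases that are computable by hand. The sketch below is an illustration under assumptions rather than a general implementation: it is restricted to diagonal scale matrices (passed as lists of their diagonal entries) so that determinants and traces are trivial, it requires $n_1, n_2 > p - 1$, all function names are hypothetical, and the digamma function is approximated by recurrence plus a truncated asymptotic series. The multivariate gamma and digamma functions use $p$ factors, following their standard definitions.

```python
import math

def digamma(x):
    """Approximate psi(x) for x > 0 (assumption: recurrence + asymptotic series)."""
    result = 0.0
    while x < 10.0:  # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result

def multi_lgamma(x, p):
    """Logarithm of the multivariate gamma function Gamma_p(x)."""
    return (p * (p - 1) / 4) * math.log(math.pi) + sum(
        math.lgamma(x - (j - 1) / 2) for j in range(1, p + 1))

def multi_digamma(x, p):
    """Multivariate digamma function psi_p(x)."""
    return sum(digamma(x - (j - 1) / 2) for j in range(1, p + 1))

def wishart_kl_diag(v1, n1, v2, n2):
    """KL[P||Q] for P = W(diag(v1), n1) and Q = W(diag(v2), n2)."""
    p = len(v1)
    logdet1 = sum(math.log(v) for v in v1)
    logdet2 = sum(math.log(v) for v in v2)
    tr = sum(a / b for a, b in zip(v1, v2))  # tr(V2^{-1} V1) for diagonal matrices
    return 0.5 * (n2 * (logdet2 - logdet1) + n1 * tr
                  + 2 * (multi_lgamma(n2 / 2, p) - multi_lgamma(n1 / 2, p))
                  + (n1 - n2) * multi_digamma(n1 / 2, p) - n1 * p)

# For p = 1 and n = 2, W(v, 2) is an exponential distribution with mean 2v,
# so KL[W(1, 2) || W(2, 2)] = ln 2 + 1/2 - 1 = ln 2 - 1/2.
kl_exponential = wishart_kl_diag([1.0], 2, [2.0], 2)
print(kl_exponential, math.log(2.0) - 0.5)
```

For $p = 1$, the Wishart distribution $\mathcal{W}(v, n)$ is a gamma distribution with shape $n/2$ and scale $2v$, which is what makes the hand-computed comparison possible.
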