---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-09-09 16:33:00

title: "Entropy of the multinomial distribution"
chapter: "Probability Distributions"
section: "Multivariate discrete distributions"
topic: "Multinomial distribution"
theorem: "Shannon entropy"

sources:

proof_id: "P337"
shortcut: "mult-ent"
username: "JoramSoch"
---


**Theorem:** Let $X$ be a [random vector](/D/rvec) following a [multinomial distribution](/D/mult):

$$ \label{eq:mult}
X \sim \mathrm{Mult}(n,p) \; .
$$

Then, the [(Shannon) entropy](/D/ent) of $X$ is

$$ \label{eq:mult-ent}
\mathrm{H}(X) = n \cdot \mathrm{H}_\mathrm{cat}(p) - \mathrm{E}_\mathrm{lmc}(n,p)
$$

where $\mathrm{H}_\mathrm{cat}(p)$ is the categorical entropy function, i.e. the [(Shannon) entropy of the categorical distribution](/P/cat-ent) with category probabilities $p$

$$ \label{eq:H-cat}
\mathrm{H}_\mathrm{cat}(p) = - \sum_{i=1}^{k} p_i \cdot \log p_i
$$

and $\mathrm{E}_\mathrm{lmc}(n,p)$ is the [expected value](/D/mean) of the logarithmized [multinomial coefficient](/P/mult-pmf) with superset size $n$ and category probabilities $p$

$$ \label{eq:E-lmc}
\mathrm{E}_\mathrm{lmc}(n,p) = \mathrm{E}\left[ \log {n \choose {X_1, \ldots, X_k}} \right] \quad \text{where} \quad X \sim \mathrm{Mult}(n,p) \; .
$$
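
As a consistency check (not part of the formal statement): for $k = 2$ with $p = (p_1, 1 - p_1)$, the multinomial distribution reduces to the binomial distribution and the multinomial coefficient to the binomial coefficient, so the theorem specializes to

$$
\mathrm{H}(X) = -n \left[ p_1 \log p_1 + (1 - p_1) \log (1 - p_1) \right] - \mathrm{E}\left[ \log {n \choose X_1} \right] \; .
$$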


**Proof:** The [entropy](/D/ent) is defined as the probability-weighted average of the logarithmized probabilities for all possible values:

$$ \label{eq:ent}
\mathrm{H}(X) = - \sum_{x \in \mathcal{X}} p(x) \cdot \log_b p(x) \; .
$$

The [probability mass function of the multinomial distribution](/P/mult-pmf) is

$$ \label{eq:mult-pmf}
f_X(x) = {n \choose {x_1, \ldots, x_k}} \, \prod_{i=1}^k {p_i}^{x_i} \; .
$$
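
Because the support of this mass function is finite, it can be checked numerically. A minimal sketch in Python (the helper name `mult_pmf` and the values of $n$ and $p$ are illustrative assumptions, not from the source):

```python
from itertools import product
from math import factorial, prod

def mult_pmf(x, n, p):
    """Multinomial PMF: (n choose x_1,...,x_k) * prod(p_i^x_i)."""
    coef = factorial(n) // prod(factorial(xi) for xi in x)
    return coef * prod(pi ** xi for pi, xi in zip(p, x))

# Illustrative parameters (not from the source)
n, p = 4, [0.2, 0.3, 0.5]

# All count vectors x with nonnegative entries summing to n
support = [x for x in product(range(n + 1), repeat=len(p)) if sum(x) == n]

# The PMF should sum to 1 over this support
total = sum(mult_pmf(x, n, p) for x in support)
```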
| 60 | + |
| 61 | +Let $\mathcal{X}_{n,k}$ be the set of all vectors $x \in \mathbb{N}^{1 \times k}$ satisfying $\sum_{i=1}^{k} x_i = n$. Then, we have: |
| 62 | + |
| 63 | +$$ \label{eq:mult-ent-s1} |
| 64 | +\begin{split} |
| 65 | +\mathrm{H}(X) &= - \sum_{x \in \mathcal{X}_{n,k}} f_X(x) \cdot \log f_X(x) \\ |
| 66 | +&= - \sum_{x \in \mathcal{X}_{n,k}} f_X(x) \cdot \log \left[ {n \choose {x_1, \ldots, x_k}} \, \prod_{i=1}^k {p_i}^{x_i} \right] \\ |
| 67 | +&= - \sum_{x \in \mathcal{X}_{n,k}} f_X(x) \cdot \left[ \log {n \choose {x_1, \ldots, x_k}} + \sum_{i=1}^{k} x_i \cdot \log p_i \right] \; . |
| 68 | +\end{split} |
| 69 | +$$ |
| 70 | + |
| 71 | +Since the first factor in the sum corresponds to the [probability mass](/D/pmf) of $X=x$, we can rewrite this as the sum of the [expected values](/D/mean) [of the functions](/P/mean-lotus) of the [discrete random variable](/D/rvar-disc) $x$ in the square bracket: |
| 72 | + |
| 73 | +$$ \label{eq:mult-ent-s2} |
| 74 | +\begin{split} |
| 75 | +\mathrm{H}(X) &= - \left\langle \log {n \choose {x_1, \ldots, x_k}} \right\rangle_{p(x)} - \left\langle \sum_{i=1}^{k} x_i \cdot \log p_i \right\rangle_{p(x)} \\ |
| 76 | +&= - \left\langle \log {n \choose {x_1, \ldots, x_k}} \right\rangle_{p(x)} - \sum_{i=1}^{k} \left\langle x_i \cdot \log p_i \right\rangle_{p(x)} \; . |
| 77 | +\end{split} |
| 78 | +$$ |
| 79 | + |
| 80 | +Using the [expected value of the multinomial distribution](/P/mult-mean), i.e. $X \sim \mathrm{Mult}(n,p) \Rightarrow \left\langle x_i \right\rangle = n p_i$, this gives: |
| 81 | + |
| 82 | +$$ \label{eq:mult-ent-s3} |
| 83 | +\begin{split} |
| 84 | +\mathrm{H}(X) &= - \left\langle \log {n \choose {x_1, \ldots, x_k}} \right\rangle_{p(x)} - \sum_{i=1}^{k} n p_i \cdot \log p_i \\ |
| 85 | +&= - \left\langle\log {n \choose {x_1, \ldots, x_k}} \right\rangle_{p(x)} - n \sum_{i=1}^{k} p_i \cdot \log p_i \; . |
| 86 | +\end{split} |
| 87 | +$$ |
| 88 | + |
| 89 | +Finally, we note that the first term is the negative [expected value](/D/mean) of the logarithm of a [multinomial coefficient](/P/mult-pmf) and that the second term is the [entropy of the categorical distribution](/P/cat-ent), such that we finally get: |
| 90 | + |
| 91 | +$$ \label{eq:mult-ent-s4} |
| 92 | +\mathrm{H}(X) = n \cdot \mathrm{H}_\mathrm{cat}(p) - \mathrm{E}_\mathrm{lmc}(n,p) \; . |
| 93 | +$$ |
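
Since the support $\mathcal{X}_{n,k}$ is finite, the identity can also be verified numerically. A minimal sketch in Python (parameter values and helper names are illustrative assumptions, not part of the proof):

```python
from itertools import product
from math import factorial, log, prod

def mult_pmf(x, n, p):
    """Multinomial PMF: (n choose x_1,...,x_k) * prod(p_i^x_i)."""
    coef = factorial(n) // prod(factorial(xi) for xi in x)
    return coef * prod(pi ** xi for pi, xi in zip(p, x))

# Illustrative parameters (not from the source)
n, p = 5, [0.1, 0.3, 0.6]
support = [x for x in product(range(n + 1), repeat=len(p)) if sum(x) == n]

# Left-hand side: direct entropy H(X) = -sum_x f(x) log f(x)
H_direct = -sum(mult_pmf(x, n, p) * log(mult_pmf(x, n, p)) for x in support)

# Right-hand side: n * H_cat(p) - E_lmc(n, p)
H_cat = -sum(pi * log(pi) for pi in p)
E_lmc = sum(mult_pmf(x, n, p) * log(factorial(n) // prod(factorial(xi) for xi in x))
            for x in support)
H_rhs = n * H_cat - E_lmc
```

Both sides agree up to floating-point error, as the theorem asserts.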