Commit 411ea70

added 5 proofs
1 parent d101043 commit 411ea70

5 files changed: 349 additions & 0 deletions


P/bern-ent.md

Lines changed: 62 additions & 0 deletions
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-09-02 12:21:00

title: "Entropy of the Bernoulli distribution"
chapter: "Probability Distributions"
section: "Univariate discrete distributions"
topic: "Bernoulli distribution"
theorem: "Shannon entropy"

sources:
- authors: "Wikipedia"
  year: 2022
  title: "Bernoulli distribution"
  in: "Wikipedia, the free encyclopedia"
  pages: "retrieved on 2022-09-02"
  url: "https://en.wikipedia.org/wiki/Bernoulli_distribution"
- authors: "Wikipedia"
  year: 2022
  title: "Binary entropy function"
  in: "Wikipedia, the free encyclopedia"
  pages: "retrieved on 2022-09-02"
  url: "https://en.wikipedia.org/wiki/Binary_entropy_function"

proof_id: "P334"
shortcut: "bern-ent"
username: "JoramSoch"
---


**Theorem:** Let $X$ be a [random variable](/D/rvar) following a [Bernoulli distribution](/D/bern):

$$ \label{eq:bern}
X \sim \mathrm{Bern}(p) \; .
$$

Then, the [(Shannon) entropy](/D/ent) of $X$ in bits is

$$ \label{eq:bern-ent}
\mathrm{H}(X) = -p \log_2 p - (1-p) \log_2 (1-p) \; .
$$


**Proof:** The [entropy](/D/ent) is defined as the probability-weighted average of the logarithmized probabilities for all possible values:

$$ \label{eq:ent}
\mathrm{H}(X) = - \sum_{x \in \mathcal{X}} p(x) \cdot \log_b p(x) \; .
$$

Entropy is measured in bits by setting $b = 2$. Since there are only [two possible outcomes for a Bernoulli random variable](/P/bern-pmf), namely $\mathrm{Pr}(X = 1) = p$ and $\mathrm{Pr}(X = 0) = 1-p$, we have:

$$ \label{eq:bern-ent-qed}
\begin{split}
\mathrm{H}(X) &= - \mathrm{Pr}(X = 0) \cdot \log_2 \mathrm{Pr}(X = 0) - \mathrm{Pr}(X = 1) \cdot \log_2 \mathrm{Pr}(X = 1) \\
&= -p \log_2 p - (1-p) \log_2 (1-p) \; .
\end{split}
$$
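
As a quick numerical sanity check of the result (a minimal sketch assuming NumPy and SciPy are available; it is not part of the proof), the closed form can be compared with a generic entropy routine:

```python
# Compare the closed-form binary entropy with SciPy's generic
# Shannon entropy of the pmf [1-p, p], both in bits (base 2).
import numpy as np
from scipy.stats import entropy

p = 0.3
H_closed_form = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
H_generic = entropy([1 - p, p], base=2)

print(H_closed_form, H_generic)  # both evaluate to about 0.8813
```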

P/bin-ent.md

Lines changed: 90 additions & 0 deletions
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-09-02 13:52:00

title: "Entropy of the binomial distribution"
chapter: "Probability Distributions"
section: "Univariate discrete distributions"
topic: "Binomial distribution"
theorem: "Shannon entropy"

sources:

proof_id: "P335"
shortcut: "bin-ent"
username: "JoramSoch"
---


**Theorem:** Let $X$ be a [random variable](/D/rvar) following a [binomial distribution](/D/bin):

$$ \label{eq:bin}
X \sim \mathrm{Bin}(n,p) \; .
$$

Then, the [(Shannon) entropy](/D/ent) of $X$ in bits is

$$ \label{eq:bin-ent}
\mathrm{H}(X) = n \cdot \mathrm{H}_\mathrm{bern}(p) - \mathrm{E}_\mathrm{lbc}(n,p)
$$

where $\mathrm{H}_\mathrm{bern}(p)$ is the binary entropy function, i.e. the [(Shannon) entropy of the Bernoulli distribution](/P/bern-ent) with success probability $p$

$$ \label{eq:H-bern}
\mathrm{H}_\mathrm{bern}(p) = - p \cdot \log_2 p - (1-p) \log_2 (1-p)
$$

and $\mathrm{E}_\mathrm{lbc}(n,p)$ is the [expected value](/D/mean) of the logarithmized [binomial coefficient](/P/bin-pmf) with superset size $n$

$$ \label{eq:E-lbc}
\mathrm{E}_\mathrm{lbc}(n,p) = \mathrm{E}\left[ \log_2 {n \choose X} \right] \quad \text{where} \quad X \sim \mathrm{Bin}(n,p) \; .
$$


**Proof:** The [entropy](/D/ent) is defined as the probability-weighted average of the logarithmized probabilities for all possible values:

$$ \label{eq:ent}
\mathrm{H}(X) = - \sum_{x \in \mathcal{X}} p(x) \cdot \log_b p(x) \; .
$$

Entropy is measured in bits by setting $b = 2$. Then, with the [probability mass function of the binomial distribution](/P/bin-pmf), we have:

$$ \label{eq:bin-ent-s1}
\begin{split}
\mathrm{H}(X) &= - \sum_{x \in \mathcal{X}} f_X(x) \cdot \log_2 f_X(x) \\
&= - \sum_{x=0}^{n} {n \choose x} \, p^x \, (1-p)^{n-x} \cdot \log_2 \left[ {n \choose x} \, p^x \, (1-p)^{n-x} \right] \\
&= - \sum_{x=0}^{n} {n \choose x} \, p^x \, (1-p)^{n-x} \cdot \left[ \log_2 {n \choose x} + x \cdot \log_2 p + (n-x) \cdot \log_2 (1-p) \right] \\
&= - \sum_{x=0}^{n} {n \choose x} \, p^x \, (1-p)^{n-x} \cdot \left[ \log_2 {n \choose x} + x \cdot \log_2 p + n \cdot \log_2 (1-p) - x \cdot \log_2 (1-p) \right] \; .
\end{split}
$$

Since the first factor in the sum corresponds to the [probability mass](/D/pmf) of $X=x$, we can rewrite this as the sum of the [expected values](/D/mean) [of the functions](/P/mean-lotus) of the [discrete random variable](/D/rvar-disc) $x$ in the square bracket:

$$ \label{eq:bin-ent-s2}
\begin{split}
\mathrm{H}(X) &= - \left\langle \log_2 {n \choose x} \right\rangle_{p(x)} - \left\langle x \cdot \log_2 p \right\rangle_{p(x)} - \left\langle n \cdot \log_2 (1-p) \right\rangle_{p(x)} + \left\langle x \cdot \log_2 (1-p) \right\rangle_{p(x)} \\
&= - \left\langle \log_2 {n \choose x} \right\rangle_{p(x)} - \log_2 p \cdot \left\langle x \right\rangle_{p(x)} - n \cdot \log_2 (1-p) + \log_2 (1-p) \cdot \left\langle x \right\rangle_{p(x)} \; .
\end{split}
$$

Using the [expected value of the binomial distribution](/P/bin-mean), i.e. $X \sim \mathrm{Bin}(n,p) \Rightarrow \left\langle x \right\rangle = n p$, this gives:

$$ \label{eq:bin-ent-s3}
\begin{split}
\mathrm{H}(X) &= - \left\langle \log_2 {n \choose x} \right\rangle_{p(x)} - n p \cdot \log_2 p - n \cdot \log_2 (1-p) + n p \cdot \log_2 (1-p) \\
&= - \left\langle \log_2 {n \choose x} \right\rangle_{p(x)} + n \left[ - p \cdot \log_2 p - (1-p) \log_2 (1-p) \right] \; .
\end{split}
$$

Finally, we note that the first term is the negative [expected value](/D/mean) of the logarithm of a [binomial coefficient](/P/bin-pmf) and that the term in square brackets is the [entropy of the Bernoulli distribution](/P/bern-ent), such that we get:

$$ \label{eq:bin-ent-s4}
\mathrm{H}(X) = n \cdot \mathrm{H}_\mathrm{bern}(p) - \mathrm{E}_\mathrm{lbc}(n,p) \; .
$$

Note that, because $0 \leq \mathrm{H}_\mathrm{bern}(p) \leq 1$, we have $0 \leq n \cdot \mathrm{H}_\mathrm{bern}(p) \leq n$; and because the [entropy is non-negative](/P/ent-nonneg), it must hold that $\mathrm{E}_\mathrm{lbc}(n,p) \leq n \cdot \mathrm{H}_\mathrm{bern}(p) \leq n$. Moreover, $\mathrm{E}_\mathrm{lbc}(n,p) \geq 0$, since every binomial coefficient is at least $1$.
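
To see the decomposition in action, here is a minimal numerical sketch (assuming NumPy and SciPy; names like `E_lbc` are illustrative variables, not a library API). It compares the right-hand side of the theorem with the entropy computed directly from the binomial pmf:

```python
# Check H(X) = n * H_bern(p) - E_lbc(n, p) against the entropy
# summed directly over the support of Bin(n, p).
import numpy as np
from scipy.stats import binom
from scipy.special import comb

n, p = 10, 0.3
x = np.arange(n + 1)
pmf = binom.pmf(x, n, p)

H_direct = -np.sum(pmf * np.log2(pmf))               # definition of entropy
H_bern = -p * np.log2(p) - (1 - p) * np.log2(1 - p)  # binary entropy function
E_lbc = np.sum(pmf * np.log2(comb(n, x)))            # E[log2 C(n, X)]

print(H_direct, n * H_bern - E_lbc)  # the two values agree
```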

P/cat-cov.md

Lines changed: 53 additions & 0 deletions
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-09-09 16:57:00

title: "Covariance matrix of the categorical distribution"
chapter: "Probability Distributions"
section: "Multivariate discrete distributions"
topic: "Categorical distribution"
theorem: "Covariance"

sources:

proof_id: "P338"
shortcut: "cat-cov"
username: "JoramSoch"
---


**Theorem:** Let $X$ be a [random vector](/D/rvec) following a [categorical distribution](/D/cat):

$$ \label{eq:cat}
X \sim \mathrm{Cat}(p) \; .
$$

Then, the [covariance matrix](/D/covmat) of $X$ is

$$ \label{eq:cat-cov}
\mathrm{Cov}(X) = \mathrm{diag}(p) - pp^\mathrm{T} \; .
$$


**Proof:** The [categorical distribution](/D/cat) is a special case of the [multinomial distribution](/D/mult) in which $n = 1$:

$$ \label{eq:cat-mult}
X \sim \mathrm{Mult}(n,p) \quad \text{and} \quad n = 1 \quad \Rightarrow \quad X \sim \mathrm{Cat}(p) \; .
$$

The [covariance matrix of the multinomial distribution](/P/mult-cov) is

$$ \label{eq:mult-cov}
\mathrm{Cov}(X) = n \left( \mathrm{diag}(p) - pp^\mathrm{T} \right) \; ,
$$

thus, with $n = 1$, the covariance matrix of the categorical distribution is

$$ \label{eq:cat-cov-qed}
\mathrm{Cov}(X) = \mathrm{diag}(p) - pp^\mathrm{T} \; .
$$
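
A minimal simulation sketch (assuming NumPy; seed and sample size are arbitrary choices) compares the closed form with the sample covariance of one-hot categorical draws:

```python
# Compare Cov(X) = diag(p) - p p^T with the sample covariance of
# one-hot draws from Cat(p), generated as Mult(1, p) samples.
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.2, 0.3, 0.5])

Cov_closed_form = np.diag(p) - np.outer(p, p)

X = rng.multinomial(1, p, size=100_000)  # rows are one-hot vectors
Cov_sample = np.cov(X, rowvar=False)

print(np.round(Cov_closed_form, 3))
print(np.round(Cov_sample, 3))  # close to the closed form
```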

P/cat-ent.md

Lines changed: 51 additions & 0 deletions
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-09-09 15:41:00

title: "Entropy of the categorical distribution"
chapter: "Probability Distributions"
section: "Multivariate discrete distributions"
topic: "Categorical distribution"
theorem: "Shannon entropy"

sources:

proof_id: "P336"
shortcut: "cat-ent"
username: "JoramSoch"
---


**Theorem:** Let $X$ be a [random vector](/D/rvec) following a [categorical distribution](/D/cat):

$$ \label{eq:cat}
X \sim \mathrm{Cat}(p) \; .
$$

Then, the [(Shannon) entropy](/D/ent) of $X$ is

$$ \label{eq:cat-ent}
\mathrm{H}(X) = - \sum_{i=1}^{k} p_i \cdot \log p_i \; .
$$


**Proof:** The [entropy](/D/ent) is defined as the probability-weighted average of the logarithmized probabilities for all possible values:

$$ \label{eq:ent}
\mathrm{H}(X) = - \sum_{x \in \mathcal{X}} p(x) \cdot \log_b p(x) \; .
$$

Since there are $k$ [possible values for a categorical random vector](/D/cat) with [probabilities given by the entries](/P/cat-pmf) of the $1 \times k$ vector $p$, and writing these values as the standard unit vectors $e_1, \ldots, e_k$, we have:

$$ \label{eq:cat-ent-qed}
\begin{split}
\mathrm{H}(X) &= - \mathrm{Pr}(X = e_1) \cdot \log \mathrm{Pr}(X = e_1) - \ldots - \mathrm{Pr}(X = e_k) \cdot \log \mathrm{Pr}(X = e_k) \\
&= - \sum_{i=1}^{k} \mathrm{Pr}(X = e_i) \cdot \log \mathrm{Pr}(X = e_i) \\
&= - \sum_{i=1}^{k} p_i \cdot \log p_i \; .
\end{split}
$$
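
A quick numerical check (a sketch assuming NumPy and SciPy; not part of the proof), using the natural logarithm as in the theorem:

```python
# The entropy of Cat(p) is the Shannon entropy of the vector p itself.
import numpy as np
from scipy.stats import entropy

p = np.array([0.2, 0.3, 0.5])
H_closed_form = -np.sum(p * np.log(p))  # natural log, i.e. in nats
H_generic = entropy(p)                  # scipy defaults to base e

print(H_closed_form, H_generic)  # both evaluate to about 1.0297
```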

P/mult-ent.md

Lines changed: 93 additions & 0 deletions
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-09-09 16:33:00

title: "Entropy of the multinomial distribution"
chapter: "Probability Distributions"
section: "Multivariate discrete distributions"
topic: "Multinomial distribution"
theorem: "Shannon entropy"

sources:

proof_id: "P337"
shortcut: "mult-ent"
username: "JoramSoch"
---


**Theorem:** Let $X$ be a [random vector](/D/rvec) following a [multinomial distribution](/D/mult):

$$ \label{eq:mult}
X \sim \mathrm{Mult}(n,p) \; .
$$

Then, the [(Shannon) entropy](/D/ent) of $X$ is

$$ \label{eq:mult-ent}
\mathrm{H}(X) = n \cdot \mathrm{H}_\mathrm{cat}(p) - \mathrm{E}_\mathrm{lmc}(n,p)
$$

where $\mathrm{H}_\mathrm{cat}(p)$ is the categorical entropy function, i.e. the [(Shannon) entropy of the categorical distribution](/P/cat-ent) with category probabilities $p$

$$ \label{eq:H-cat}
\mathrm{H}_\mathrm{cat}(p) = - \sum_{i=1}^{k} p_i \cdot \log p_i
$$

and $\mathrm{E}_\mathrm{lmc}(n,p)$ is the [expected value](/D/mean) of the logarithmized [multinomial coefficient](/P/mult-pmf) with superset size $n$

$$ \label{eq:E-lmc}
\mathrm{E}_\mathrm{lmc}(n,p) = \mathrm{E}\left[ \log {n \choose {X_1, \ldots, X_k}} \right] \quad \text{where} \quad X \sim \mathrm{Mult}(n,p) \; .
$$


**Proof:** The [entropy](/D/ent) is defined as the probability-weighted average of the logarithmized probabilities for all possible values:

$$ \label{eq:ent}
\mathrm{H}(X) = - \sum_{x \in \mathcal{X}} p(x) \cdot \log_b p(x) \; .
$$

The [probability mass function of the multinomial distribution](/P/mult-pmf) is

$$ \label{eq:mult-pmf}
f_X(x) = {n \choose {x_1, \ldots, x_k}} \, \prod_{i=1}^k {p_i}^{x_i} \; .
$$

Let $\mathcal{X}_{n,k}$ be the set of all vectors $x \in \mathbb{N}^{1 \times k}$ satisfying $\sum_{i=1}^{k} x_i = n$. Then, we have:

$$ \label{eq:mult-ent-s1}
\begin{split}
\mathrm{H}(X) &= - \sum_{x \in \mathcal{X}_{n,k}} f_X(x) \cdot \log f_X(x) \\
&= - \sum_{x \in \mathcal{X}_{n,k}} f_X(x) \cdot \log \left[ {n \choose {x_1, \ldots, x_k}} \, \prod_{i=1}^k {p_i}^{x_i} \right] \\
&= - \sum_{x \in \mathcal{X}_{n,k}} f_X(x) \cdot \left[ \log {n \choose {x_1, \ldots, x_k}} + \sum_{i=1}^{k} x_i \cdot \log p_i \right] \; .
\end{split}
$$

Since the first factor in the sum corresponds to the [probability mass](/D/pmf) of $X=x$, we can rewrite this as the sum of the [expected values](/D/mean) [of the functions](/P/mean-lotus) of the [discrete random variable](/D/rvar-disc) $x$ in the square bracket:

$$ \label{eq:mult-ent-s2}
\begin{split}
\mathrm{H}(X) &= - \left\langle \log {n \choose {x_1, \ldots, x_k}} \right\rangle_{p(x)} - \left\langle \sum_{i=1}^{k} x_i \cdot \log p_i \right\rangle_{p(x)} \\
&= - \left\langle \log {n \choose {x_1, \ldots, x_k}} \right\rangle_{p(x)} - \sum_{i=1}^{k} \left\langle x_i \cdot \log p_i \right\rangle_{p(x)} \; .
\end{split}
$$

Using the [expected value of the multinomial distribution](/P/mult-mean), i.e. $X \sim \mathrm{Mult}(n,p) \Rightarrow \left\langle x_i \right\rangle = n p_i$, this gives:

$$ \label{eq:mult-ent-s3}
\begin{split}
\mathrm{H}(X) &= - \left\langle \log {n \choose {x_1, \ldots, x_k}} \right\rangle_{p(x)} - \sum_{i=1}^{k} n p_i \cdot \log p_i \\
&= - \left\langle \log {n \choose {x_1, \ldots, x_k}} \right\rangle_{p(x)} - n \sum_{i=1}^{k} p_i \cdot \log p_i \; .
\end{split}
$$

Finally, we note that the first term is the negative [expected value](/D/mean) of the logarithm of a [multinomial coefficient](/P/mult-pmf) and that the second term equals $n$ times the [entropy of the categorical distribution](/P/cat-ent), such that we get:

$$ \label{eq:mult-ent-s4}
\mathrm{H}(X) = n \cdot \mathrm{H}_\mathrm{cat}(p) - \mathrm{E}_\mathrm{lmc}(n,p) \; .
$$
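
As with the binomial case, the decomposition can be sketched numerically (assuming NumPy and SciPy; the brute-force enumeration of the support is only meant for small $n$ and $k$):

```python
# Check H(X) = n * H_cat(p) - E_lmc(n, p) by enumerating the
# support of Mult(n, p), i.e. all count vectors summing to n.
import numpy as np
from itertools import product
from math import lgamma
from scipy.stats import multinomial

n, p = 4, np.array([0.2, 0.3, 0.5])
support = [x for x in product(range(n + 1), repeat=len(p)) if sum(x) == n]

pmf = np.array([multinomial.pmf(x, n, p) for x in support])
H_direct = -np.sum(pmf * np.log(pmf))  # definition of entropy (in nats)

H_cat = -np.sum(p * np.log(p))
# log of the multinomial coefficient n! / (x_1! * ... * x_k!)
log_mc = np.array([lgamma(n + 1) - sum(lgamma(xi + 1) for xi in x) for x in support])
E_lmc = np.sum(pmf * log_mc)

print(H_direct, n * H_cat - E_lmc)  # the two values agree
```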
