---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-09-09 16:33:00

title: "Entropy of the multinomial distribution"
chapter: "Probability Distributions"
section: "Multivariate discrete distributions"
topic: "Multinomial distribution"
theorem: "Shannon entropy"

sources:

proof_id: "P337"
shortcut: "mult-ent"
username: "JoramSoch"
---


**Theorem:** Let $X$ be a [random vector](/D/rvec) following a [multinomial distribution](/D/mult):

$$ \label{eq:mult}
X \sim \mathrm{Mult}(n,p) \; .
$$

Then, the [(Shannon) entropy](/D/ent) of $X$ is

$$ \label{eq:mult-ent}
\mathrm{H}(X) = n \cdot \mathrm{H}_\mathrm{cat}(p) - \mathrm{E}_\mathrm{lmc}(n,p)
$$

where $\mathrm{H}_\mathrm{cat}(p)$ is the categorical entropy function, i.e. the [(Shannon) entropy of the categorical distribution](/P/cat-ent) with category probabilities $p$

$$ \label{eq:H-cat}
\mathrm{H}_\mathrm{cat}(p) = - \sum_{i=1}^{k} p_i \cdot \log p_i
$$

and $\mathrm{E}_\mathrm{lmc}(n,p)$ is the [expected value](/D/mean) of the logarithmized [multinomial coefficient](/P/mult-pmf) with superset size $n$ and category probabilities $p$

$$ \label{eq:E-lmc}
\mathrm{E}_\mathrm{lmc}(n,p) = \mathrm{E}\left[ \log {n \choose {X_1, \ldots, X_k}} \right] \quad \text{where} \quad X \sim \mathrm{Mult}(n,p) \; .
$$
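
As a consistency check (not part of the formal statement): for $k = 2$ with $p = (p_1, 1 - p_1)$, the multinomial distribution reduces to the binomial distribution and the multinomial coefficient to the binomial coefficient, so the theorem specializes to

$$
\mathrm{H}(X) = -n \left[ p_1 \log p_1 + (1 - p_1) \log (1 - p_1) \right] - \mathrm{E}\left[ \log {n \choose X_1} \right] \; .
$$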


**Proof:** The [entropy](/D/ent) is defined as the probability-weighted average of the logarithmized probabilities for all possible values:

$$ \label{eq:ent}
\mathrm{H}(X) = - \sum_{x \in \mathcal{X}} p(x) \cdot \log_b p(x) \; .
$$

The [probability mass function of the multinomial distribution](/P/mult-pmf) is

$$ \label{eq:mult-pmf}
f_X(x) = {n \choose {x_1, \ldots, x_k}} \, \prod_{i=1}^k {p_i}^{x_i} \; .
$$
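
Because the support of this mass function is finite, it can be checked numerically. A minimal sketch in Python (the helper name `mult_pmf` and the values of $n$ and $p$ are illustrative assumptions, not from the source):

```python
from itertools import product
from math import factorial, prod

def mult_pmf(x, n, p):
    """Multinomial PMF: (n choose x_1,...,x_k) * prod(p_i^x_i)."""
    coef = factorial(n) // prod(factorial(xi) for xi in x)
    return coef * prod(pi ** xi for pi, xi in zip(p, x))

# Illustrative parameters (not from the source)
n, p = 4, [0.2, 0.3, 0.5]

# All count vectors x with nonnegative entries summing to n
support = [x for x in product(range(n + 1), repeat=len(p)) if sum(x) == n]

# The PMF should sum to 1 over this support
total = sum(mult_pmf(x, n, p) for x in support)
```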
| 60 | + |
| 61 | +Let $\mathcal{X}_{n,k}$ be the set of all vectors $x \in \mathbb{N}^{1 \times k}$ satisfying $\sum_{i=1}^{k} x_i = n$. Then, we have: |
| 62 | + |
| 63 | +$$ \label{eq:mult-ent-s1} |
| 64 | +\begin{split} |
| 65 | +\mathrm{H}(X) &= - \sum_{x \in \mathcal{X}_{n,k}} f_X(x) \cdot \log f_X(x) \\ |
| 66 | +&= - \sum_{x \in \mathcal{X}_{n,k}} f_X(x) \cdot \log \left[ {n \choose {x_1, \ldots, x_k}} \, \prod_{i=1}^k {p_i}^{x_i} \right] \\ |
| 67 | +&= - \sum_{x \in \mathcal{X}_{n,k}} f_X(x) \cdot \left[ \log {n \choose {x_1, \ldots, x_k}} + \sum_{i=1}^{k} x_i \cdot \log p_i \right] \; . |
| 68 | +\end{split} |
| 69 | +$$ |
| 70 | + |
| 71 | +Since the first factor in the sum corresponds to the [probability mass](/D/pmf) of $X=x$, we can rewrite this as the sum of the [expected values](/D/mean) [of the functions](/P/mean-lotus) of the [discrete random variable](/D/rvar-disc) $x$ in the square bracket: |
| 72 | + |
| 73 | +$$ \label{eq:mult-ent-s2} |
| 74 | +\begin{split} |
| 75 | +\mathrm{H}(X) &= - \left\langle \log {n \choose {x_1, \ldots, x_k}} \right\rangle_{p(x)} - \left\langle \sum_{i=1}^{k} x_i \cdot \log p_i \right\rangle_{p(x)} \\ |
| 76 | +&= - \left\langle \log {n \choose {x_1, \ldots, x_k}} \right\rangle_{p(x)} - \sum_{i=1}^{k} \left\langle x_i \cdot \log p_i \right\rangle_{p(x)} \; . |
| 77 | +\end{split} |
| 78 | +$$ |
| 79 | + |
| 80 | +Using the [expected value of the multinomial distribution](/P/mult-mean), i.e. $X \sim \mathrm{Mult}(n,p) \Rightarrow \left\langle x_i \right\rangle = n p_i$, this gives: |
| 81 | + |
| 82 | +$$ \label{eq:mult-ent-s3} |
| 83 | +\begin{split} |
| 84 | +\mathrm{H}(X) &= - \left\langle \log {n \choose {x_1, \ldots, x_k}} \right\rangle_{p(x)} - \sum_{i=1}^{k} n p_i \cdot \log p_i \\ |
| 85 | +&= - \left\langle\log {n \choose {x_1, \ldots, x_k}} \right\rangle_{p(x)} - n \sum_{i=1}^{k} p_i \cdot \log p_i \; . |
| 86 | +\end{split} |
| 87 | +$$ |
| 88 | + |
| 89 | +Finally, we note that the first term is the negative [expected value](/D/mean) of the logarithm of a [multinomial coefficient](/P/mult-pmf) and that the second term is the [entropy of the categorical distribution](/P/cat-ent), such that we finally get: |
| 90 | + |
| 91 | +$$ \label{eq:mult-ent-s4} |
| 92 | +\mathrm{H}(X) = n \cdot \mathrm{H}_\mathrm{cat}(p) - \mathrm{E}_\mathrm{lmc}(n,p) \; . |
| 93 | +$$ |
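
Since the support $\mathcal{X}_{n,k}$ is finite, the identity can also be verified numerically. A minimal sketch in Python (parameter values and helper names are illustrative assumptions, not part of the proof):

```python
from itertools import product
from math import factorial, log, prod

def mult_pmf(x, n, p):
    """Multinomial PMF: (n choose x_1,...,x_k) * prod(p_i^x_i)."""
    coef = factorial(n) // prod(factorial(xi) for xi in x)
    return coef * prod(pi ** xi for pi, xi in zip(p, x))

# Illustrative parameters (not from the source)
n, p = 5, [0.1, 0.3, 0.6]
support = [x for x in product(range(n + 1), repeat=len(p)) if sum(x) == n]

# Left-hand side: direct entropy H(X) = -sum_x f(x) log f(x)
H_direct = -sum(mult_pmf(x, n, p) * log(mult_pmf(x, n, p)) for x in support)

# Right-hand side: n * H_cat(p) - E_lmc(n, p)
H_cat = -sum(pi * log(pi) for pi in p)
E_lmc = sum(mult_pmf(x, n, p) * log(factorial(n) // prod(factorial(xi) for xi in x))
            for x in support)
H_rhs = n * H_cat - E_lmc
```

Both sides agree up to floating-point error, as the theorem asserts.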