Commit 351838e

added 4 proofs

1 parent 4b5701c commit 351838e

4 files changed, 330 additions & 0 deletions

P/mult-lbf.md

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-12-02 17:47:00

title: "Log Bayes factor for multinomial observations"
chapter: "Statistical Models"
section: "Count data"
topic: "Multinomial observations"
theorem: "Log Bayes factor"

sources:

proof_id: "P387"
shortcut: "mult-lbf"
username: "JoramSoch"
---


**Theorem:** Let $y = [y_1, \ldots, y_k]$ be the number of observations in $k$ categories resulting from $n$ independent trials with unknown category probabilities $p = [p_1, \ldots, p_k]$, such that $y$ follows a [multinomial distribution](/D/mult):

$$ \label{eq:Mult}
y \sim \mathrm{Mult}(n,p) \; .
$$

Moreover, assume two [statistical models](/D/fpm), one assuming that each $p_j$ is $1/k$ ([null model](/D/h0)), the other imposing a [Dirichlet distribution](/P/mult-prior) as the [prior distribution](/D/prior) on the model parameters $p_1, \ldots, p_k$ ([alternative](/D/h1)):

$$ \label{eq:Mult-m01}
\begin{split}
m_0&: \; y \sim \mathrm{Mult}(n,p), \; p = [1/k, \ldots, 1/k] \\
m_1&: \; y \sim \mathrm{Mult}(n,p), \; p \sim \mathrm{Dir}(\alpha_0) \; .
\end{split}
$$

Then, the [log Bayes factor](/D/lbf) in favor of $m_1$ against $m_0$ is

$$ \label{eq:Mult-LBF}
\begin{split}
\mathrm{LBF}_{10} &= \log \Gamma \left( \sum_{j=1}^{k} \alpha_{0j} \right) - \log \Gamma \left( \sum_{j=1}^{k} \alpha_{nj} \right) \\
&+ \sum_{j=1}^k \log \Gamma(\alpha_{nj}) - \sum_{j=1}^k \log \Gamma(\alpha_{0j}) - n \log \left( \frac{1}{k} \right)
\end{split}
$$

where $\Gamma(x)$ is the gamma function and $\alpha_n$ are the [posterior hyperparameters for multinomial observations](/P/mult-post) which are functions of the [numbers of observations](/D/mult) $y_1, \ldots, y_k$.


**Proof:** [The log Bayes factor is equal to the difference of two log model evidences](/P/lbf-lme):

$$ \label{eq:LBF-LME}
\mathrm{LBF}_{12} = \mathrm{LME}(m_1) - \mathrm{LME}(m_2) \; .
$$

The LME of the alternative $m_1$ is equal to the [log model evidence for multinomial observations](/P/mult-lme):

$$ \label{eq:Mult-LME-m1}
\begin{split}
\mathrm{LME}(m_1) = \log p(y|m_1) &= \log \Gamma(n+1) - \sum_{j=1}^{k} \log \Gamma(y_j+1) \\
&+ \log \Gamma \left( \sum_{j=1}^{k} \alpha_{0j} \right) - \log \Gamma \left( \sum_{j=1}^{k} \alpha_{nj} \right) \\
&+ \sum_{j=1}^k \log \Gamma(\alpha_{nj}) - \sum_{j=1}^k \log \Gamma(\alpha_{0j}) \; .
\end{split}
$$

Because the null model $m_0$ has no free parameters, its [log model evidence](/D/lme) (logarithmized [marginal likelihood](/D/ml)) is equal to the [log-likelihood function for multinomial observations](/P/mult-mle) evaluated at $p = [1/k, \ldots, 1/k]$:

$$ \label{eq:Mult-LME-m0}
\begin{split}
\mathrm{LME}(m_0) = \log p(y|p = p_0) &= \log {n \choose {y_1, \ldots, y_k}} + \sum_{j=1}^{k} y_j \log \left( \frac{1}{k} \right) \\
&= \log {n \choose {y_1, \ldots, y_k}} + n \log \left( \frac{1}{k} \right) \; .
\end{split}
$$

Subtracting the two LMEs from each other, the LBF emerges as

$$ \label{eq:Mult-LBF-m10}
\begin{split}
\mathrm{LBF}_{10} &= \log \Gamma \left( \sum_{j=1}^{k} \alpha_{0j} \right) - \log \Gamma \left( \sum_{j=1}^{k} \alpha_{nj} \right) \\
&+ \sum_{j=1}^k \log \Gamma(\alpha_{nj}) - \sum_{j=1}^k \log \Gamma(\alpha_{0j}) - n \log \left( \frac{1}{k} \right)
\end{split}
$$

where the [posterior hyperparameters](/D/post) [are given by](/P/mult-post)

$$ \label{eq:Mult-post-par}
\begin{split}
\alpha_n &= \alpha_0 + y \\
&= [\alpha_{01}, \ldots, \alpha_{0k}] + [y_1, \ldots, y_k] \\
&= [\alpha_{01} + y_1, \ldots, \alpha_{0k} + y_k] \\
\text{i.e.} \quad \alpha_{nj} &= \alpha_{0j} + y_j \quad \text{for all} \quad j = 1, \ldots, k
\end{split}
$$

with the [numbers of observations](/D/mult) $y_1, \ldots, y_k$.
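
As a quick numerical illustration (not part of the original proof), the LBF formula can be evaluated with Python's standard library; the helper name `mult_lbf_10` is invented for this sketch:

```python
from math import lgamma, log

def mult_lbf_10(y, alpha0):
    """Illustrative evaluation of LBF_10 from the theorem above.

    y      -- category counts [y_1, ..., y_k]
    alpha0 -- prior Dirichlet hyperparameters [alpha_01, ..., alpha_0k]
    """
    n, k = sum(y), len(y)
    # posterior hyperparameters: alpha_nj = alpha_0j + y_j
    alpha_n = [a + yj for a, yj in zip(alpha0, y)]
    return (lgamma(sum(alpha0)) - lgamma(sum(alpha_n))
            + sum(lgamma(a) for a in alpha_n) - sum(lgamma(a) for a in alpha0)
            - n * log(1 / k))

# counts far from uniform should favor m_1 (positive LBF) ...
print(mult_lbf_10([20, 1, 1], [1.0, 1.0, 1.0]))
# ... while near-uniform counts should favor m_0 (negative LBF)
print(mult_lbf_10([7, 7, 8], [1.0, 1.0, 1.0]))
```

Note that the multinomial coefficient never needs to be computed: it cancels in the subtraction of the two LMEs, which is why it does not appear in the formula.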

P/mult-mle.md

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-12-02 17:00:00

title: "Maximum likelihood estimation for multinomial observations"
chapter: "Statistical Models"
section: "Count data"
topic: "Multinomial observations"
theorem: "Maximum likelihood estimation"

sources:

proof_id: "P385"
shortcut: "mult-mle"
username: "JoramSoch"
---


**Theorem:** Let $y = [y_1, \ldots, y_k]$ be the number of observations in $k$ categories resulting from $n$ independent trials with unknown category probabilities $p = [p_1, \ldots, p_k]$, such that $y$ follows a [multinomial distribution](/D/mult):

$$ \label{eq:Mult}
y \sim \mathrm{Mult}(n,p) \; .
$$

Then, the [maximum likelihood estimator](/P/mle) of $p$ is

$$ \label{eq:Mult-MLE}
\hat{p} = \frac{1}{n} y , \quad \text{i.e.} \quad \hat{p}_j = \frac{y_j}{n} \quad \text{for all} \quad j = 1, \ldots, k \; .
$$


**Proof:** Note that [the marginal distribution of each element in a multinomial random vector is a binomial distribution](/P/mult-marg):

$$ \label{eq:Mult-marg}
X \sim \mathrm{Mult}(n,p) \quad \Rightarrow \quad X_j \sim \mathrm{Bin}(n, p_j) \quad \text{for all} \quad j = 1, \ldots, k \; .
$$

Thus, combining \eqref{eq:Mult} with \eqref{eq:Mult-marg}, we have

$$ \label{eq:Mult-Bin}
y_j \sim \mathrm{Bin}(n,p_j)
$$

which [implies the likelihood function](/P/bin-mle)

$$ \label{eq:Bin-LF}
\mathrm{p}(y_j|p_j) = \mathrm{Bin}(y_j; n, p_j) = {n \choose y_j} \, p_j^{y_j} \, (1-p_j)^{n-y_j} \; .
$$

To this, we can apply [maximum likelihood estimation for binomial observations](/P/bin-mle), such that the MLE for each $p_j$ is

$$ \label{eq:Mult-MLE-qed}
\hat{p}_j = \frac{y_j}{n} \; .
$$
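
The result is simply the vector of relative frequencies, which can be illustrated in one line of Python (the helper name is invented for this sketch):

```python
def mult_mle(y):
    """MLE of multinomial category probabilities: p_hat_j = y_j / n."""
    n = sum(y)
    return [yj / n for yj in y]

print(mult_mle([3, 5, 2]))  # relative frequencies [0.3, 0.5, 0.2]
```

By construction the estimates are non-negative and sum to one, so $\hat{p}$ is a valid probability vector.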

P/mult-mll.md

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-12-02 17:22:00

title: "Maximum log-likelihood for multinomial observations"
chapter: "Statistical Models"
section: "Count data"
topic: "Multinomial observations"
theorem: "Maximum log-likelihood"

sources:

proof_id: "P386"
shortcut: "mult-mll"
username: "JoramSoch"
---


**Theorem:** Let $y = [y_1, \ldots, y_k]$ be the number of observations in $k$ categories resulting from $n$ independent trials with unknown category probabilities $p = [p_1, \ldots, p_k]$, such that $y$ follows a [multinomial distribution](/D/mult):

$$ \label{eq:Mult}
y \sim \mathrm{Mult}(n,p) \; .
$$

Then, the [maximum log-likelihood](/D/mll) for this model is

$$ \label{eq:Mult-MLL}
\mathrm{MLL} = \log \Gamma(n+1) - \sum_{j=1}^{k} \log \Gamma(y_j+1) - n \log (n) + \sum_{j=1}^{k} y_j \log (y_j) \; .
$$


**Proof:** With the [probability mass function of the multinomial distribution](/P/mult-pmf), equation \eqref{eq:Mult} implies the following [likelihood function](/D/lf):

$$ \label{eq:Mult-LF}
\begin{split}
\mathrm{p}(y|p) &= \mathrm{Mult}(y; n, p) \\
&= {n \choose {y_1, \ldots, y_k}} \prod_{j=1}^{k} {p_j}^{y_j} \; .
\end{split}
$$

Thus, the [log-likelihood function](/D/llf) is given by

$$ \label{eq:Mult-LL}
\begin{split}
\mathrm{LL}(p) &= \log \mathrm{p}(y|p) \\
&= \log {n \choose {y_1, \ldots, y_k}} + \sum_{j=1}^{k} y_j \log (p_j) \; .
\end{split}
$$

The [maximum likelihood estimates of the category probabilities](/P/mult-mle) $p$ are

$$ \label{eq:Mult-MLE}
\hat{p} = \left[ \hat{p}_1, \ldots, \hat{p}_k \right] \quad \text{with} \quad \hat{p}_j = \frac{y_j}{n} \quad \text{for all} \quad j = 1, \ldots, k \; .
$$

Plugging \eqref{eq:Mult-MLE} into \eqref{eq:Mult-LL}, we obtain the [maximum log-likelihood](/D/mll) of the multinomial observation model in \eqref{eq:Mult} as

$$ \label{eq:Mult-MLL-s1}
\begin{split}
\mathrm{MLL} &= \mathrm{LL}(\hat{p}) \\
&= \log {n \choose {y_1, \ldots, y_k}} + \sum_{j=1}^{k} y_j \log \left( \frac{y_j}{n} \right) \\
&= \log {n \choose {y_1, \ldots, y_k}} + \sum_{j=1}^{k} \left[ y_j \log (y_j) - y_j \log (n) \right] \\
&= \log {n \choose {y_1, \ldots, y_k}} + \sum_{j=1}^{k} y_j \log (y_j) - \sum_{j=1}^{k} y_j \log (n) \\
&= \log {n \choose {y_1, \ldots, y_k}} + \sum_{j=1}^{k} y_j \log (y_j) - n \log (n) \; .
\end{split}
$$

With the definition of the multinomial coefficient

$$ \label{eq:mult-coeff}
{n \choose {k_1, \ldots, k_m}} = \frac{n!}{k_1! \cdot \ldots \cdot k_m!}
$$

and the definition of the gamma function

$$ \label{eq:gam-fct}
\Gamma(n) = (n-1)! \; ,
$$

the MLL finally becomes

$$ \label{eq:Mult-MLL-s2}
\mathrm{MLL} = \log \Gamma(n+1) - \sum_{j=1}^{k} \log \Gamma(y_j+1) - n \log (n) + \sum_{j=1}^{k} y_j \log (y_j) \; .
$$
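
As a numerical sanity check (not part of the original proof), the gamma-function form of the MLL and the direct form $\log {n \choose y_1, \ldots, y_k} + \sum_j y_j \log(y_j/n)$ can be compared in Python; both helper names are invented, and all $y_j > 0$ is assumed so that $y_j \log(y_j)$ is well-defined:

```python
from math import lgamma, log

def mult_mll_gamma(y):
    """MLL in the gamma-function form (assumes all y_j > 0)."""
    n = sum(y)
    return (lgamma(n + 1) - sum(lgamma(yj + 1) for yj in y)
            - n * log(n) + sum(yj * log(yj) for yj in y))

def mult_mll_direct(y):
    """MLL as log multinomial coefficient plus sum of y_j * log(y_j / n)."""
    n = sum(y)
    log_coeff = lgamma(n + 1) - sum(lgamma(yj + 1) for yj in y)
    return log_coeff + sum(yj * log(yj / n) for yj in y)

print(mult_mll_gamma([3, 5, 2]), mult_mll_direct([3, 5, 2]))
```

Since the MLL is the logarithm of a probability mass, it is non-positive, and the two forms agree up to floating-point error.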

P/mult-pp.md

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-12-02 18:03:00

title: "Posterior probability of the alternative model for multinomial observations"
chapter: "Statistical Models"
section: "Count data"
topic: "Multinomial observations"
theorem: "Posterior probability"

sources:

proof_id: "P388"
shortcut: "mult-pp"
username: "JoramSoch"
---


**Theorem:** Let $y = [y_1, \ldots, y_k]$ be the number of observations in $k$ categories resulting from $n$ independent trials with unknown category probabilities $p = [p_1, \ldots, p_k]$, such that $y$ follows a [multinomial distribution](/D/mult):

$$ \label{eq:Mult}
y \sim \mathrm{Mult}(n,p) \; .
$$

Moreover, assume two [statistical models](/D/fpm), one assuming that each $p_j$ is $1/k$ ([null model](/D/h0)), the other imposing a [Dirichlet distribution](/P/mult-prior) as the [prior distribution](/D/prior) on the model parameters $p_1, \ldots, p_k$ ([alternative](/D/h1)):

$$ \label{eq:Mult-m01}
\begin{split}
m_0&: \; y \sim \mathrm{Mult}(n,p), \; p = [1/k, \ldots, 1/k] \\
m_1&: \; y \sim \mathrm{Mult}(n,p), \; p \sim \mathrm{Dir}(\alpha_0) \; .
\end{split}
$$

Then, the [posterior probability](/D/pmp) of the [alternative model](/D/h1) is given by

$$ \label{eq:Mult-PP1}
p(m_1|y) = \left[ 1 + k^{-n} \cdot \frac{\Gamma \left( \sum_{j=1}^{k} \alpha_{nj} \right)}{\Gamma \left( \sum_{j=1}^{k} \alpha_{0j} \right)} \cdot \frac{\prod_{j=1}^k \Gamma(\alpha_{0j})}{\prod_{j=1}^k \Gamma(\alpha_{nj})} \right]^{-1}
$$

where $\Gamma(x)$ is the gamma function and $\alpha_n$ are the [posterior hyperparameters for multinomial observations](/P/mult-post) which are functions of the [numbers of observations](/D/mult) $y_1, \ldots, y_k$.


**Proof:** [The posterior probability for one of two models is a function of the log Bayes factor in favor of this model](/P/pmp-lbf):

$$ \label{eq:PP-LBF}
p(m_1|y) = \frac{\exp(\mathrm{LBF}_{12})}{\exp(\mathrm{LBF}_{12}) + 1} \; .
$$

The [log Bayes factor in favor of the alternative model for multinomial observations](/P/mult-lbf) is given by

$$ \label{eq:Mult-LBF10}
\begin{split}
\mathrm{LBF}_{10} &= \log \Gamma \left( \sum_{j=1}^{k} \alpha_{0j} \right) - \log \Gamma \left( \sum_{j=1}^{k} \alpha_{nj} \right) \\
&+ \sum_{j=1}^k \log \Gamma(\alpha_{nj}) - \sum_{j=1}^k \log \Gamma(\alpha_{0j}) - n \log \left( \frac{1}{k} \right)
\end{split}
$$

and the corresponding [Bayes factor](/D/bf), i.e. the [exponentiated log Bayes factor](/P/lbf-der), is equal to

$$ \label{eq:Mult-BF10}
\mathrm{BF}_{10} = \exp(\mathrm{LBF}_{10}) = k^n \cdot \frac{\Gamma \left( \sum_{j=1}^{k} \alpha_{0j} \right)}{\Gamma \left( \sum_{j=1}^{k} \alpha_{nj} \right)} \cdot \frac{\prod_{j=1}^k \Gamma(\alpha_{nj})}{\prod_{j=1}^k \Gamma(\alpha_{0j})} \; .
$$

Thus, the posterior probability of the alternative, assuming a prior distribution over the probabilities $p_1, \ldots, p_k$, compared to the null model, assuming fixed probabilities $p = [1/k, \ldots, 1/k]$, follows as

$$ \label{eq:Mult-PP1-qed}
\begin{split}
p(m_1|y) &\overset{\eqref{eq:PP-LBF}}{=} \frac{\exp(\mathrm{LBF}_{10})}{\exp(\mathrm{LBF}_{10}) + 1} \\
&\overset{\eqref{eq:Mult-BF10}}{=} \frac{k^n \cdot \frac{\Gamma \left( \sum_{j=1}^{k} \alpha_{0j} \right)}{\Gamma \left( \sum_{j=1}^{k} \alpha_{nj} \right)} \cdot \frac{\prod_{j=1}^k \Gamma(\alpha_{nj})}{\prod_{j=1}^k \Gamma(\alpha_{0j})}}{k^n \cdot \frac{\Gamma \left( \sum_{j=1}^{k} \alpha_{0j} \right)}{\Gamma \left( \sum_{j=1}^{k} \alpha_{nj} \right)} \cdot \frac{\prod_{j=1}^k \Gamma(\alpha_{nj})}{\prod_{j=1}^k \Gamma(\alpha_{0j})} + 1} \\
&= \frac{k^n \cdot \frac{\Gamma \left( \sum_{j=1}^{k} \alpha_{0j} \right)}{\Gamma \left( \sum_{j=1}^{k} \alpha_{nj} \right)} \cdot \frac{\prod_{j=1}^k \Gamma(\alpha_{nj})}{\prod_{j=1}^k \Gamma(\alpha_{0j})}}{k^n \cdot \frac{\Gamma \left( \sum_{j=1}^{k} \alpha_{0j} \right)}{\Gamma \left( \sum_{j=1}^{k} \alpha_{nj} \right)} \cdot \frac{\prod_{j=1}^k \Gamma(\alpha_{nj})}{\prod_{j=1}^k \Gamma(\alpha_{0j})} \left( 1 + k^{-n} \cdot \frac{\Gamma \left( \sum_{j=1}^{k} \alpha_{nj} \right)}{\Gamma \left( \sum_{j=1}^{k} \alpha_{0j} \right)} \cdot \frac{\prod_{j=1}^k \Gamma(\alpha_{0j})}{\prod_{j=1}^k \Gamma(\alpha_{nj})} \right)} \\
&= \frac{1}{1 + k^{-n} \cdot \frac{\Gamma \left( \sum_{j=1}^{k} \alpha_{nj} \right)}{\Gamma \left( \sum_{j=1}^{k} \alpha_{0j} \right)} \cdot \frac{\prod_{j=1}^k \Gamma(\alpha_{0j})}{\prod_{j=1}^k \Gamma(\alpha_{nj})}}
\end{split}
$$

where the [posterior hyperparameters](/D/post) [are given by](/P/mult-post)

$$ \label{eq:Mult-post-par}
\alpha_n = \alpha_0 + y, \quad \text{i.e.} \quad \alpha_{nj} = \alpha_{0j} + y_j
$$

with the [numbers of observations](/D/mult) $y_1, \ldots, y_k$.
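
Since $\exp(\mathrm{LBF}) / (\exp(\mathrm{LBF}) + 1)$ is the logistic function of the LBF, the posterior probability is easy to compute numerically. The following sketch is illustrative only (the helper name is invented, and equal prior model probabilities are assumed, as in the linked proof):

```python
from math import lgamma, log, exp

def mult_pp_m1(y, alpha0):
    """Posterior probability of m_1, computed through the log Bayes factor.

    Illustrative helper; assumes equal prior probabilities for m_0 and m_1.
    """
    n, k = sum(y), len(y)
    # posterior hyperparameters: alpha_nj = alpha_0j + y_j
    alpha_n = [a + yj for a, yj in zip(alpha0, y)]
    lbf10 = (lgamma(sum(alpha0)) - lgamma(sum(alpha_n))
             + sum(lgamma(a) for a in alpha_n) - sum(lgamma(a) for a in alpha0)
             + n * log(k))  # -n*log(1/k) = n*log(k)
    # exp(LBF) / (exp(LBF) + 1) rewritten as the logistic function of LBF
    return 1.0 / (1.0 + exp(-lbf10))

# heavily skewed counts: posterior probability of m_1 should be close to 1
print(mult_pp_m1([20, 1, 1], [1.0, 1.0, 1.0]))
```

Writing the ratio as a logistic function avoids forming the potentially huge factor $k^n$ explicitly.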
