---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-12-02 17:47:00

title: "Log Bayes factor for multinomial observations"
chapter: "Statistical Models"
section: "Count data"
topic: "Multinomial observations"
theorem: "Log Bayes factor"

sources:

proof_id: "P387"
shortcut: "mult-lbf"
username: "JoramSoch"
---


**Theorem:** Let $y = [y_1, \ldots, y_k]$ be the number of observations in $k$ categories resulting from $n$ independent trials with unknown category probabilities $p = [p_1, \ldots, p_k]$, such that $y$ follows a [multinomial distribution](/D/mult):

$$ \label{eq:Mult}
y \sim \mathrm{Mult}(n,p) \; .
$$

Moreover, assume two [statistical models](/D/fpm), one assuming that each $p_j$ is $1/k$ ([null model](/D/h0)), the other imposing a [Dirichlet distribution](/P/mult-prior) as the [prior distribution](/D/prior) on the model parameters $p_1, \ldots, p_k$ ([alternative](/D/h1)):

$$ \label{eq:Mult-m01}
\begin{split}
m_0&: \; y \sim \mathrm{Mult}(n,p), \; p = [1/k, \ldots, 1/k] \\
m_1&: \; y \sim \mathrm{Mult}(n,p), \; p \sim \mathrm{Dir}(\alpha_0) \; .
\end{split}
$$

Then, the [log Bayes factor](/D/lbf) in favor of $m_1$ against $m_0$ is

$$ \label{eq:Mult-LBF}
\begin{split}
\mathrm{LBF}_{10} &= \log \Gamma \left( \sum_{j=1}^{k} \alpha_{0j} \right) - \log \Gamma \left( \sum_{j=1}^{k} \alpha_{nj} \right) \\
&+ \sum_{j=1}^k \log \Gamma(\alpha_{nj}) - \sum_{j=1}^k \log \Gamma(\alpha_{0j}) - n \log \left( \frac{1}{k} \right)
\end{split}
$$

where $\Gamma(x)$ is the gamma function and $\alpha_n$ are the [posterior hyperparameters for multinomial observations](/P/mult-post) which are functions of the [numbers of observations](/D/mult) $y_1, \ldots, y_k$.
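As a quick numerical illustration (not part of the theorem), the closed-form expression above can be evaluated with Python's standard library; the function name `lbf_mult` and the example counts are chosen here purely for illustration:

```python
from math import lgamma, log

def lbf_mult(y, alpha0):
    """Log Bayes factor LBF_10 for multinomial observations y,
    comparing a Dirichlet prior Dir(alpha0) on p (model m1)
    against the fixed value p = [1/k, ..., 1/k] (model m0)."""
    n, k = sum(y), len(y)
    # posterior hyperparameters: alpha_n = alpha_0 + y
    alpha_n = [a0 + yj for a0, yj in zip(alpha0, y)]
    return (lgamma(sum(alpha0)) - lgamma(sum(alpha_n))
            + sum(lgamma(a) for a in alpha_n)
            - sum(lgamma(a) for a in alpha0)
            - n * log(1.0 / k))

# counts close to uniform lend no support to m1 ...
print(lbf_mult([10, 10, 10], [1, 1, 1]))  # negative
# ... while strongly skewed counts do
print(lbf_mult([28, 1, 1], [1, 1, 1]))    # positive
```

A positive LBF favors $m_1$ (unequal category probabilities), a negative LBF favors the null model of equal probabilities.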


**Proof:** [The log Bayes factor is equal to the difference of two log model evidences](/P/lbf-lme):

$$ \label{eq:LBF-LME}
\mathrm{LBF}_{12} = \mathrm{LME}(m_1) - \mathrm{LME}(m_2) \; .
$$

The LME of the alternative $m_1$ is equal to the [log model evidence for multinomial observations](/P/mult-lme):

$$ \label{eq:Mult-LME-m1}
\begin{split}
\mathrm{LME}(m_1) = \log p(y|m_1) &= \log \Gamma(n+1) - \sum_{j=1}^{k} \log \Gamma(y_j+1) \\
&+ \log \Gamma \left( \sum_{j=1}^{k} \alpha_{0j} \right) - \log \Gamma \left( \sum_{j=1}^{k} \alpha_{nj} \right) \\
&+ \sum_{j=1}^k \log \Gamma(\alpha_{nj}) - \sum_{j=1}^k \log \Gamma(\alpha_{0j}) \; .
\end{split}
$$
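This closed form can be sanity-checked against a direct Monte Carlo approximation of the marginal likelihood $\int p(y|p) \, \mathrm{Dir}(p; \alpha_0) \, \mathrm{d}p$, sampling from the Dirichlet prior via normalized gamma variates. The following is a sketch for illustration (function names are hypothetical), not part of the derivation:

```python
import random
from math import lgamma, log, exp

def lme_m1(y, alpha0):
    """Closed-form log model evidence of m1 (Dirichlet-multinomial)."""
    n = sum(y)
    alpha_n = [a0 + yj for a0, yj in zip(alpha0, y)]
    return (lgamma(n + 1) - sum(lgamma(yj + 1) for yj in y)
            + lgamma(sum(alpha0)) - lgamma(sum(alpha_n))
            + sum(lgamma(a) for a in alpha_n)
            - sum(lgamma(a) for a in alpha0))

def log_mult_pmf(y, p):
    """Multinomial log-likelihood log p(y|p)."""
    return (lgamma(sum(y) + 1) - sum(lgamma(yj + 1) for yj in y)
            + sum(yj * log(pj) for yj, pj in zip(y, p)))

def lme_m1_mc(y, alpha0, draws=50_000, seed=1):
    """Monte Carlo estimate of log of the integral of p(y|p)
    over the Dirichlet prior Dir(p; alpha0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        g = [rng.gammavariate(a, 1.0) for a in alpha0]
        s = sum(g)
        p = [gi / s for gi in g]  # one Dirichlet sample
        total += exp(log_mult_pmf(y, p))
    return log(total / draws)

y, alpha0 = [3, 2, 1], [1, 1, 1]
print(lme_m1(y, alpha0), lme_m1_mc(y, alpha0))  # both ≈ log(1/28) ≈ -3.33
```

For this example with a flat prior, the marginal likelihood reduces to $\Gamma(7)\Gamma(3)/\Gamma(9) = 1/28$, which the sampler recovers to within Monte Carlo error.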

Because the null model $m_0$ has no free parameters, its [log model evidence](/D/lme) (logarithmized [marginal likelihood](/D/ml)) is equal to the [log-likelihood function for multinomial observations](/P/mult-mle) at the value $p = [1/k, \ldots, 1/k]$:

$$ \label{eq:Mult-LME-m0}
\begin{split}
\mathrm{LME}(m_0) = \log p(y|p = p_0) &= \log {n \choose {y_1, \ldots, y_k}} + \sum_{j=1}^{k} y_j \log \left( \frac{1}{k} \right) \\
&= \log {n \choose {y_1, \ldots, y_k}} + n \log \left( \frac{1}{k} \right) \; ,
\end{split}
$$

where the second line uses $\sum_{j=1}^{k} y_j = n$.
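Before carrying out the subtraction symbolically, one can confirm numerically that the log multinomial coefficient is common to both model evidences and therefore drops out of the Bayes factor; a minimal sketch with illustrative function names:

```python
from math import lgamma, log

def log_mult_coeff(y):
    """log of the multinomial coefficient n! / (y_1! ... y_k!)."""
    return lgamma(sum(y) + 1) - sum(lgamma(yj + 1) for yj in y)

def lme_m0(y):
    """LME of the null model: multinomial log-likelihood at p = [1/k, ..., 1/k]."""
    n, k = sum(y), len(y)
    return log_mult_coeff(y) + n * log(1.0 / k)

def lme_m1(y, alpha0):
    """LME of the alternative model (Dirichlet-multinomial)."""
    alpha_n = [a0 + yj for a0, yj in zip(alpha0, y)]
    return (log_mult_coeff(y)
            + lgamma(sum(alpha0)) - lgamma(sum(alpha_n))
            + sum(lgamma(a) for a in alpha_n)
            - sum(lgamma(a) for a in alpha0))

# log_mult_coeff(y) appears in both LMEs, so it cancels in the difference
y, alpha0 = [8, 3, 1], [1, 1, 1]
lbf = lme_m1(y, alpha0) - lme_m0(y)
print(lbf)
```

The difference contains no trace of the multinomial coefficient, matching the cancellation in the subtraction below.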

Subtracting the LME of $m_0$ from the LME of $m_1$, the log multinomial coefficients $\log {n \choose {y_1, \ldots, y_k}} = \log \Gamma(n+1) - \sum_{j=1}^{k} \log \Gamma(y_j+1)$ cancel out and the LBF emerges as

$$ \label{eq:Mult-LBF-m10}
\begin{split}
\mathrm{LBF}_{10} &= \log \Gamma \left( \sum_{j=1}^{k} \alpha_{0j} \right) - \log \Gamma \left( \sum_{j=1}^{k} \alpha_{nj} \right) \\
&+ \sum_{j=1}^k \log \Gamma(\alpha_{nj}) - \sum_{j=1}^k \log \Gamma(\alpha_{0j}) - n \log \left( \frac{1}{k} \right)
\end{split}
$$

where the [posterior hyperparameters](/D/post) [are given by](/P/mult-post)

$$ \label{eq:Mult-post-par}
\begin{split}
\alpha_n &= \alpha_0 + y \\
&= [\alpha_{01}, \ldots, \alpha_{0k}] + [y_1, \ldots, y_k] \\
&= [\alpha_{01} + y_1, \ldots, \alpha_{0k} + y_k] \\
\text{i.e.} \quad \alpha_{nj} &= \alpha_{0j} + y_j \quad \text{for all} \quad j = 1, \ldots, k
\end{split}
$$

with the [numbers of observations](/D/mult) $y_1, \ldots, y_k$.