---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-12-02 17:47:00

title: "Log Bayes factor for multinomial observations"
chapter: "Statistical Models"
section: "Count data"
topic: "Multinomial observations"
theorem: "Log Bayes factor"

sources:

proof_id: "P387"
shortcut: "mult-lbf"
username: "JoramSoch"
---


**Theorem:** Let $y = [y_1, \ldots, y_k]$ be the number of observations in $k$ categories resulting from $n$ independent trials with unknown category probabilities $p = [p_1, \ldots, p_k]$, such that $y$ follows a [multinomial distribution](/D/mult):

$$ \label{eq:Mult}
y \sim \mathrm{Mult}(n,p) \; .
$$

Moreover, assume two [statistical models](/D/fpm), one assuming that each $p_j$ is $1/k$ ([null model](/D/h0)), the other imposing a [Dirichlet distribution](/P/mult-prior) as the [prior distribution](/D/prior) on the model parameters $p_1, \ldots, p_k$ ([alternative](/D/h1)):

$$ \label{eq:Mult-m01}
\begin{split}
m_0&: \; y \sim \mathrm{Mult}(n,p), \; p = [1/k, \ldots, 1/k] \\
m_1&: \; y \sim \mathrm{Mult}(n,p), \; p \sim \mathrm{Dir}(\alpha_0) \; .
\end{split}
$$

Then, the [log Bayes factor](/D/lbf) in favor of $m_1$ against $m_0$ is

$$ \label{eq:Mult-LBF}
\begin{split}
\mathrm{LBF}_{10} &= \log \Gamma \left( \sum_{j=1}^{k} \alpha_{0j} \right) - \log \Gamma \left( \sum_{j=1}^{k} \alpha_{nj} \right) \\
&+ \sum_{j=1}^k \log \Gamma(\alpha_{nj}) - \sum_{j=1}^k \log \Gamma(\alpha_{0j}) - n \log \left( \frac{1}{k} \right)
\end{split}
$$

where $\Gamma(x)$ is the gamma function and $\alpha_n$ are the [posterior hyperparameters for multinomial observations](/P/mult-post) which are functions of the [numbers of observations](/D/mult) $y_1, \ldots, y_k$.
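As a quick numerical illustration (not part of the theorem), the closed-form expression above can be evaluated with Python's standard library; the function name `lbf_mult` and the example counts are chosen here purely for illustration:

```python
from math import lgamma, log

def lbf_mult(y, alpha0):
    """Log Bayes factor LBF_10 for multinomial observations y,
    comparing a Dirichlet prior Dir(alpha0) on p (model m1)
    against the fixed value p = [1/k, ..., 1/k] (model m0)."""
    n, k = sum(y), len(y)
    # posterior hyperparameters: alpha_n = alpha_0 + y
    alpha_n = [a0 + yj for a0, yj in zip(alpha0, y)]
    return (lgamma(sum(alpha0)) - lgamma(sum(alpha_n))
            + sum(lgamma(a) for a in alpha_n)
            - sum(lgamma(a) for a in alpha0)
            - n * log(1.0 / k))

# counts close to uniform lend no support to m1 ...
print(lbf_mult([10, 10, 10], [1, 1, 1]))  # negative
# ... while strongly skewed counts do
print(lbf_mult([28, 1, 1], [1, 1, 1]))    # positive
```

A positive LBF favors $m_1$ (unequal category probabilities), a negative LBF favors the null model of equal probabilities.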


**Proof:** [The log Bayes factor is equal to the difference of two log model evidences](/P/lbf-lme):

$$ \label{eq:LBF-LME}
\mathrm{LBF}_{12} = \mathrm{LME}(m_1) - \mathrm{LME}(m_2) \; .
$$

The LME of the alternative $m_1$ is equal to the [log model evidence for multinomial observations](/P/mult-lme):

$$ \label{eq:Mult-LME-m1}
\begin{split}
\mathrm{LME}(m_1) = \log p(y|m_1) &= \log \Gamma(n+1) - \sum_{j=1}^{k} \log \Gamma(y_j+1) \\
&+ \log \Gamma \left( \sum_{j=1}^{k} \alpha_{0j} \right) - \log \Gamma \left( \sum_{j=1}^{k} \alpha_{nj} \right) \\
&+ \sum_{j=1}^k \log \Gamma(\alpha_{nj}) - \sum_{j=1}^k \log \Gamma(\alpha_{0j}) \; .
\end{split}
$$
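This closed form can be sanity-checked against a direct Monte Carlo approximation of the marginal likelihood $\int p(y|p) \, \mathrm{Dir}(p; \alpha_0) \, \mathrm{d}p$, sampling from the Dirichlet prior via normalized gamma variates. The following is a sketch for illustration (function names are hypothetical), not part of the derivation:

```python
import random
from math import lgamma, log, exp

def lme_m1(y, alpha0):
    """Closed-form log model evidence of m1 (Dirichlet-multinomial)."""
    n = sum(y)
    alpha_n = [a0 + yj for a0, yj in zip(alpha0, y)]
    return (lgamma(n + 1) - sum(lgamma(yj + 1) for yj in y)
            + lgamma(sum(alpha0)) - lgamma(sum(alpha_n))
            + sum(lgamma(a) for a in alpha_n)
            - sum(lgamma(a) for a in alpha0))

def log_mult_pmf(y, p):
    """Multinomial log-likelihood log p(y|p)."""
    return (lgamma(sum(y) + 1) - sum(lgamma(yj + 1) for yj in y)
            + sum(yj * log(pj) for yj, pj in zip(y, p)))

def lme_m1_mc(y, alpha0, draws=50_000, seed=1):
    """Monte Carlo estimate of log of the integral of p(y|p)
    over the Dirichlet prior Dir(p; alpha0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        g = [rng.gammavariate(a, 1.0) for a in alpha0]
        s = sum(g)
        p = [gi / s for gi in g]  # one Dirichlet sample
        total += exp(log_mult_pmf(y, p))
    return log(total / draws)

y, alpha0 = [3, 2, 1], [1, 1, 1]
print(lme_m1(y, alpha0), lme_m1_mc(y, alpha0))  # both ≈ log(1/28) ≈ -3.33
```

For this example with a flat prior, the marginal likelihood reduces to $\Gamma(7)\Gamma(3)/\Gamma(9) = 1/28$, which the sampler recovers to within Monte Carlo error.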

Because the null model $m_0$ has no free parameters, its [log model evidence](/D/lme) (logarithmized [marginal likelihood](/D/ml)) is equal to the [log-likelihood function for multinomial observations](/P/mult-mle) at the value $p = [1/k, \ldots, 1/k]$:

$$ \label{eq:Mult-LME-m0}
\begin{split}
\mathrm{LME}(m_0) = \log p(y|p = p_0) &= \log {n \choose {y_1, \ldots, y_k}} + \sum_{j=1}^{k} y_j \log \left( \frac{1}{k} \right) \\
&= \log {n \choose {y_1, \ldots, y_k}} + n \log \left( \frac{1}{k} \right) \; ,
\end{split}
$$

where the second line uses $\sum_{j=1}^{k} y_j = n$.
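Before carrying out the subtraction symbolically, one can confirm numerically that the log multinomial coefficient is common to both model evidences and therefore drops out of the Bayes factor; a minimal sketch with illustrative function names:

```python
from math import lgamma, log

def log_mult_coeff(y):
    """log of the multinomial coefficient n! / (y_1! ... y_k!)."""
    return lgamma(sum(y) + 1) - sum(lgamma(yj + 1) for yj in y)

def lme_m0(y):
    """LME of the null model: multinomial log-likelihood at p = [1/k, ..., 1/k]."""
    n, k = sum(y), len(y)
    return log_mult_coeff(y) + n * log(1.0 / k)

def lme_m1(y, alpha0):
    """LME of the alternative model (Dirichlet-multinomial)."""
    alpha_n = [a0 + yj for a0, yj in zip(alpha0, y)]
    return (log_mult_coeff(y)
            + lgamma(sum(alpha0)) - lgamma(sum(alpha_n))
            + sum(lgamma(a) for a in alpha_n)
            - sum(lgamma(a) for a in alpha0))

# log_mult_coeff(y) appears in both LMEs, so it cancels in the difference
y, alpha0 = [8, 3, 1], [1, 1, 1]
lbf = lme_m1(y, alpha0) - lme_m0(y)
print(lbf)
```

The difference contains no trace of the multinomial coefficient, matching the cancellation in the subtraction below.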

Subtracting the LME of $m_0$ from the LME of $m_1$, the log multinomial coefficients $\log {n \choose {y_1, \ldots, y_k}} = \log \Gamma(n+1) - \sum_{j=1}^{k} \log \Gamma(y_j+1)$ cancel out and the LBF emerges as

$$ \label{eq:Mult-LBF-m10}
\begin{split}
\mathrm{LBF}_{10} &= \log \Gamma \left( \sum_{j=1}^{k} \alpha_{0j} \right) - \log \Gamma \left( \sum_{j=1}^{k} \alpha_{nj} \right) \\
&+ \sum_{j=1}^k \log \Gamma(\alpha_{nj}) - \sum_{j=1}^k \log \Gamma(\alpha_{0j}) - n \log \left( \frac{1}{k} \right)
\end{split}
$$

where the [posterior hyperparameters](/D/post) [are given by](/P/mult-post)

$$ \label{eq:Mult-post-par}
\begin{split}
\alpha_n &= \alpha_0 + y \\
&= [\alpha_{01}, \ldots, \alpha_{0k}] + [y_1, \ldots, y_k] \\
&= [\alpha_{01} + y_1, \ldots, \alpha_{0k} + y_k] \\
\text{i.e.} \quad \alpha_{nj} &= \alpha_{0j} + y_j \quad \text{for all} \quad j = 1, \ldots, k
\end{split}
$$

with the [numbers of observations](/D/mult) $y_1, \ldots, y_k$.