Commit fb45bf6

Merge pull request #86 from tomfaulkenberry/master
added [em,bf-ep] / modified [bf,bf-sddr]
2 parents ef93153 + d8c2db0 commit fb45bf6

4 files changed: 151 additions & 23 deletions

D/bf.md

Lines changed: 5 additions & 5 deletions
@@ -28,22 +28,22 @@ username: "tomfaulkenberry"
 ---


-**Definition:** Consider two competing [generative models](/D/gm) $\mathcal{M}_1$ and $\mathcal{M}_2$ for observed data $y$. Then the Bayes factor in favor $\mathcal{M}_1$ over $\mathcal{M}_2$ is the ratio of [marginal likelihoods](/D/ml) of $\mathcal{M}_1$ and $\mathcal{M}_2$:
+**Definition:** Consider two competing [generative models](/D/gm) $m_1$ and $m_2$ for observed data $y$. Then the Bayes factor in favor of $m_1$ over $m_2$ is the ratio of [marginal likelihoods](/D/ml) of $m_1$ and $m_2$:

 $$ \label{eq:BF}
-\text{BF}_{12} = \frac{p(y\mid \mathcal{M}_1)}{p(y\mid \mathcal{M}_2)}.
+\text{BF}_{12} = \frac{p(y\mid m_1)}{p(y\mid m_2)}.
 $$

 Note: by [Bayes Theorem](/P/bayes-th), the ratio of [posterior model probabilities](/D/pmp) (i.e., the posterior model odds) can be written as

 $$ \label{eq:odds}
-\frac{p(\mathcal{M}_1\mid y)}{p(\mathcal{M}_2\mid y)} = \frac{p(\mathcal{M}_1)}{p(\mathcal{M}_2)} \cdot \frac{p(y\mid \mathcal{M}_1)}{p(y\mid \mathcal{M}_2)},
+\frac{p(m_1 \mid y)}{p(m_2 \mid y)} = \frac{p(m_1)}{p(m_2)} \cdot \frac{p(y\mid m_1)}{p(y\mid m_2)},
 $$

 or equivalently by \eqref{eq:BF},

 $$ \label{eq:odds2}
-\frac{p(\mathcal{M}_1\mid y)}{p(\mathcal{M}_2\mid y)} = \frac{p(\mathcal{M}_1)}{p(\mathcal{M}_2)} \cdot \text{BF}_{12}.
+\frac{p(m_1 \mid y)}{p(m_2 \mid y)} = \frac{p(m_1)}{p(m_2)} \cdot \text{BF}_{12}.
 $$

-In other words, the Bayes factor can be viewed as the factor by which the prior model odds are updated (after observing data $y$) to posterior model odds.
+In other words, the Bayes factor can be viewed as the factor by which the prior model odds are updated (after observing data $y$) to posterior model odds (see also [Bayes' rule](/P/bayes-rule)).
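
As an aside, the definition can be checked with a quick numeric sketch (a hypothetical example, not part of the commit): compare a point model $m_1: \theta = 0.5$ against $m_2: \theta \sim \text{Uniform}(0,1)$ for binomial data, where both marginal likelihoods are available in closed form.

```python
from math import comb

# Hypothetical data (not from the source): y = 7 successes in n = 10 trials.
n, y = 10, 7

# m1 fixes theta = 0.5, so its marginal likelihood is just the binomial pmf.
ml_m1 = comb(n, y) * 0.5**y * 0.5**(n - y)

# m2 puts theta ~ Uniform(0,1); the marginal likelihood integrates to 1/(n+1).
ml_m2 = 1 / (n + 1)

bf_12 = ml_m1 / ml_m2  # Bayes factor in favor of m1 over m2

# With equal prior model odds, the posterior odds equal the Bayes factor,
# illustrating that BF_12 is the update factor from prior to posterior odds.
prior_odds = 1.0
posterior_odds = prior_odds * bf_12
print(bf_12, posterior_odds)
```

With equal prior odds the data here are mildly uninformative between the two models, so neither is strongly favored.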

D/em.md

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+---
+layout: definition
+mathjax: true
+
+author: "Thomas J. Faulkenberry"
+affiliation: "Tarleton State University"
+e_mail: "faulkenberry@tarleton.edu"
+date: 2020-08-26 12:00:00
+
+title: "Encompassing model"
+chapter: "Model Selection"
+section: "Bayesian model selection"
+topic: "Bayes factor"
+definition: "Definition"
+
+sources:
+- authors: "Klugkist, I., Kato, B., and Hoijtink, H."
+  year: 2005
+  title: "Bayesian model selection using encompassing priors"
+  in: "Statistica Neerlandica"
+  pages: "vol. 59, no. 1, pp. 57-69"
+  url: "https://dx.doi.org/10.1111/j.1467-9574.2005.00279.x"
+  doi: "10.1111/j.1467-9574.2005.00279.x"
+
+def_id: "D93"
+shortcut: "em"
+username: "tomfaulkenberry"
+---
+
+
+**Definition:** Consider a family $f$ of [generative models](/D/gm) $m$ on data $y$, where each $m \in f$ is defined by placing an inequality constraint on model parameter(s) $\theta$ (e.g., $m: \theta > 0$). Then the encompassing model $m_e$ is constructed such that each $m$ is nested within $m_e$ and all inequality constraints on the parameter(s) $\theta$ are removed.
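
To make the definition concrete (a hypothetical sketch, not part of the commit): take an encompassing model $m_e: \theta \sim \mathcal{N}(0,1)$ with $\theta$ unconstrained, and the nested model $m_1: \theta > 0$. The prior of $m_1$ is then the encompassing prior truncated to the constrained region, and the prior mass in agreement with the constraint can be estimated by Monte Carlo.

```python
import random

random.seed(1)

# Hypothetical encompassing model m_e: theta ~ Normal(0, 1), unconstrained.
prior_draws = [random.gauss(0.0, 1.0) for _ in range(100_000)]

# Nested model m1 imposes theta > 0. The proportion of encompassing-prior
# draws satisfying the constraint estimates the prior mass in agreement
# with m1 (about 0.5 here, since the prior is symmetric about zero).
prop_constraint = sum(d > 0 for d in prior_draws) / len(prior_draws)
print(prop_constraint)
```

This proportion is exactly the quantity $1/c$ that appears in the encompassing prior method for computing Bayes factors.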

P/bf-ep.md

Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
+---
+layout: proof
+mathjax: true
+
+author: "Thomas J. Faulkenberry"
+affiliation: "Tarleton State University"
+e_mail: "faulkenberry@tarleton.edu"
+date: 2020-08-26 12:00:00
+
+title: "Encompassing Prior Method for computing Bayes Factors"
+chapter: "Model Selection"
+section: "Bayesian model selection"
+topic: "Bayes factor"
+theorem: "Computation using Encompassing Prior Method"
+
+sources:
+- authors: "Klugkist, I., Kato, B., and Hoijtink, H."
+  year: 2005
+  title: "Bayesian model selection using encompassing priors"
+  in: "Statistica Neerlandica"
+  pages: "vol. 59, no. 1, pp. 57-69"
+  url: "https://dx.doi.org/10.1111/j.1467-9574.2005.00279.x"
+  doi: "10.1111/j.1467-9574.2005.00279.x"
+
+- authors: "Faulkenberry, Thomas J."
+  year: 2019
+  title: "A tutorial on generalizing the default Bayesian t-test via posterior sampling and encompassing priors"
+  in: "Communications for Statistical Applications and Methods"
+  pages: "vol. 26, no. 2, pp. 217-238"
+  url: "https://dx.doi.org/10.29220/CSAM.2019.26.2.217"
+  doi: "10.29220/CSAM.2019.26.2.217"
+
+proof_id: "P157"
+shortcut: "bf-ep"
+username: "tomfaulkenberry"
+---
+
+
+**Theorem:** Consider two models $m_1$ and $m_e$, where $m_1$ is nested within an [encompassing model](/D/em) $m_e$ via an inequality constraint on some parameter $\theta$, and $\theta$ is unconstrained under $m_e$. Then
+
+$$
+B_{1e} = \frac{c}{d} = \frac{1/d}{1/c},
+$$
+
+where $1/d$ and $1/c$ represent the proportions of the posterior and prior of the encompassing model, respectively, that are in agreement with the inequality constraint imposed by the nested model $m_1$.
+
+**Proof:**
+
+Consider first that for any model $m_1$ on data $y$ with parameter $\theta$, [Bayes theorem](/P/bayes-th) implies
+
+$$ \label{eq:bayesth}
+p(\theta \mid y,m_1) = \frac{p(y \mid \theta,m_1) \cdot p(\theta \mid m_1)}{p(y \mid m_1)}.
+$$
+
+Rearranging Equation \eqref{eq:bayesth} allows us to write the [marginal likelihood](/D/ml) for $y$ under $m_1$ as
+
+$$ \label{eq:marginal}
+p(y \mid m_1) = \frac{p(y \mid \theta,m_1) \cdot p(\theta \mid m_1)}{p(\theta \mid y,m_1)}.
+$$
+
+Taking the ratio of the marginal likelihoods for $m_1$ and the [encompassing model](/D/em) $m_e$ yields the following [Bayes factor](/D/bf):
+
+$$ \label{eq:bayesfactor}
+B_{1e} = \frac{p(y \mid \theta,m_1) \cdot p(\theta \mid m_1) / p(\theta \mid y,m_1)}{p(y \mid \theta,m_e) \cdot p(\theta \mid m_e) / p(\theta \mid y,m_e)}.
+$$
+
+Now, both the constrained model $m_1$ and the [encompassing model](/D/em) $m_e$ contain the same parameter vector $\theta$. Choose a specific value of $\theta$, say $\theta'$, that lies in the support of both models $m_1$ and $m_e$ (we can do this because $m_1$ is nested within $m_e$). Then, for this parameter value $\theta'$, we have $p(y \mid \theta',m_1) = p(y \mid \theta',m_e)$, so the expression for the Bayes factor (Equation \eqref{eq:bayesfactor} above) reduces to an expression involving only the priors and posteriors for $\theta'$ under $m_1$ and $m_e$:
+
+$$ \label{eq:bayesfactor2}
+B_{1e} = \frac{p(\theta' \mid m_1) / p(\theta' \mid y,m_1)}{p(\theta' \mid m_e) / p(\theta' \mid y,m_e)}.
+$$
+
+Because $m_1$ is nested within $m_e$ via an inequality constraint, the prior $p(\theta' \mid m_1)$ is simply a truncation of the encompassing prior $p(\theta' \mid m_e)$. Thus, we can express $p(\theta' \mid m_1)$ in terms of the encompassing prior $p(\theta' \mid m_e)$ by multiplying the encompassing prior by an indicator function over $m_1$ and then normalizing the resulting product. That is,
+
+$$ \label{eq:normalize}
+\begin{split}
+p(\theta' \mid m_1) & = \frac{p(\theta' \mid m_e) \cdot I_{\theta' \in m_1}}{\int p(\theta' \mid m_e) \cdot I_{\theta' \in m_1} \, d\theta'} \\
+& = \Biggl(\frac{I_{\theta' \in m_1}}{\int p(\theta' \mid m_e) \cdot I_{\theta' \in m_1} \, d\theta'}\Biggr) \cdot p(\theta' \mid m_e),
+\end{split}
+$$
+
+where $I_{\theta' \in m_1}$ is an indicator function. For parameters $\theta' \in m_1$, this indicator function is identically equal to 1, so the expression in parentheses reduces to a constant, say $c$, allowing us to write the prior as
+
+$$ \label{eq:prior}
+p(\theta' \mid m_1) = c \cdot p(\theta' \mid m_e).
+$$
+
+By similar reasoning, we can write the posterior as
+
+$$ \label{eq:posterior}
+p(\theta' \mid y,m_1) = \Biggl(\frac{I_{\theta' \in m_1}}{\int p(\theta' \mid y,m_e) \cdot I_{\theta' \in m_1} \, d\theta'}\Biggr) \cdot p(\theta' \mid y,m_e) = d \cdot p(\theta' \mid y,m_e).
+$$
+
+This gives us
+
+$$ \label{eq:bayesfactor3}
+B_{1e} = \frac{c \cdot p(\theta' \mid m_e) \,/\, \bigl(d \cdot p(\theta' \mid y,m_e)\bigr)}{p(\theta' \mid m_e) \,/\, p(\theta' \mid y,m_e)} = \frac{c}{d} = \frac{1/d}{1/c},
+$$
+
+which completes the proof. Note that by definition, $1/d$ represents the proportion of the posterior distribution for $\theta$ under the [encompassing model](/D/em) $m_e$ that agrees with the constraints imposed by $m_1$. Similarly, $1/c$ represents the proportion of the prior distribution for $\theta$ under the [encompassing model](/D/em) $m_e$ that agrees with the constraints imposed by $m_1$.
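
The result $B_{1e} = (1/d)/(1/c)$ suggests a simple sampling recipe: draw from the encompassing prior and posterior, and take the ratio of the proportions of draws satisfying the constraint. Below is a minimal sketch under assumed conjugate choices (data, prior, and constraint are all hypothetical, not from the source): $y_i \sim \mathcal{N}(\theta, 1)$, encompassing prior $\theta \sim \mathcal{N}(0,1)$, and $m_1: \theta > 0$, so the posterior is available in closed form for an exact cross-check.

```python
import random
from statistics import NormalDist

random.seed(0)

# Hypothetical setup: y_i ~ Normal(theta, 1), encompassing prior
# theta ~ Normal(0, 1); the nested model m1 imposes theta > 0.
y = [0.8, 1.2, 0.3, 0.9, 1.1]  # simulated data
n, ybar = len(y), sum(y) / len(y)

# Conjugate posterior for theta under the encompassing model m_e:
post_mean = n * ybar / (n + 1)
post_sd = (1 / (n + 1)) ** 0.5

draws = 200_000
prior_draws = [random.gauss(0.0, 1.0) for _ in range(draws)]
post_draws = [random.gauss(post_mean, post_sd) for _ in range(draws)]

# 1/c: prior mass agreeing with the constraint; 1/d: posterior mass.
one_over_c = sum(t > 0 for t in prior_draws) / draws
one_over_d = sum(t > 0 for t in post_draws) / draws

bf_1e = one_over_d / one_over_c  # Bayes factor for m1 over m_e
print(bf_1e)

# Exact check via the normal CDF (possible here because of conjugacy):
bf_exact = (1 - NormalDist(post_mean, post_sd).cdf(0)) / 0.5
```

In practice the posterior draws would come from an MCMC sampler rather than a closed-form posterior; the ratio-of-proportions step is unchanged.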

P/bf-sddr.md

Lines changed: 18 additions & 18 deletions
@@ -28,46 +28,46 @@ username: "tomfaulkenberry"
 ---


-**Theorem:** Consider two competing [models](/D/gm) on data $y$ containing parameters $\delta$ and $\varphi$, namely $\mathcal{M}_0:\delta=\delta_0,\varphi$ and $\mathcal{M}_1:\delta,\varphi$. In this context, we say that $\delta$ is a parameter of interest, $\varphi$ is a nuisance parameter (i.e., common to both models), and $\mathcal{M}_0$ is a sharp point hypothesis nested within $\mathcal{M}_1$. Suppose further that the prior for the nuisance parameter $\varphi$ in $\mathcal{M}_0$ is equal to the prior for $\varphi$ in $\mathcal{M}_1$ after conditioning on the restriction -- that is, $p(\varphi\mid \mathcal{M}_0) = p(\varphi\mid \delta=\delta_0,\mathcal{M}_1)$. Then the [Bayes factor](/D/bf) for $\mathcal{M}_0$ over $\mathcal{M}_1$ can be computed as:
+**Theorem:** Consider two competing [models](/D/gm) on data $y$ containing parameters $\delta$ and $\varphi$, namely $m_0:\delta=\delta_0,\varphi$ and $m_1:\delta,\varphi$. In this context, we say that $\delta$ is a parameter of interest, $\varphi$ is a nuisance parameter (i.e., common to both models), and $m_0$ is a sharp point hypothesis nested within $m_1$. Suppose further that the prior for the nuisance parameter $\varphi$ in $m_0$ is equal to the prior for $\varphi$ in $m_1$ after conditioning on the restriction -- that is, $p(\varphi\mid m_0) = p(\varphi\mid \delta=\delta_0,m_1)$. Then the [Bayes factor](/D/bf) for $m_0$ over $m_1$ can be computed as:

 $$ \label{eq:sd}
-\text{BF}_{01} = \frac{p(\delta=\delta_0\mid y,\mathcal{M}_1)}{p(\delta=\delta_0\mid \mathcal{M}_1)}.
+\text{BF}_{01} = \frac{p(\delta=\delta_0\mid y,m_1)}{p(\delta=\delta_0\mid m_1)}.
 $$

 **Proof:**

-By [definition](/D/bf), the Bayes factor $\text{BF}_{01}$ is the ratio of marginal likelihoods of data $y$ over $\mathcal{M}_0$ and $\mathcal{M}_1$, respectively. That is,
+By [definition](/D/bf), the Bayes factor $\text{BF}_{01}$ is the ratio of the marginal likelihoods of data $y$ under $m_0$ and $m_1$, respectively. That is,

 $$ \label{eq:bf}
-\text{BF}_{01}=\frac{p(y \mid \mathcal{M}_0)}{p(y \mid \mathcal{M}_1)}.
+\text{BF}_{01}=\frac{p(y \mid m_0)}{p(y \mid m_1)}.
 $$

-The key idea in the proof is that we can use a "change of variables" technique to express $\text{BF}_{01}$ entirely in terms of the "encompassing" model $\mathcal{M}_1$. This proceeds by first unpacking the [marginal likelihood](/D/ml) for $\mathcal{M}_0$ over the nuisance parameter $\varphi$ and then using the fact that $\mathcal{M}_0$ is a sharp hypothesis nested within $\mathcal{M}_1$ to rewrite everything in terms of $\mathcal{H}_1$. Specifically,
+The key idea in the proof is that we can use a "change of variables" technique to express $\text{BF}_{01}$ entirely in terms of the "encompassing" model $m_1$. This proceeds by first unpacking the [marginal likelihood](/D/ml) for $m_0$ over the nuisance parameter $\varphi$ and then using the fact that $m_0$ is a sharp hypothesis nested within $m_1$ to rewrite everything in terms of $m_1$. Specifically,

 $$
-\begin{aligned}
-p(y \mid \mathcal{M}_0) &= \int p(y \mid \varphi,\mathcal{M}_0)p(\varphi\mid \mathcal{M}_0)d\varphi\\
-&= \int p(y \mid \varphi,\delta=\delta_0,\mathcal{M}_1)p(\varphi\mid \delta=\delta_0,\mathcal{M}_1)d\varphi\\
-&= p(y \mid \delta=\delta_0,\mathcal{M}_1).\\
-\end{aligned}
+\begin{split}
+p(y \mid m_0) &= \int p(y \mid \varphi,m_0) \, p(\varphi\mid m_0) \, d\varphi \\
+&= \int p(y \mid \varphi,\delta=\delta_0,m_1) \, p(\varphi\mid \delta=\delta_0,m_1) \, d\varphi \\
+&= p(y \mid \delta=\delta_0,m_1). \\
+\end{split}
 $$

 By [Bayes Theorem](/P/bayes-th), we can rewrite this last line as

 $$
-p(y \mid \delta=\delta_0,\mathcal{M}_1) = \frac{p(\delta=\delta_0\mid y,\mathcal{M}_1)p(y \mid \mathcal{M}_1)}{p(\delta=\delta_0\mid \mathcal{M}_1)}.
+p(y \mid \delta=\delta_0,m_1) = \frac{p(\delta=\delta_0\mid y,m_1) \, p(y \mid m_1)}{p(\delta=\delta_0\mid m_1)}.
 $$

 Thus we have

 $$
-\begin{aligned}
-\text{BF}_{01} &= \frac{p(y \mid \mathcal{M}_0)}{p(y \mid \mathcal{M}_1)}\\
-&= p(y \mid \mathcal{M}_0) \cdot \frac{1}{p(y \mid \mathcal{M}_1)}\\
-&= p(y \mid \delta=\delta_0,\mathcal{M}_1) \cdot \frac{1}{p(y \mid \mathcal{M}_1)}\\
-&= \frac{p(\delta=\delta_0\mid y,\mathcal{M}_1)p(y \mid \mathcal{M}_1)}{p(\delta=\delta_0\mid \mathcal{M}_1)} \cdot \frac{1}{p(y \mid \mathcal{M}_1)}\\
-&=\frac{p(\delta=\delta_0 \mid y,\mathcal{M}_1)}{p(\delta=\delta_0\mid \mathcal{M}_1)},
-\end{aligned}
+\begin{split}
+\text{BF}_{01} &= \frac{p(y \mid m_0)}{p(y \mid m_1)} \\
+&= p(y \mid m_0) \cdot \frac{1}{p(y \mid m_1)} \\
+&= p(y \mid \delta=\delta_0,m_1) \cdot \frac{1}{p(y \mid m_1)} \\
+&= \frac{p(\delta=\delta_0\mid y,m_1) \, p(y \mid m_1)}{p(\delta=\delta_0\mid m_1)} \cdot \frac{1}{p(y \mid m_1)} \\
+&= \frac{p(\delta=\delta_0 \mid y,m_1)}{p(\delta=\delta_0\mid m_1)},
+\end{split}
 $$

 which completes the proof of \eqref{eq:sd}.
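
The Savage-Dickey density ratio \eqref{eq:sd} can be exercised numerically. The sketch below uses an assumed conjugate example (data, prior, and the absence of a nuisance parameter are all hypothetical, not from the source): $y_i \sim \mathcal{N}(\delta, 1)$ with prior $\delta \sim \mathcal{N}(0,1)$ under $m_1$, and $m_0: \delta = \delta_0 = 0$. The density ratio is then cross-checked against the definitional ratio of marginal likelihoods, with $p(y \mid m_1)$ computed by brute-force numerical integration.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

# Hypothetical example: y_i ~ Normal(delta, 1) with no nuisance parameter,
# prior delta ~ Normal(0, 1) under m1, and m0 fixes delta = delta_0 = 0.
y = [0.4, -0.2, 0.7, 0.1]
n, ybar = len(y), sum(y) / len(y)
delta0 = 0.0

# Conjugate posterior for delta under m1: Normal(n*ybar/(n+1), 1/(n+1)).
prior = NormalDist(0.0, 1.0)
post = NormalDist(n * ybar / (n + 1), sqrt(1 / (n + 1)))

# Savage-Dickey: posterior density over prior density at delta_0.
bf_01 = post.pdf(delta0) / prior.pdf(delta0)

# Independent check: BF_01 as a ratio of marginal likelihoods, with
# p(y | m1) obtained by Riemann-sum integration over delta in [-8, 8].
def lik(delta):
    return exp(sum(-0.5 * (yi - delta) ** 2 for yi in y)) / (2 * pi) ** (n / 2)

step = 1e-3
ml_m1 = sum(lik(i * step) * prior.pdf(i * step) * step for i in range(-8000, 8001))
ml_m0 = lik(delta0)
print(bf_01, ml_m0 / ml_m1)  # the two estimates agree closely
```

The appeal of \eqref{eq:sd} is visible here: the density-ratio line needs only the posterior of $\delta$ under $m_1$, while the definitional route required an explicit integral over the model's parameter space.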
