
Commit e4d99cf

corrected some pages
Several small corrections were made to proofs and definitions.
1 parent 376a79a commit e4d99cf

9 files changed

Lines changed: 17 additions & 33 deletions


D/prob-marg.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -47,4 +47,4 @@ $$ \label{eq:prob-marg-cont}
 p(A) = \int_{\mathcal{X}} p(A,x) \, \mathrm{d}x \; .
 $$
 
-The law of marginal probability can be motivated from the [law of total probability](/D/prob-tot) which follows from the [Kolmogorov axioms of probability](/D/prob-ax).
+The law of marginal probability can be motivated from the [law of total probability](/P/prob-tot) which follows from the [Kolmogorov axioms of probability](/D/prob-ax).
```
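
The marginalization in eq. \eqref{eq:prob-marg-cont} is easy to spot-check numerically. A minimal sketch, assuming a standard bivariate normal as the joint density (the distribution, grid and point evaluated are illustrative choices, not part of the page):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# joint density p(a,x): standard bivariate normal with correlation rho (illustrative)
rho = 0.6
joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

# p(a) = \int p(a,x) dx, approximated by a Riemann sum over a wide grid
a, dx = 0.7, 0.001
x = np.arange(-10.0, 10.0, dx)
p_a_numeric = np.sum(joint.pdf(np.column_stack([np.full_like(x, a), x]))) * dx

# the analytic marginal of a standard bivariate normal is N(0,1)
p_a_exact = norm.pdf(a)
print(p_a_numeric, p_a_exact)  # both approximately 0.3123
```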

D/vb.md

Lines changed: 3 additions & 3 deletions

```diff
@@ -47,7 +47,7 @@ username: "JoramSoch"
 ---
 
 
-**Definition:** Let $m$ be a [generative model](/D/gm) with model parameters $\theta$ implying the [likelihood function](/D/lf) $p(y \vert \theta, m)$ and [prior distribution](/D/prior) $p(\theta \vert m)$. Then, a Variational Bayes treatment of $m$, also referred to as "approximate inference" or "variational inference", consists in
+**Definition:** Let $m$ be a [generative model](/D/gm) with [model parameters](/D/para) $\theta \in \Theta$ implying the [likelihood function](/D/lf) $p(y \vert \theta, m)$ and [prior distribution](/D/prior) $p(\theta \vert m)$. Then, a Variational Bayes treatment of $m$, also referred to as "approximate inference" or "variational inference", consists in
 
 <br>
 1) constructing an approximate [posterior distribution](/D/post)
@@ -60,14 +60,14 @@
 2) evaluating the [variational free energy](/D/vblme)
 
 $$ \label{eq:FE}
-F_q(m) = \int q(\theta) \log p(y|\theta,m) \, \mathrm{d}\theta - \int q(\theta) \frac{q(\theta)}{p(\theta|m)} \, \mathrm{d}\theta
+\mathrm{F}_m[q(\theta)] = \int_{\Theta} q(\theta) \log \frac{p(y,\theta \vert m)}{q(\theta)} \, \mathrm{d}\theta
 $$
 
 <br>
 3) and maximizing this function with respect to $q(\theta)$
 
 $$ \label{eq:VB}
-\hat{q}(\theta) = \operatorname*{arg\,max}_{q} F_q(m) \; .
+\hat{q}(\theta) = \operatorname*{arg\,max}_{q} \mathrm{F}_m[q(\theta)]
 $$
 
 for [Bayesian inference](/P/bayes-th), i.e. obtaining the [posterior distribution](/D/post) (from eq. \eqref{eq:VB}) and approximating the [marginal likelihood](/D/ml) (by plugging eq. \eqref{eq:VB} into eq. \eqref{eq:FE}).
```

Note that the numerator in eq. \eqref{eq:FE} must be the joint likelihood $p(y,\theta \vert m)$, not the posterior $p(\theta \vert y, m)$; this is what makes eq. \eqref{eq:FE} consistent with the decomposition $\mathrm{F}[q(\theta)] = \left\langle \log p(y,\theta) \right\rangle_{q(\theta)} - \mathrm{h}[q(\theta)]$ used in P/fren-dec below.
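
As a concrete illustration of steps 1)-3), the sketch below maximizes the free energy \eqref{eq:FE} over a grid of Gaussian candidates $q(\theta)$ for a conjugate normal model. The model, grid and variable names are illustrative assumptions, not part of the page; the point is only that the maximizing $q(\theta)$ recovers the exact posterior:

```python
import numpy as np

# illustrative model: y_i ~ N(theta, s2) with s2 known, prior theta ~ N(mu0, t20)
rng = np.random.default_rng(0)
s2, mu0, t20 = 1.0, 0.0, 4.0
y = rng.normal(1.5, np.sqrt(s2), size=20)
n = y.size

# free energy F_m[q] = <log p(y|theta,m)>_q - KL[q||p(theta|m)],
# in closed form for q(theta) = N(mq, sq2)
def free_energy(mq, sq2):
    exp_ll = -0.5 * n * np.log(2 * np.pi * s2) \
             - (np.sum((y - mq) ** 2) + n * sq2) / (2 * s2)
    kl = 0.5 * (np.log(t20 / sq2) + (sq2 + (mq - mu0) ** 2) / t20 - 1)
    return exp_ll - kl

# step 3): maximize F over the variational family (here: brute-force grid search)
grid_m = np.linspace(0.0, 3.0, 301)
grid_s2 = np.linspace(0.01, 1.0, 100)
F = np.array([[free_energy(m, s) for s in grid_s2] for m in grid_m])
i, j = np.unravel_index(F.argmax(), F.shape)

# exact conjugate posterior for comparison
post_prec = 1.0 / t20 + n / s2
print(grid_m[i], (mu0 / t20 + np.sum(y) / s2) / post_prec)  # posterior mean
print(grid_s2[j], 1.0 / post_prec)                          # posterior variance
```

Since the variational family contains the true posterior in this conjugate example, the maximized free energy also equals the log marginal likelihood up to grid resolution.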

D/vblme.md

Lines changed: 3 additions & 13 deletions

```diff
@@ -34,20 +34,10 @@ username: "JoramSoch"
 ---
 
 
-**Definition:** Let $m$ be a [generative model](/D/gm) with model parameters $\theta$ implying the [likelihood function](/D/lf) $p(y \vert \theta, m)$ and [prior distribution](/D/prior) $p(\theta \vert m)$. Moreover, assume an [approximate](/D/vb) [posterior distribution](/D/post) $q(\theta)$. Then, the [Variational Bayesian](/D/vb) [log model evidence](/D/lme), also referred to as the "negative free energy", is the expectation of the [log-likelihood function](/D/llf) with respect to the approximate posterior, minus the [Kullback-Leibler divergence](/D/kl) between approximate posterior and the prior distribution:
+**Definition:** Let $m$ be a [generative model](/D/gm) with [model parameters](/D/para) $\theta \in \Theta$ implying the [likelihood function](/D/lf) $p(y \vert \theta, m)$ and [prior distribution](/D/prior) $p(\theta \vert m)$. Moreover, assume an [approximate](/D/vb) [posterior distribution](/D/post) $q(\theta)$. Then, the [Variational Bayesian](/D/vb) [log model evidence](/D/lme), also referred to as the "variational free energy", is defined as the expected logarithm of the joint likelihood, divided by the approximate posterior:
 
 $$ \label{eq:vbLME}
-\mathrm{vbLME}(m) = \left\langle \log p(y \vert \theta, m) \right\rangle_{q(\theta)} - \mathrm{KL}\left[q(\theta) || p(\theta \vert m)\right]
+\mathrm{vbLME}(m) = \mathrm{F}_m[q(\theta)] = \int_{\Theta} q(\theta) \log \frac{p(y,\theta \vert m)}{q(\theta)} \, \mathrm{d}\theta \; .
 $$
 
-where
-
-$$ \label{eq:ELL}
-\left\langle \log p(y \vert \theta, m) \right\rangle_{q(\theta)} = \int q(\theta) \log p(y \vert \theta, m) \, \mathrm{d}\theta
-$$
-
-and
-
-$$ \label{eq:KL}
-\mathrm{KL}\left[q(\theta) || p(\theta \vert m)\right] = \int q(\theta) \log \frac{q(\theta)}{p(\theta \vert m)} \, \mathrm{d}\theta \; .
-$$
+The variational free energy can be decomposed into the [difference between log model evidence and KL divergence of approximate from true posterior](/P/fren-dec) or, alternatively, into the [difference of expected log-likelihood and KL divergence of approximate posterior from prior](/P/fren-dec).
```
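
Both decompositions referenced in the new closing sentence follow from the definition in one line each, by factorizing the joint likelihood as $p(y,\theta \vert m) = p(y \vert \theta, m) \, p(\theta \vert m)$ or as $p(y,\theta \vert m) = p(\theta \vert y, m) \, p(y \vert m)$:

$$
\begin{split}
\mathrm{F}_m[q(\theta)] &= \int_{\Theta} q(\theta) \log \frac{p(y \vert \theta, m) \, p(\theta \vert m)}{q(\theta)} \, \mathrm{d}\theta = \left\langle \log p(y \vert \theta, m) \right\rangle_{q(\theta)} - \mathrm{KL}\left[ q(\theta) \, \vert\vert \, p(\theta \vert m) \right] \\
&= \int_{\Theta} q(\theta) \log \frac{p(\theta \vert y, m) \, p(y \vert m)}{q(\theta)} \, \mathrm{d}\theta = \log p(y \vert m) - \mathrm{KL}\left[ q(\theta) \, \vert\vert \, p(\theta \vert y, m) \right] \; .
\end{split}
$$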

P/entcross-conv.md

Lines changed: 3 additions & 3 deletions

```diff
@@ -33,13 +33,13 @@ username: "JoramSoch"
 ---
 
 
-**Theorem:** The [cross-entropy](/D/ent-cross) is convex in the [probability distribution](/D/dist) $q$, i.e.
+**Theorem:** The [cross-entropy](/D/ent-cross) is convex in the second [probability distribution](/D/dist), i.e.
 
 $$ \label{eq:ent-cross-conv}
 \mathrm{H}[p,\lambda q_1 + (1-\lambda) q_2] \leq \lambda \mathrm{H}[p,q_1] + (1-\lambda) \mathrm{H}[p,q_2]
 $$
 
-where $p$ is a fixed and $q_1$ and $q_2$ are any two probability distributions and $0 \leq \lambda \leq 1$.
+where $p$ is fixed, $q_1$ and $q_2$ are any two [probability mass functions](/D/pmf) and $0 \leq \lambda \leq 1$.
 
 
 **Proof:** The [relationship between Kullback-Leibler divergence, entropy and cross-entropy](/P/kl-ent) is:
@@ -51,7 +51,7 @@ $$
 Note that the [KL divergence is convex](/P/kl-conv) in the pair of [probability distributions](/D/dist) $(p,q)$:
 
 $$ \label{eq:kl-conv}
-\mathrm{KL}[\lambda p_1 + (1-\lambda) p_2||\lambda q_1 + (1-\lambda) q_2] \leq \lambda \mathrm{KL}[p_1||q_1] + (1-\lambda) \mathrm{KL}[p_2||q_2]
+\mathrm{KL}[\lambda p_1 + (1-\lambda) p_2||\lambda q_1 + (1-\lambda) q_2] \leq \lambda \mathrm{KL}[p_1||q_1] + (1-\lambda) \mathrm{KL}[p_2||q_2] \; .
 $$
 
 A special case of this is given by
```
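
The restated inequality \eqref{eq:ent-cross-conv} is easy to verify numerically for discrete distributions; a small sketch (the random Dirichlet draws and tolerance are illustrative assumptions):

```python
import numpy as np

# cross-entropy H[p,q] for discrete distributions (natural logarithm)
def H(p, q):
    return -np.sum(p * np.log(q))

rng = np.random.default_rng(0)
p, q1, q2 = (rng.dirichlet(np.ones(5)) for _ in range(3))

# check H[p, lam*q1 + (1-lam)*q2] <= lam*H[p,q1] + (1-lam)*H[p,q2]
for lam in np.linspace(0.0, 1.0, 11):
    lhs = H(p, lam * q1 + (1 - lam) * q2)
    rhs = lam * H(p, q1) + (1 - lam) * H(p, q2)
    assert lhs <= rhs + 1e-12
```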

P/fren-dec.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -48,7 +48,7 @@ $$ \label{eq:vb-fe3}
 \mathrm{F}[q(\theta)] = \left\langle \log p(y,\theta) \right\rangle_{q(\theta)} - \mathrm{h}[q(\theta)]
 $$
 
-where $p(y \vert m) = p(y)$ is the [marginal likelihood](/D/ml), $\left\langle \cdot \right\rangle_{p(x)}$ denotes an [expectation](/D/mean) with respect to the []density](/D/pdf) $p(x)$, $\mathrm{KL}[\cdot \vert\vert \cdot]$ denotes the [Kullback-Leibler divergence](/D/kl) and $\mathrm{h}[\cdot]$ denotes the [differential entropy](/D/dent).
+where $p(y \vert m) = p(y)$ is the [marginal likelihood](/D/ml), $\left\langle \cdot \right\rangle_{p(x)}$ denotes an [expectation](/D/mean) with respect to the [density](/D/pdf) $p(x)$, $\mathrm{KL}[\cdot \vert\vert \cdot]$ denotes the [Kullback-Leibler divergence](/D/kl) and $\mathrm{h}[\cdot]$ denotes the [differential entropy](/D/dent).
 
 
 **Proof:** The [log model evidence](/D/lme) is defined as
```

P/glm-llrmi.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -40,7 +40,7 @@ $$ \label{eq:m0}
 m_0: \; Y = E_0, \; E_0 \sim \mathcal{MN}(0, I_n, \Sigma_0) \; .
 $$
 
-Then, the [log-likelihood ratio](/D/llr) of $m_1$ vs. $m_0$ is equal to the estimated [mutual information](/D/mi) of $X$ and $Y$:
+Then, the [log-likelihood ratio](/D/llr) of $m_1$ vs. $m_0$ is equal to the [estimated](/D/est) [mutual information](/D/mi) of $X$ and $Y$:
 
 $$ \label{eq:glm-llrmi}
 \ln \Lambda_{10} = \hat{I}(X,Y) \; .
```

P/glm-mi.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -63,7 +63,7 @@ Since $X$ is [constant](/D/const) and thus only has [one possible value](/D/samp
 $$ \label{eq:dent-cond-const}
 \begin{split}
 \mathrm{h}(Y|X)
-&= \int_{z \in \mathcal{X}} p(z) \cdot \mathrm{h}(Y|z) \, \mathrm{d}z \\
+&= \int_{x \in \mathcal{X}} p(x) \cdot \mathrm{h}(Y|x) \, \mathrm{d}x \\
 &= p(X) \cdot \mathrm{h}(Y|X) \\
 &= \mathrm{h}\left[ p(Y|X,B,\Sigma_1) \right] \; .
 \end{split}
```

P/kl-conv.md

Lines changed: 3 additions & 3 deletions

```diff
@@ -33,13 +33,13 @@ username: "JoramSoch"
 ---
 
 
-**Theorem:** The [Kullback-Leibler divergence](/D/kl) is convex in the pair of [probability distributions](/D/dist) $(p,q)$, i.e.
+**Theorem:** The [Kullback-Leibler divergence](/D/kl) is convex in pairs of [probability distributions](/D/dist), i.e.
 
 $$ \label{eq:KL-conv}
 \mathrm{KL}[\lambda p_1 + (1-\lambda) p_2||\lambda q_1 + (1-\lambda) q_2] \leq \lambda \mathrm{KL}[p_1||q_1] + (1-\lambda) \mathrm{KL}[p_2||q_2]
 $$
 
-where $(p_1,q_1)$ and $(p_2,q_2)$ are two pairs of probability distributions and $0 \leq \lambda \leq 1$.
+where $(p_1,q_1)$ and $(p_2,q_2)$ are two pairs of [probability density functions](/D/pdf) and $0 \leq \lambda \leq 1$.
 
 
 **Proof:** The [Kullback-Leibler divergence](/D/kl) of $P$ from $Q$ is defined as
@@ -56,7 +56,7 @@ $$
 
 where $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ are non-negative real numbers.
 
-Thus, we can rewrite the KL divergence of the mixture distribution as
+Thus, we can rewrite the KL divergence of the [mixture distribution](/D/dist-mixt) as
 
 $$ \label{eq:KL-conv-qed}
 \begin{split}
```
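
The same kind of numerical spot check works for eq. \eqref{eq:KL-conv}, here with random pairs of probability vectors (again an illustrative sketch, not part of the proof):

```python
import numpy as np

# KL divergence KL[p||q] for discrete distributions (natural logarithm)
def KL(p, q):
    return np.sum(p * np.log(p / q))

rng = np.random.default_rng(1)
p1, q1, p2, q2 = (rng.dirichlet(np.ones(4)) for _ in range(4))

# check convexity in the pair: mixtures on both arguments simultaneously
for lam in np.linspace(0.0, 1.0, 11):
    lhs = KL(lam * p1 + (1 - lam) * p2, lam * q1 + (1 - lam) * q2)
    rhs = lam * KL(p1, q1) + (1 - lam) * KL(p2, q2)
    assert lhs <= rhs + 1e-12
```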

P/logsum-ineq.md

Lines changed: 1 addition & 7 deletions

```diff
@@ -20,12 +20,6 @@ sources:
     in: "Wikipedia, the free encyclopedia"
     pages: "retrieved on 2020-09-09"
     url: "https://en.wikipedia.org/wiki/Log_sum_inequality#Proof"
-  - authors: "Wikipedia"
-    year: 2020
-    title: "Jensen's inequality"
-    in: "Wikipedia, the free encyclopedia"
-    pages: "retrieved on 2020-09-09"
-    url: "https://en.wikipedia.org/wiki/Jensen%27s_inequality#Statements"
 
 proof_id: "P165"
 shortcut: "logsum-ineq"
@@ -64,7 +58,7 @@ $$ \label{eq:sum-bi-b}
 \end{split}
 $$
 
-applying Jensen's inequality yields
+applying [Jensen's inequality](/P/jens-ineq) yields
 
 $$ \label{eq:logsum-ineq-s3}
 \begin{split}
```
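
For completeness, the log sum inequality itself can be spot-checked with arbitrary non-negative numbers (the values below are illustrative):

```python
import numpy as np

# log sum inequality: sum_i a_i*log(a_i/b_i) >= (sum a)*log(sum a / sum b)
rng = np.random.default_rng(2)
a = rng.uniform(0.1, 2.0, size=6)
b = rng.uniform(0.1, 2.0, size=6)

lhs = np.sum(a * np.log(a / b))
rhs = np.sum(a) * np.log(np.sum(a) / np.sum(b))
assert lhs >= rhs - 1e-12
print(lhs, rhs)
```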
