
Commit e4d99cf

corrected some pages
Several small corrections were made to proofs and definitions.
1 parent 376a79a commit e4d99cf

9 files changed

Lines changed: 17 additions & 33 deletions


D/prob-marg.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -47,4 +47,4 @@ $$ \label{eq:prob-marg-cont}
 p(A) = \int_{\mathcal{X}} p(A,x) \, \mathrm{d}x \; .
 $$
 
-The law of marginal probability can be motivated from the [law of total probability](/D/prob-tot) which follows from the [Kolmogorov axioms of probability](/D/prob-ax).
+The law of marginal probability can be motivated from the [law of total probability](/P/prob-tot) which follows from the [Kolmogorov axioms of probability](/D/prob-ax).
```
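
The marginalization in eq. \eqref{eq:prob-marg-cont} is easy to spot-check numerically. A minimal sketch, assuming a standard bivariate normal as the joint density (the distribution, grid and point evaluated are illustrative choices, not part of the page):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# joint density p(a,x): standard bivariate normal with correlation rho (illustrative)
rho = 0.6
joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

# p(a) = \int p(a,x) dx, approximated by a Riemann sum over a wide grid
a, dx = 0.7, 0.001
x = np.arange(-10.0, 10.0, dx)
p_a_numeric = np.sum(joint.pdf(np.column_stack([np.full_like(x, a), x]))) * dx

# the analytic marginal of a standard bivariate normal is N(0,1)
p_a_exact = norm.pdf(a)
print(p_a_numeric, p_a_exact)  # both approximately 0.3123
```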

D/vb.md

Lines changed: 3 additions & 3 deletions

```diff
@@ -47,7 +47,7 @@ username: "JoramSoch"
 ---
 
 
-**Definition:** Let $m$ be a [generative model](/D/gm) with model parameters $\theta$ implying the [likelihood function](/D/lf) $p(y \vert \theta, m)$ and [prior distribution](/D/prior) $p(\theta \vert m)$. Then, a Variational Bayes treatment of $m$, also referred to as "approximate inference" or "variational inference", consists in
+**Definition:** Let $m$ be a [generative model](/D/gm) with [model parameters](/D/para) $\theta \in \Theta$ implying the [likelihood function](/D/lf) $p(y \vert \theta, m)$ and [prior distribution](/D/prior) $p(\theta \vert m)$. Then, a Variational Bayes treatment of $m$, also referred to as "approximate inference" or "variational inference", consists in
 
 <br>
 1) constructing an approximate [posterior distribution](/D/post)
@@ -60,14 +60,14 @@
 2) evaluating the [variational free energy](/D/vblme)
 
 $$ \label{eq:FE}
-F_q(m) = \int q(\theta) \log p(y|\theta,m) \, \mathrm{d}\theta - \int q(\theta) \frac{q(\theta)}{p(\theta|m)} \, \mathrm{d}\theta
+\mathrm{F}_m[q(\theta)] = \int_{\Theta} q(\theta) \log \frac{p(y,\theta \vert m)}{q(\theta)} \, \mathrm{d}\theta
 $$
 
 <br>
 3) and maximizing this function with respect to $q(\theta)$
 
 $$ \label{eq:VB}
-\hat{q}(\theta) = \operatorname*{arg\,max}_{q} F_q(m) \; .
+\hat{q}(\theta) = \operatorname*{arg\,max}_{q} \mathrm{F}_m[q(\theta)]
 $$
 
 for [Bayesian inference](/P/bayes-th), i.e. obtaining the [posterior distribution](/D/post) (from eq. \eqref{eq:VB}) and approximating the [marginal likelihood](/D/ml) (by plugging eq. \eqref{eq:VB} into eq. \eqref{eq:FE}).
```

Note that the numerator in eq. \eqref{eq:FE} must be the joint likelihood $p(y,\theta \vert m)$, not the posterior $p(\theta \vert y, m)$; this is what makes eq. \eqref{eq:FE} consistent with the decomposition $\mathrm{F}[q(\theta)] = \left\langle \log p(y,\theta) \right\rangle_{q(\theta)} - \mathrm{h}[q(\theta)]$ used in P/fren-dec below.
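
As a concrete illustration of steps 1)-3), the sketch below maximizes the free energy \eqref{eq:FE} over a grid of Gaussian candidates $q(\theta)$ for a conjugate normal model. The model, grid and variable names are illustrative assumptions, not part of the page; the point is only that the maximizing $q(\theta)$ recovers the exact posterior:

```python
import numpy as np

# illustrative model: y_i ~ N(theta, s2) with s2 known, prior theta ~ N(mu0, t20)
rng = np.random.default_rng(0)
s2, mu0, t20 = 1.0, 0.0, 4.0
y = rng.normal(1.5, np.sqrt(s2), size=20)
n = y.size

# free energy F_m[q] = <log p(y|theta,m)>_q - KL[q||p(theta|m)],
# in closed form for q(theta) = N(mq, sq2)
def free_energy(mq, sq2):
    exp_ll = -0.5 * n * np.log(2 * np.pi * s2) \
             - (np.sum((y - mq) ** 2) + n * sq2) / (2 * s2)
    kl = 0.5 * (np.log(t20 / sq2) + (sq2 + (mq - mu0) ** 2) / t20 - 1)
    return exp_ll - kl

# step 3): maximize F over the variational family (here: brute-force grid search)
grid_m = np.linspace(0.0, 3.0, 301)
grid_s2 = np.linspace(0.01, 1.0, 100)
F = np.array([[free_energy(m, s) for s in grid_s2] for m in grid_m])
i, j = np.unravel_index(F.argmax(), F.shape)

# exact conjugate posterior for comparison
post_prec = 1.0 / t20 + n / s2
print(grid_m[i], (mu0 / t20 + np.sum(y) / s2) / post_prec)  # posterior mean
print(grid_s2[j], 1.0 / post_prec)                          # posterior variance
```

Since the variational family contains the true posterior in this conjugate example, the maximized free energy also equals the log marginal likelihood up to grid resolution.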

D/vblme.md

Lines changed: 3 additions & 13 deletions

```diff
@@ -34,20 +34,10 @@ username: "JoramSoch"
 ---
 
 
-**Definition:** Let $m$ be a [generative model](/D/gm) with model parameters $\theta$ implying the [likelihood function](/D/lf) $p(y \vert \theta, m)$ and [prior distribution](/D/prior) $p(\theta \vert m)$. Moreover, assume an [approximate](/D/vb) [posterior distribution](/D/post) $q(\theta)$. Then, the [Variational Bayesian](/D/vb) [log model evidence](/D/lme), also referred to as the "negative free energy", is the expectation of the [log-likelihood function](/D/llf) with respect to the approximate posterior, minus the [Kullback-Leibler divergence](/D/kl) between approximate posterior and the prior distribution:
+**Definition:** Let $m$ be a [generative model](/D/gm) with [model parameters](/D/para) $\theta \in \Theta$ implying the [likelihood function](/D/lf) $p(y \vert \theta, m)$ and [prior distribution](/D/prior) $p(\theta \vert m)$. Moreover, assume an [approximate](/D/vb) [posterior distribution](/D/post) $q(\theta)$. Then, the [Variational Bayesian](/D/vb) [log model evidence](/D/lme), also referred to as the "variational free energy", is defined as the expected logarithm of the joint likelihood, divided by the approximate posterior:
 
 $$ \label{eq:vbLME}
-\mathrm{vbLME}(m) = \left\langle \log p(y \vert \theta, m) \right\rangle_{q(\theta)} - \mathrm{KL}\left[q(\theta) || p(\theta \vert m)\right]
+\mathrm{vbLME}(m) = \mathrm{F}_m[q(\theta)] = \int_{\Theta} q(\theta) \log \frac{p(y,\theta \vert m)}{q(\theta)} \, \mathrm{d}\theta \; .
 $$
 
-where
-
-$$ \label{eq:ELL}
-\left\langle \log p(y \vert \theta, m) \right\rangle_{q(\theta)} = \int q(\theta) \log p(y \vert \theta, m) \, \mathrm{d}\theta
-$$
-
-and
-
-$$ \label{eq:KL}
-\mathrm{KL}\left[q(\theta) || p(\theta \vert m)\right] = \int q(\theta) \log \frac{q(\theta)}{p(\theta \vert m)} \, \mathrm{d}\theta \; .
-$$
+The variational free energy can be decomposed into the [difference between log model evidence and KL divergence of approximate from true posterior](/P/fren-dec) or, alternatively, into the [difference of expected log-likelihood and KL divergence of approximate posterior from prior](/P/fren-dec).
```
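
Both decompositions referenced in the new closing sentence follow from the definition in one line each, by factorizing the joint likelihood as $p(y,\theta \vert m) = p(y \vert \theta, m) \, p(\theta \vert m)$ or as $p(y,\theta \vert m) = p(\theta \vert y, m) \, p(y \vert m)$:

$$
\begin{split}
\mathrm{F}_m[q(\theta)] &= \int_{\Theta} q(\theta) \log \frac{p(y \vert \theta, m) \, p(\theta \vert m)}{q(\theta)} \, \mathrm{d}\theta = \left\langle \log p(y \vert \theta, m) \right\rangle_{q(\theta)} - \mathrm{KL}\left[ q(\theta) \, \vert\vert \, p(\theta \vert m) \right] \\
&= \int_{\Theta} q(\theta) \log \frac{p(\theta \vert y, m) \, p(y \vert m)}{q(\theta)} \, \mathrm{d}\theta = \log p(y \vert m) - \mathrm{KL}\left[ q(\theta) \, \vert\vert \, p(\theta \vert y, m) \right] \; .
\end{split}
$$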

P/entcross-conv.md

Lines changed: 3 additions & 3 deletions

```diff
@@ -33,13 +33,13 @@ username: "JoramSoch"
 ---
 
 
-**Theorem:** The [cross-entropy](/D/ent-cross) is convex in the [probability distribution](/D/dist) $q$, i.e.
+**Theorem:** The [cross-entropy](/D/ent-cross) is convex in the second [probability distribution](/D/dist), i.e.
 
 $$ \label{eq:ent-cross-conv}
 \mathrm{H}[p,\lambda q_1 + (1-\lambda) q_2] \leq \lambda \mathrm{H}[p,q_1] + (1-\lambda) \mathrm{H}[p,q_2]
 $$
 
-where $p$ is a fixed and $q_1$ and $q_2$ are any two probability distributions and $0 \leq \lambda \leq 1$.
+where $p$ is fixed, $q_1$ and $q_2$ are any two [probability mass functions](/D/pmf) and $0 \leq \lambda \leq 1$.
 
 
 **Proof:** The [relationship between Kullback-Leibler divergence, entropy and cross-entropy](/P/kl-ent) is:
@@ -51,7 +51,7 @@ $$
 Note that the [KL divergence is convex](/P/kl-conv) in the pair of [probability distributions](/D/dist) $(p,q)$:
 
 $$ \label{eq:kl-conv}
-\mathrm{KL}[\lambda p_1 + (1-\lambda) p_2||\lambda q_1 + (1-\lambda) q_2] \leq \lambda \mathrm{KL}[p_1||q_1] + (1-\lambda) \mathrm{KL}[p_2||q_2]
+\mathrm{KL}[\lambda p_1 + (1-\lambda) p_2||\lambda q_1 + (1-\lambda) q_2] \leq \lambda \mathrm{KL}[p_1||q_1] + (1-\lambda) \mathrm{KL}[p_2||q_2] \; .
 $$
 
 A special case of this is given by
```
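
The restated inequality \eqref{eq:ent-cross-conv} is easy to verify numerically for discrete distributions; a small sketch (the random Dirichlet draws and tolerance are illustrative assumptions):

```python
import numpy as np

# cross-entropy H[p,q] for discrete distributions (natural logarithm)
def H(p, q):
    return -np.sum(p * np.log(q))

rng = np.random.default_rng(0)
p, q1, q2 = (rng.dirichlet(np.ones(5)) for _ in range(3))

# check H[p, lam*q1 + (1-lam)*q2] <= lam*H[p,q1] + (1-lam)*H[p,q2]
for lam in np.linspace(0.0, 1.0, 11):
    lhs = H(p, lam * q1 + (1 - lam) * q2)
    rhs = lam * H(p, q1) + (1 - lam) * H(p, q2)
    assert lhs <= rhs + 1e-12
```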

P/fren-dec.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -48,7 +48,7 @@ $$ \label{eq:vb-fe3}
 \mathrm{F}[q(\theta)] = \left\langle \log p(y,\theta) \right\rangle_{q(\theta)} - \mathrm{h}[q(\theta)]
 $$
 
-where $p(y \vert m) = p(y)$ is the [marginal likelihood](/D/ml), $\left\langle \cdot \right\rangle_{p(x)}$ denotes an [expectation](/D/mean) with respect to the []density](/D/pdf) $p(x)$, $\mathrm{KL}[\cdot \vert\vert \cdot]$ denotes the [Kullback-Leibler divergence](/D/kl) and $\mathrm{h}[\cdot]$ denotes the [differential entropy](/D/dent).
+where $p(y \vert m) = p(y)$ is the [marginal likelihood](/D/ml), $\left\langle \cdot \right\rangle_{p(x)}$ denotes an [expectation](/D/mean) with respect to the [density](/D/pdf) $p(x)$, $\mathrm{KL}[\cdot \vert\vert \cdot]$ denotes the [Kullback-Leibler divergence](/D/kl) and $\mathrm{h}[\cdot]$ denotes the [differential entropy](/D/dent).
 
 
 **Proof:** The [log model evidence](/D/lme) is defined as
```

P/glm-llrmi.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -40,7 +40,7 @@ $$ \label{eq:m0}
 m_0: \; Y = E_0, \; E_0 \sim \mathcal{MN}(0, I_n, \Sigma_0) \; .
 $$
 
-Then, the [log-likelihood ratio](/D/llr) of $m_1$ vs. $m_0$ is equal to the estimated [mutual information](/D/mi) of $X$ and $Y$:
+Then, the [log-likelihood ratio](/D/llr) of $m_1$ vs. $m_0$ is equal to the [estimated](/D/est) [mutual information](/D/mi) of $X$ and $Y$:
 
 $$ \label{eq:glm-llrmi}
 \ln \Lambda_{10} = \hat{I}(X,Y) \; .
```

P/glm-mi.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -63,7 +63,7 @@ Since $X$ is [constant](/D/const) and thus only has [one possible value](/D/samp
 $$ \label{eq:dent-cond-const}
 \begin{split}
 \mathrm{h}(Y|X)
-&= \int_{z \in \mathcal{X}} p(z) \cdot \mathrm{h}(Y|z) \, \mathrm{d}z \\
+&= \int_{x \in \mathcal{X}} p(x) \cdot \mathrm{h}(Y|x) \, \mathrm{d}x \\
 &= p(X) \cdot \mathrm{h}(Y|X) \\
 &= \mathrm{h}\left[ p(Y|X,B,\Sigma_1) \right] \; .
 \end{split}
```

P/kl-conv.md

Lines changed: 3 additions & 3 deletions

```diff
@@ -33,13 +33,13 @@ username: "JoramSoch"
 ---
 
 
-**Theorem:** The [Kullback-Leibler divergence](/D/kl) is convex in the pair of [probability distributions](/D/dist) $(p,q)$, i.e.
+**Theorem:** The [Kullback-Leibler divergence](/D/kl) is convex in pairs of [probability distributions](/D/dist), i.e.
 
 $$ \label{eq:KL-conv}
 \mathrm{KL}[\lambda p_1 + (1-\lambda) p_2||\lambda q_1 + (1-\lambda) q_2] \leq \lambda \mathrm{KL}[p_1||q_1] + (1-\lambda) \mathrm{KL}[p_2||q_2]
 $$
 
-where $(p_1,q_1)$ and $(p_2,q_2)$ are two pairs of probability distributions and $0 \leq \lambda \leq 1$.
+where $(p_1,q_1)$ and $(p_2,q_2)$ are two pairs of [probability density functions](/D/pdf) and $0 \leq \lambda \leq 1$.
 
 
 **Proof:** The [Kullback-Leibler divergence](/D/kl) of $P$ from $Q$ is defined as
@@ -56,7 +56,7 @@ $$
 
 where $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ are non-negative real numbers.
 
-Thus, we can rewrite the KL divergence of the mixture distribution as
+Thus, we can rewrite the KL divergence of the [mixture distribution](/D/dist-mixt) as
 
 $$ \label{eq:KL-conv-qed}
 \begin{split}
```
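
The same kind of numerical spot check works for eq. \eqref{eq:KL-conv}, here with random pairs of probability vectors (again an illustrative sketch, not part of the proof):

```python
import numpy as np

# KL divergence KL[p||q] for discrete distributions (natural logarithm)
def KL(p, q):
    return np.sum(p * np.log(p / q))

rng = np.random.default_rng(1)
p1, q1, p2, q2 = (rng.dirichlet(np.ones(4)) for _ in range(4))

# check convexity in the pair: mixtures on both arguments simultaneously
for lam in np.linspace(0.0, 1.0, 11):
    lhs = KL(lam * p1 + (1 - lam) * p2, lam * q1 + (1 - lam) * q2)
    rhs = lam * KL(p1, q1) + (1 - lam) * KL(p2, q2)
    assert lhs <= rhs + 1e-12
```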

P/logsum-ineq.md

Lines changed: 1 addition & 7 deletions

```diff
@@ -20,12 +20,6 @@ sources:
     in: "Wikipedia, the free encyclopedia"
     pages: "retrieved on 2020-09-09"
     url: "https://en.wikipedia.org/wiki/Log_sum_inequality#Proof"
-  - authors: "Wikipedia"
-    year: 2020
-    title: "Jensen's inequality"
-    in: "Wikipedia, the free encyclopedia"
-    pages: "retrieved on 2020-09-09"
-    url: "https://en.wikipedia.org/wiki/Jensen%27s_inequality#Statements"
 
 proof_id: "P165"
 shortcut: "logsum-ineq"
@@ -64,7 +58,7 @@ $$ \label{eq:sum-bi-b}
 \end{split}
 $$
 
-applying Jensen's inequality yields
+applying [Jensen's inequality](/P/jens-ineq) yields
 
 $$ \label{eq:logsum-ineq-s3}
 \begin{split}
```
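
For completeness, the log sum inequality itself can be spot-checked with arbitrary non-negative numbers (the values below are illustrative):

```python
import numpy as np

# log sum inequality: sum_i a_i*log(a_i/b_i) >= (sum a)*log(sum a / sum b)
rng = np.random.default_rng(2)
a = rng.uniform(0.1, 2.0, size=6)
b = rng.uniform(0.1, 2.0, size=6)

lhs = np.sum(a * np.log(a / b))
rhs = np.sum(a) * np.log(np.sum(a) / np.sum(b))
assert lhs >= rhs - 1e-12
print(lhs, rhs)
```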
