**Definition:** Let $m$ be a [generative model](/D/gm) with [likelihood function](/D/lf) $p(y \vert \theta, m)$ and [prior distribution](/D/prior) $p(\theta \vert m)$. Then,
* the [prior distribution](/D/prior) is called "conjugate", if it, when combined with the [likelihood function](/D/lf), leads to a [posterior distribution](/D/post) that belongs to the same family of [probability distributions](/D/dist);
* the prior distribution is called "non-conjugate", if this is not the case.
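The classic example of conjugacy is the Beta prior combined with a Binomial likelihood: the posterior is again a Beta distribution. The sketch below (my own illustration, not part of the source) verifies this numerically on a grid, using hypothetical values for the prior hyperparameters and the data.

```python
# Illustrative sketch: the Beta prior is conjugate to the Binomial likelihood.
# Prior Beta(a, b) + data (k successes in n trials) -> posterior Beta(a+k, b+n-k).

def beta_binomial_posterior(a, b, k, n):
    """Posterior hyperparameters of a Beta(a, b) prior after observing
    k successes in n Binomial trials."""
    return a + k, b + n - k

# Hypothetical example: prior Beta(2, 2), data 7 successes in 10 trials.
a_post, b_post = beta_binomial_posterior(2, 2, 7, 10)
print(a_post, b_post)  # -> 9 5

# Numerical check on a grid: prior x likelihood is proportional to the
# Beta(9, 5) density, i.e. the posterior stays in the Beta family.
grid = [i / 100 for i in range(1, 100)]
unnorm_post = [t**(2 - 1) * (1 - t)**(2 - 1) * t**7 * (1 - t)**3 for t in grid]
beta_dens = [t**(9 - 1) * (1 - t)**(5 - 1) for t in grid]
ratios = [p / q for p, q in zip(unnorm_post, beta_dens)]
assert max(ratios) - min(ratios) < 1e-9  # constant ratio: same family
```

A non-conjugate pairing, by contrast (e.g. a Beta prior on the rate of a Poisson likelihood), yields a posterior outside the prior's family, so no such closed-form update exists.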
**Definition:** Let $m$ be a [generative model](/D/gm) with [likelihood function](/D/lf) $p(y \vert \theta, m)$ and [prior distribution](/D/prior) $p(\theta \vert \lambda, m)$ using [prior hyperparameters](/D/prior) $\lambda$. Let $p(y \vert \lambda, m)$ be the [marginal likelihood](/D/ml) when [integrating the parameters out of the joint likelihood](/P/ml-jl). Then, the prior distribution is called an "Empirical Bayes prior", if it maximizes the logarithmized marginal likelihood:
$$ \label{eq:prior-eb}
\lambda_{\mathrm{EB}} = \operatorname*{arg\,max}_{\lambda} \log p(y \vert \lambda, m) \; .
$$
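As a numerical sketch of this maximization (my own toy model, not from the source): let $y_j \vert \theta_j \sim \mathcal{N}(\theta_j, \sigma^2)$ with prior $\theta_j \sim \mathcal{N}(0, \lambda)$, so that marginally $y_j \sim \mathcal{N}(0, \lambda + \sigma^2)$. A grid search over $\lambda$ then recovers the closed-form Empirical Bayes estimate $\hat{\lambda} = \overline{y^2} - \sigma^2$.

```python
import math

# Toy model (assumed for illustration): y_j | theta_j ~ N(theta_j, sigma2),
# theta_j ~ N(0, lam); integrating theta out gives y_j ~ N(0, lam + sigma2).

def log_marginal_likelihood(y, lam, sigma2):
    """Log marginal likelihood log p(y | lambda) for the toy model."""
    v = lam + sigma2  # marginal variance after integrating theta out
    return sum(-0.5 * math.log(2 * math.pi * v) - yj**2 / (2 * v) for yj in y)

def empirical_bayes(y, sigma2, grid):
    """Grid search for lambda_EB = argmax_lambda log p(y | lambda)."""
    return max(grid, key=lambda lam: log_marginal_likelihood(y, lam, sigma2))

y = [1.8, -2.1, 0.9, 2.5, -1.2]       # hypothetical data
sigma2 = 1.0                          # noise variance assumed known
grid = [0.01 * i for i in range(1, 1001)]  # lambda in (0, 10]

lam_eb = empirical_bayes(y, sigma2, grid)
# Closed form for this model: lambda_EB = mean(y^2) - sigma2 (when positive)
closed_form = sum(yj**2 for yj in y) / len(y) - sigma2
print(lam_eb, closed_form)
```

The grid maximizer agrees with the analytical solution up to the grid resolution, which is the essential point: the prior hyperparameters are estimated from the data themselves via the marginal likelihood.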
**Definition:** Let $m$ be a [generative model](/D/gm) with [likelihood function](/D/lf) $p(y \vert \theta, m)$ and [prior distribution](/D/prior) $p(\theta \vert \lambda, m)$ using [prior hyperparameters](/D/prior) $\lambda$. Then, the prior distribution is called a "maximum entropy prior", if
1) when $\theta$ is a [discrete random variable](/D/rvar-disc), it maximizes the [entropy](/D/ent) of the prior [probability mass function](/D/pmf):

$$ \label{eq:prior-me-disc}
\lambda_{\mathrm{ME}} = \operatorname*{arg\,max}_{\lambda} \mathrm{H}\left[ p(\theta \vert \lambda, m) \right] \; ;
$$

2) when $\theta$ is a [continuous random variable](/D/rvar-cont), it maximizes the [differential entropy](/D/dent) of the prior [probability density function](/D/pdf):

$$ \label{eq:prior-me-cont}
\lambda_{\mathrm{ME}} = \operatorname*{arg\,max}_{\lambda} \mathrm{h}\left[ p(\theta \vert \lambda, m) \right] \; .
$$
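For the discrete case, a minimal sketch (my own example, not part of the source): with a finite parameter space and no further constraints, the uniform PMF attains the maximum entropy $\log K$, so the maximum entropy prior is the uniform prior.

```python
import math

# Sketch: for a discrete parameter with K values and no extra constraints,
# the uniform PMF maximizes the entropy H[p] = -sum p_i log p_i.

def entropy(p):
    """Shannon entropy of a PMF given as a list of probabilities."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # K = 4 values
skewed = [0.70, 0.10, 0.10, 0.10]    # any non-uniform PMF has lower entropy

assert entropy(uniform) > entropy(skewed)
print(entropy(uniform), math.log(4))  # uniform attains the maximum, log K
```

Under additional constraints (e.g. a fixed mean), the maximizer changes; the principle is always to commit to no more structure than the constraints require.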
**Definition:** Let $m$ be a [generative model](/D/gm) with [likelihood function](/D/lf) $p(y \vert \theta, m)$ and [prior distribution](/D/prior) $p(\theta \vert \lambda, m)$ using [prior hyperparameters](/D/prior) $\lambda$. Let $p(\theta \vert y, \lambda, m)$ be the [posterior distribution](/D/post) that is [proportional to the joint likelihood](/P/post-jl). Then, the prior distribution is called a "reference prior", if it maximizes the [expected](/D/mean) [Kullback-Leibler divergence](/D/kl) of the posterior distribution relative to the prior distribution:
$$ \label{eq:prior-ref}
\lambda_{\mathrm{ref}} = \operatorname*{arg\,max}_{\lambda} \left\langle \mathrm{KL} \left[ p(\theta \vert y, \lambda, m) \, || \, p(\theta \vert \lambda, m) \right] \right\rangle_{p(y \vert \lambda, m)} \; ,
$$

where the expectation is taken over the [marginal distribution](/D/ml) of the data $y$.
**Definition:** Let $p(\theta \vert m)$ be a [prior distribution](/D/prior) for the parameter $\theta \in \Theta$ of a [generative model](/D/gm) $m$. Then,
* the distribution is called a "uniform prior", if its [density](/D/pdf) is constant over the entire parameter space $\Theta$;
* the distribution is called a "non-uniform prior", if its [density](/D/pdf) is not constant over the parameter space $\Theta$.
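A consequence worth making concrete (my own toy example, not part of the source): under a uniform prior, the posterior is simply the likelihood renormalized, since the constant prior density cancels in Bayes' theorem.

```python
# Toy example: Bernoulli rate theta on a grid, uniform prior.
# Under a uniform prior, posterior = likelihood / sum(likelihood).

theta = [i / 100 for i in range(1, 100)]      # parameter grid over (0, 1)
n, k = 10, 7                                  # hypothetical data: 7 of 10 successes
lik = [t**k * (1 - t)**(n - k) for t in theta]

prior = [1 / len(theta)] * len(theta)         # uniform prior: constant density
joint = [p * l for p, l in zip(prior, lik)]
post = [j / sum(joint) for j in joint]        # posterior via Bayes' theorem

norm_lik = [l / sum(lik) for l in lik]        # likelihood normalized directly
assert max(abs(a - b) for a, b in zip(post, norm_lik)) < 1e-12
print("posterior equals normalized likelihood under a uniform prior")
```

With a non-uniform prior, the prior density no longer cancels, and the posterior is pulled away from the likelihood toward regions of higher prior mass.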