Commit 4aff6cd

Merge pull request #278 from JoramSoch/master
added 1 definition and 2 proofs
2 parents d4be0fb + 55a55ea commit 4aff6cd

4 files changed: 251 additions & 13 deletions

D/est.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
---
layout: definition
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2024-11-01 14:26:04

title: "Estimand, estimator and estimate"
chapter: "General Theorems"
section: "Estimation theory"
topic: "Basic concepts of estimation"
definition: "Estimator"

sources:
- authors: "Wikipedia"
  year: 2024
  title: "Estimator"
  in: "Wikipedia, the free encyclopedia"
  pages: "retrieved on 2024-11-01"
  url: "https://en.wikipedia.org/wiki/Estimator#Definition"

def_id: "D208"
shortcut: "est"
username: "JoramSoch"
---

**Definition:** Let $y \in \mathcal{Y}$ be [measured data](/D/data), governed by a [probability distribution](/D/dist) described by some [statistical parameter](/D/para) $\theta \in \Theta$. Then, a function $\hat{\theta}: \mathcal{Y} \rightarrow \Theta$ exemplifying a rule for calculating an estimate of $\theta$ from $y$ is called an "estimator". Estimation theory distinguishes:

* the quantity of interest $\theta$ is called the "estimand";

* the rule $\hat{\theta}$ for estimating it is called the "estimator";

* the result of estimation $\hat{\theta}(y)$ is called the "estimate".
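The three terms can be illustrated with a minimal Python sketch, using the sample mean as an estimator of a normal mean (the normal model, sample size, and seed are illustrative assumptions, not part of the definition):

```python
import numpy as np

rng = np.random.default_rng(42)

theta = 1.5  # estimand: the unknown quantity of interest
y = rng.normal(loc=theta, scale=2.0, size=1000)  # measured data y ∈ Y

def theta_hat(y):
    """Estimator: a rule mapping data to the parameter space."""
    return np.mean(y)

estimate = theta_hat(y)  # estimate: the estimator applied to the data
print(f"estimand = {theta}, estimate = {estimate:.3f}")
```

Note that the estimator `theta_hat` is a function, while the estimate is a single number obtained by applying it to one data set.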

I/ToC.md

Lines changed: 19 additions & 13 deletions
@@ -276,15 +276,19 @@ title: "Table of Contents"
 
 3. <p id="Estimation theory">Estimation theory</p>
 
+<p id="Basic concepts of estimation"></p>
+3.1. Basic concepts of estimation <br>
+&emsp;&ensp; 3.1.1. *[Estimator](/D/est)* <br>
+
 <p id="Point estimates"></p>
-3.1. Point estimates <br>
-&emsp;&ensp; 3.1.1. *[Mean squared error](/D/mse)* <br>
-&emsp;&ensp; 3.1.2. **[Partition of the mean squared error into bias and variance](/P/mse-bnv)** <br>
+3.2. Point estimates <br>
+&emsp;&ensp; 3.2.1. *[Mean squared error](/D/mse)* <br>
+&emsp;&ensp; 3.2.2. **[Partition of the mean squared error into bias and variance](/P/mse-bnv)** <br>
 
 <p id="Interval estimates"></p>
-3.2. Interval estimates <br>
-&emsp;&ensp; 3.2.1. *[Confidence interval](/D/ci)* <br>
-&emsp;&ensp; 3.2.2. **[Construction of confidence intervals using Wilks' theorem](/P/ci-wilks)** <br>
+3.3. Interval estimates <br>
+&emsp;&ensp; 3.3.1. *[Confidence interval](/D/ci)* <br>
+&emsp;&ensp; 3.3.2. **[Construction of confidence intervals using Wilks' theorem](/P/ci-wilks)** <br>
 
 4. <p id="Frequentist statistics">Frequentist statistics</p>

@@ -594,19 +598,21 @@ title: "Table of Contents"
 &emsp;&ensp; 4.1.6. **[Mean](/P/mvn-mean)** <br>
 &emsp;&ensp; 4.1.7. **[Covariance](/P/mvn-cov)** <br>
 &emsp;&ensp; 4.1.8. **[Differential entropy](/P/mvn-dent)** <br>
-&emsp;&ensp; 4.1.9. **[Kullback-Leibler divergence](/P/mvn-kl)** <br>
-&emsp;&ensp; 4.1.10. **[Linear transformation](/P/mvn-ltt)** <br>
-&emsp;&ensp; 4.1.11. **[Marginal distributions](/P/mvn-marg)** <br>
-&emsp;&ensp; 4.1.12. **[Conditional distributions](/P/mvn-cond)** <br>
-&emsp;&ensp; 4.1.13. **[Conditions for independence](/P/mvn-ind)** <br>
-&emsp;&ensp; 4.1.14. **[Independence of products](/P/mvn-indprod)** <br>
+&emsp;&ensp; 4.1.9. **[Mutual information](/P/mvn-mi)** <br>
+&emsp;&ensp; 4.1.10. **[Kullback-Leibler divergence](/P/mvn-kl)** <br>
+&emsp;&ensp; 4.1.11. **[Linear transformation](/P/mvn-ltt)** <br>
+&emsp;&ensp; 4.1.12. **[Marginal distributions](/P/mvn-marg)** <br>
+&emsp;&ensp; 4.1.13. **[Conditional distributions](/P/mvn-cond)** <br>
+&emsp;&ensp; 4.1.14. **[Conditions for independence](/P/mvn-ind)** <br>
+&emsp;&ensp; 4.1.15. **[Independence of products](/P/mvn-indprod)** <br>
 
 <p id="Bivariate normal distribution"></p>
 4.2. Bivariate normal distribution <br>
 &emsp;&ensp; 4.2.1. *[Definition](/D/bvn)* <br>
 &emsp;&ensp; 4.2.2. **[Probability density function](/P/bvn-pdf)** <br>
 &emsp;&ensp; 4.2.3. **[Probability density function in terms of correlation coefficient](/P/bvn-pdfcorr)** <br>
-&emsp;&ensp; 4.2.4. **[Linear combination](/P/bvn-lincomb)** <br>
+&emsp;&ensp; 4.2.4. **[Mutual information](/P/bvn-mi)** <br>
+&emsp;&ensp; 4.2.5. **[Linear combination](/P/bvn-lincomb)** <br>
 
 <p id="Multivariate t-distribution"></p>
 4.3. Multivariate t-distribution <br>

P/bvn-mi.md

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2024-11-01 11:51:06

title: "Mutual information of the bivariate normal distribution"
chapter: "Probability Distributions"
section: "Multivariate continuous distributions"
topic: "Bivariate normal distribution"
theorem: "Mutual information"

sources:
- authors: "Krafft, Peter"
  year: 2013
  title: "Correlation and Mutual Information"
  in: "Princeton University Department of Computer Science: Laboratory for Intelligent Probabilistic Systems"
  pages: "February 13, 2013"
  url: "https://lips.cs.princeton.edu/correlation-and-mutual-information/"

proof_id: "P476"
shortcut: "bvn-mi"
username: "JoramSoch"
---

**Theorem:** Let $X$ and $Y$ follow a [bivariate normal distribution](/D/bvn):

$$ \label{eq:bvn}
\left[ \begin{matrix} X \\ Y \end{matrix} \right] \sim
\mathcal{N}\left( \left[ \begin{matrix} \mu_1 \\ \mu_2 \end{matrix} \right], \left[ \begin{matrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{matrix} \right] \right) \; .
$$

Then, the [mutual information](/D/mi) of $X$ and $Y$ is

$$ \label{eq:bvn-mi}
\mathrm{I}(X,Y) = -\frac{1}{2} \ln (1-\rho^2)
$$

where $\rho$ is the [correlation](/D/corr) of $X$ and $Y$.

**Proof:** [Mutual information can be written in terms of marginal and joint differential entropy](/P/cmi-mjde):

$$ \label{eq:cmi-mjde}
\mathrm{I}(X,Y) = \mathrm{h}(X) + \mathrm{h}(Y) - \mathrm{h}(X,Y) \; .
$$

The [marginal distributions of the multivariate normal distribution are also multivariate normal](/P/mvn-marg),

$$ \label{eq:mvn-marg}
\left[ \begin{matrix} X_1 \\ X_2 \end{matrix} \right] \sim
\mathcal{N}\left( \left[ \begin{matrix} \mu_1 \\ \mu_2 \end{matrix} \right], \left[ \begin{matrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{matrix} \right] \right)
\quad \Rightarrow \quad
X_1 \sim \mathcal{N}\left( \mu_1, \Sigma_{11} \right) \; ,
$$

such that the [marginals](/D/marg) of the [bivariate normal distribution](/D/bvn) are [univariate normal distributions](/D/norm):

$$ \label{eq:bvn-marg}
\left[ \begin{matrix} X \\ Y \end{matrix} \right] \sim
\mathcal{N}\left( \left[ \begin{matrix} \mu_1 \\ \mu_2 \end{matrix} \right], \left[ \begin{matrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{matrix} \right] \right)
\quad \Rightarrow \quad
X \sim \mathcal{N}\left( \mu_1, \sigma_1^2 \right)
\quad \text{and} \quad
Y \sim \mathcal{N}\left( \mu_2, \sigma_2^2 \right) \; .
$$

The [differential entropy of the univariate normal distribution](/P/norm-dent) is

$$ \label{eq:norm-dent}
\mathrm{h}(X) = \frac{1}{2} \ln\left( 2 \pi \sigma^2 e \right)
$$

and the [differential entropy of the multivariate normal distribution](/P/mvn-dent) is

$$ \label{eq:mvn-dent}
\mathrm{h}(x) = \frac{n}{2} \ln(2\pi) + \frac{1}{2} \ln|\Sigma| + \frac{1}{2} n
$$

where $\lvert \Sigma \rvert$ is the determinant of the [covariance matrix](/D/covmat) $\Sigma$. A two-dimensional [covariance matrix can be rewritten in terms of correlations](/P/covmat-corrmat) as follows:

$$ \label{eq:Sigma}
\begin{split}
\Sigma
&= \left[ \begin{matrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{matrix} \right] \left[ \begin{matrix} 1 & \rho \\ \rho & 1 \end{matrix} \right] \left[ \begin{matrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{matrix} \right] \\
&= \left[ \begin{matrix} \sigma_1^2 & \rho \, \sigma_1 \sigma_2 \\ \rho \, \sigma_1 \sigma_2 & \sigma_2^2 \end{matrix} \right] \; .
\end{split}
$$

Combining \eqref{eq:cmi-mjde} with \eqref{eq:norm-dent} and \eqref{eq:mvn-dent}, and applying $n = 2$, we get:

$$ \label{eq:bvn-mi-qed}
\begin{split}
\mathrm{I}(X,Y)
&\overset{\eqref{eq:cmi-mjde}}{=} \mathrm{h}(X) + \mathrm{h}(Y) - \mathrm{h}(X,Y) \\
&\overset{\eqref{eq:bvn-marg}}{=} \mathrm{h}\left[ \mathcal{N}\left( \mu_1, \sigma_1^2 \right) \right] + \mathrm{h}\left[ \mathcal{N}\left( \mu_2, \sigma_2^2 \right) \right] - \mathrm{h}\left[ \mathcal{N}\left( \mu, \Sigma \right) \right] \\
&\overset{\eqref{eq:Sigma}}{=} \left[ \frac{1}{2} \ln\left( 2 \pi \sigma_1^2 e \right) \right] + \left[ \frac{1}{2} \ln\left( 2 \pi \sigma_2^2 e \right) \right] - \left[ \frac{2}{2} \ln(2\pi) + \frac{1}{2} \ln \left| \left[ \begin{matrix} \sigma_1^2 & \rho \, \sigma_1 \sigma_2 \\ \rho \, \sigma_1 \sigma_2 & \sigma_2^2 \end{matrix} \right] \right| + \frac{1}{2} \cdot 2 \right] \\
&= \left( \frac{2}{2} \ln(2\pi) + \frac{2}{2} \ln(e) - \ln(2\pi) - 1 \right) + \left( \frac{1}{2} \ln\left( \sigma_1^2 \right) + \frac{1}{2} \ln\left( \sigma_2^2 \right) - \frac{1}{2} \ln \left| \left[ \begin{matrix} \sigma_1^2 & \rho \, \sigma_1 \sigma_2 \\ \rho \, \sigma_1 \sigma_2 & \sigma_2^2 \end{matrix} \right] \right| \right) \\
&= \frac{1}{2} \left[ \ln\left( \sigma_1^2 \right) + \ln\left( \sigma_2^2 \right) - \ln\left( \sigma_1^2 \sigma_2^2 - (\rho \, \sigma_1 \sigma_2)^2 \right) \right] \\
&= \frac{1}{2} \ln \left[ \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 \sigma_2^2 - (\rho \, \sigma_1 \sigma_2)^2} \right] \\
&= \frac{1}{2} \ln \left[ \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 \sigma_2^2 (1-\rho^2)} \right] \\
&= \frac{1}{2} \ln \left[ \frac{1}{1-\rho^2} \right] \\
&= -\frac{1}{2} \ln (1-\rho^2) \; .
\end{split}
$$
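As a numerical sanity check of this result (not part of the proof; the parameter values are arbitrary), one can compare the closed form $-\frac{1}{2}\ln(1-\rho^2)$ against the entropy decomposition $\mathrm{h}(X)+\mathrm{h}(Y)-\mathrm{h}(X,Y)$:

```python
import numpy as np

def bvn_mutual_info(rho):
    """Closed form: I(X,Y) = -1/2 ln(1 - rho^2)."""
    return -0.5 * np.log(1 - rho**2)

def mi_from_entropies(s1, s2, rho):
    """I(X,Y) = h(X) + h(Y) - h(X,Y), using Gaussian differential entropies."""
    h_x = 0.5 * np.log(2 * np.pi * s1**2 * np.e)
    h_y = 0.5 * np.log(2 * np.pi * s2**2 * np.e)
    Sigma = np.array([[s1**2,       rho*s1*s2],
                      [rho*s1*s2,   s2**2]])
    # h(X,Y) = (n/2) ln(2π) + 1/2 ln|Σ| + n/2 with n = 2
    h_xy = np.log(2 * np.pi) + 0.5 * np.log(np.linalg.det(Sigma)) + 1.0
    return h_x + h_y - h_xy

i_closed = bvn_mutual_info(0.5)
i_decomp = mi_from_entropies(1.2, 0.7, 0.5)
```

The two values agree to floating-point precision, and the mutual information depends only on $\rho$, not on $\sigma_1$ or $\sigma_2$.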

P/mvn-mi.md

Lines changed: 87 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,87 @@
1+
---
2+
layout: proof
3+
mathjax: true
4+
5+
author: "Joram Soch"
6+
affiliation: "BCCN Berlin"
7+
e_mail: "joram.soch@bccn-berlin.de"
8+
date: 2024-11-01 12:36:44
9+
10+
title: "Mutual information of the multivariate normal distribution"
11+
chapter: "Probability Distributions"
12+
section: "Multivariate continuous distributions"
13+
topic: "Multivariate normal distribution"
14+
theorem: "Mutual information"
15+
16+
sources:
17+
- authors: "a06e"
18+
year: 2019
19+
title: "Mutual information between subsets of variables in the multivariate normal distribution"
20+
in: "StackExchange CrossValidated"
21+
pages: "retrieved on 2024-11-01"
22+
url: "https://stats.stackexchange.com/a/438613/270304"
23+
24+
proof_id: "P477"
25+
shortcut: "mvn-mi"
26+
username: "JoramSoch"
27+
---
28+
29+
30+
**Theorem:** Let $X \in \mathbb{R}^n$ and $Y \in \mathbb{R}^m$ be [random vectors](/D/rvec) that are [jointly multivariate normal](/D/mvn):
31+
32+
$$ \label{eq:bvn}
33+
\left[ \begin{matrix} X \\ Y \end{matrix} \right] \sim
34+
\mathcal{N}\left( \left[ \begin{matrix} \mu_1 \\ \mu_2 \end{matrix} \right], \left[ \begin{matrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{21} & \Sigma_2 \end{matrix} \right] \right) \; .
35+
$$
36+
37+
Then, the [mutual information](/D/mi) of $X$ and $Y$ is
38+
39+
$$ \label{eq:bvn-lincomb}
40+
\mathrm{I}(X,Y) = \frac{1}{2} \ln \left[ \frac{|\Sigma_1| |\Sigma_2|}{|\Sigma|} \right]
41+
$$
42+
43+
where $\mu \in \mathbb{R}^p$ and $\Sigma \in \mathbb{R}^{p \times p}$ are the [mean](/D/mean) and [covariance matrix](/D/covmat) of the [random vector](/D/rvec) $\left[ \begin{matrix} X \\\\ Y \end{matrix} \right] \in \mathbb{R}^p$, respectively, where $p = n + m$.
44+
45+
46+
**Proof:** [Mutual information can be written in terms of marginal and joint differential entropy](/P/cmi-mjde):
47+
48+
$$ \label{eq:cmi-mjde}
49+
\mathrm{I}(X,Y) = \mathrm{h}(X) + \mathrm{h}(Y) - \mathrm{h}(X,Y) \; .
50+
$$
51+
52+
The [marginal distributions of the multivariate normal distribution are also multivariate normal]
53+
54+
$$ \label{eq:mvn-marg}
55+
\left[ \begin{matrix} X_1 \\ X_2 \end{matrix} \right] \sim
56+
\mathcal{N}\left( \left[ \begin{matrix} \mu_1 \\ \mu_2 \end{matrix} \right], \left[ \begin{matrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{matrix} \right] \right)
57+
\quad \Rightarrow \quad
58+
X_1 \sim \mathcal{N}\left( \mu_1, \Sigma_{11} \right) \; ,
59+
$$
60+
61+
such that the [marginals](/D/marg) of $X$ and $Y$ are:
62+
63+
$$ \label{eq:X-Y-marg}
64+
X \sim \mathcal{N}\left( \mu_1, \Sigma_1 \right)
65+
\quad \text{and} \quad
66+
Y \sim \mathcal{N}\left( \mu_2, \Sigma_2 \right) \; .
67+
$$
68+
69+
The [differential entropy of the multivariate normal distribution](/P/mvn-dent) is
70+
71+
$$ \label{eq:mvn-dent}
72+
\mathrm{h}(x) = \frac{n}{2} \ln(2\pi) + \frac{1}{2} \ln|\Sigma| + \frac{1}{2} n
73+
$$
74+
75+
where $\lvert \Sigma \rvert$ is the determinant of $\Sigma$. Combining \eqref{eq:cmi-mjde} with \eqref{eq:mvn-dent}, we get:
76+
77+
$$ \label{eq:bvn-mi}
78+
\begin{split}
79+
\mathrm{I}(X,Y)
80+
&\overset{\eqref{eq:cmi-mjde}}{=} \mathrm{h}(X) + \mathrm{h}(Y) - \mathrm{h}(X,Y) \\
81+
&\overset{\eqref{eq:X-Y-marg}}{=} \mathrm{h}\left[ \mathcal{N}\left( \mu_1, \Sigma_1 \right) \right] + \mathrm{h}\left[ \mathcal{N}\left( \mu_2, \Sigma_2 \right) \right] - \mathrm{h}\left[ \mathcal{N}\left( \mu, \Sigma \right) \right] \\
82+
&\overset{\eqref{eq:mvn-dent}}{=} \left[ \frac{n}{2} \ln(2\pi) + \frac{1}{2} \ln|\Sigma_1| + \frac{1}{2} n \right] + \left[ \frac{m}{2} \ln(2\pi) + \frac{1}{2} \ln|\Sigma_2| + \frac{1}{2} m \right] - \left[ \frac{p}{2} \ln(2\pi) + \frac{1}{2} \ln|\Sigma| + \frac{1}{2} p \right] \\
83+
&= \left( \frac{n+m-p}{2} \ln(2\pi) + \frac{1}{2}(n+m-p) \right) + \left( \frac{1}{2} \ln|\Sigma_1| + \frac{1}{2} \ln|\Sigma_2| - \frac{1}{2} \ln|\Sigma| \right) \\
84+
&= \frac{1}{2} \left( \ln|\Sigma_1| + \ln|\Sigma_2| - \ln|\Sigma| \right) \\
85+
&= \frac{1}{2} \ln \left[ \frac{|\Sigma_1| |\Sigma_2|}{|\Sigma|} \right] \; .
86+
\end{split}
87+
$$
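This determinant formula can likewise be checked numerically (not part of the proof; the covariance matrix below is an arbitrary positive-definite choice), including that it reduces to the bivariate result $-\frac{1}{2}\ln(1-\rho^2)$ for $n = m = 1$:

```python
import numpy as np

def mvn_mutual_info(Sigma, n):
    """I(X,Y) = 1/2 ln(|Sigma_1| |Sigma_2| / |Sigma|),
    splitting the joint covariance after the first n dimensions."""
    S1, S2 = Sigma[:n, :n], Sigma[n:, n:]
    _, ld1 = np.linalg.slogdet(S1)   # log|Sigma_1|, numerically stable
    _, ld2 = np.linalg.slogdet(S2)   # log|Sigma_2|
    _, ld  = np.linalg.slogdet(Sigma)  # log|Sigma|
    return 0.5 * (ld1 + ld2 - ld)

# joint covariance of [X; Y] with X ∈ R^2, Y ∈ R^1
Sigma = np.array([[2.0, 0.3, 0.5],
                  [0.3, 1.0, 0.2],
                  [0.5, 0.2, 1.5]])
mi = mvn_mutual_info(Sigma, 2)

# special case n = m = 1: compare against -1/2 ln(1 - rho^2)
rho = 0.5
S2d = np.array([[1.0, rho],
                [rho, 1.0]])
```

Since $|\Sigma| \le |\Sigma_1||\Sigma_2|$ for positive-definite $\Sigma$, the computed mutual information is always non-negative, with equality exactly when $\Sigma_{12} = 0$.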
