Commit 4aff6cd

Merge pull request #278 from JoramSoch/master
added 1 definition and 2 proofs
2 parents d4be0fb + 55a55ea commit 4aff6cd

4 files changed: 251 additions & 13 deletions

D/est.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
---
layout: definition
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2024-11-01 14:26:04

title: "Estimand, estimator and estimate"
chapter: "General Theorems"
section: "Estimation theory"
topic: "Basic concepts of estimation"
definition: "Estimator"

sources:
- authors: "Wikipedia"
  year: 2024
  title: "Estimator"
  in: "Wikipedia, the free encyclopedia"
  pages: "retrieved on 2024-11-01"
  url: "https://en.wikipedia.org/wiki/Estimator#Definition"

def_id: "D208"
shortcut: "est"
username: "JoramSoch"
---

**Definition:** Let $y \in \mathcal{Y}$ be [measured data](/D/data), governed by a [probability distribution](/D/dist) described by some [statistical parameter](/D/para) $\theta \in \Theta$. Then, a function $\hat{\theta}: \mathcal{Y} \rightarrow \Theta$ exemplifying a rule for calculating an estimate of $\theta$ from $y$ is called an "estimator". Estimation theory distinguishes:

* the quantity of interest $\theta$ is called the "estimand";

* the rule $\hat{\theta}$ for estimating it is called the "estimator";

* the result of estimation $\hat{\theta}(y)$ is called the "estimate".
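The three terms can be illustrated with a minimal Python sketch, using the sample mean as an estimator of a normal mean (the normal model, sample size, and seed are illustrative assumptions, not part of the definition):

```python
import numpy as np

rng = np.random.default_rng(42)

theta = 1.5  # estimand: the unknown quantity of interest
y = rng.normal(loc=theta, scale=2.0, size=1000)  # measured data y ∈ Y

def theta_hat(y):
    """Estimator: a rule mapping data to the parameter space."""
    return np.mean(y)

estimate = theta_hat(y)  # estimate: the estimator applied to the data
print(f"estimand = {theta}, estimate = {estimate:.3f}")
```

Note that the estimator `theta_hat` is a function, while the estimate is a single number obtained by applying it to one data set.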

I/ToC.md

Lines changed: 19 additions & 13 deletions
@@ -276,15 +276,19 @@ title: "Table of Contents"
 
 3. <p id="Estimation theory">Estimation theory</p>
 
+<p id="Basic concepts of estimation"></p>
+3.1. Basic concepts of estimation <br>
+&emsp;&ensp; 3.1.1. *[Estimator](/D/est)* <br>
+
 <p id="Point estimates"></p>
-3.1. Point estimates <br>
-&emsp;&ensp; 3.1.1. *[Mean squared error](/D/mse)* <br>
-&emsp;&ensp; 3.1.2. **[Partition of the mean squared error into bias and variance](/P/mse-bnv)** <br>
+3.2. Point estimates <br>
+&emsp;&ensp; 3.2.1. *[Mean squared error](/D/mse)* <br>
+&emsp;&ensp; 3.2.2. **[Partition of the mean squared error into bias and variance](/P/mse-bnv)** <br>
 
 <p id="Interval estimates"></p>
-3.2. Interval estimates <br>
-&emsp;&ensp; 3.2.1. *[Confidence interval](/D/ci)* <br>
-&emsp;&ensp; 3.2.2. **[Construction of confidence intervals using Wilks' theorem](/P/ci-wilks)** <br>
+3.3. Interval estimates <br>
+&emsp;&ensp; 3.3.1. *[Confidence interval](/D/ci)* <br>
+&emsp;&ensp; 3.3.2. **[Construction of confidence intervals using Wilks' theorem](/P/ci-wilks)** <br>
 
 4. <p id="Frequentist statistics">Frequentist statistics</p>

@@ -594,19 +598,21 @@ title: "Table of Contents"
 &emsp;&ensp; 4.1.6. **[Mean](/P/mvn-mean)** <br>
 &emsp;&ensp; 4.1.7. **[Covariance](/P/mvn-cov)** <br>
 &emsp;&ensp; 4.1.8. **[Differential entropy](/P/mvn-dent)** <br>
-&emsp;&ensp; 4.1.9. **[Kullback-Leibler divergence](/P/mvn-kl)** <br>
-&emsp;&ensp; 4.1.10. **[Linear transformation](/P/mvn-ltt)** <br>
-&emsp;&ensp; 4.1.11. **[Marginal distributions](/P/mvn-marg)** <br>
-&emsp;&ensp; 4.1.12. **[Conditional distributions](/P/mvn-cond)** <br>
-&emsp;&ensp; 4.1.13. **[Conditions for independence](/P/mvn-ind)** <br>
-&emsp;&ensp; 4.1.14. **[Independence of products](/P/mvn-indprod)** <br>
+&emsp;&ensp; 4.1.9. **[Mutual information](/P/mvn-mi)** <br>
+&emsp;&ensp; 4.1.10. **[Kullback-Leibler divergence](/P/mvn-kl)** <br>
+&emsp;&ensp; 4.1.11. **[Linear transformation](/P/mvn-ltt)** <br>
+&emsp;&ensp; 4.1.12. **[Marginal distributions](/P/mvn-marg)** <br>
+&emsp;&ensp; 4.1.13. **[Conditional distributions](/P/mvn-cond)** <br>
+&emsp;&ensp; 4.1.14. **[Conditions for independence](/P/mvn-ind)** <br>
+&emsp;&ensp; 4.1.15. **[Independence of products](/P/mvn-indprod)** <br>
 
 <p id="Bivariate normal distribution"></p>
 4.2. Bivariate normal distribution <br>
 &emsp;&ensp; 4.2.1. *[Definition](/D/bvn)* <br>
 &emsp;&ensp; 4.2.2. **[Probability density function](/P/bvn-pdf)** <br>
 &emsp;&ensp; 4.2.3. **[Probability density function in terms of correlation coefficient](/P/bvn-pdfcorr)** <br>
-&emsp;&ensp; 4.2.4. **[Linear combination](/P/bvn-lincomb)** <br>
+&emsp;&ensp; 4.2.4. **[Mutual information](/P/bvn-mi)** <br>
+&emsp;&ensp; 4.2.5. **[Linear combination](/P/bvn-lincomb)** <br>
 
 <p id="Multivariate t-distribution"></p>
 4.3. Multivariate t-distribution <br>

P/bvn-mi.md

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2024-11-01 11:51:06

title: "Mutual information of the bivariate normal distribution"
chapter: "Probability Distributions"
section: "Multivariate continuous distributions"
topic: "Bivariate normal distribution"
theorem: "Mutual information"

sources:
- authors: "Krafft, Peter"
  year: 2013
  title: "Correlation and Mutual Information"
  in: "Princeton University Department of Computer Science: Laboratory for Intelligent Probabilistic Systems"
  pages: "February 13, 2013"
  url: "https://lips.cs.princeton.edu/correlation-and-mutual-information/"

proof_id: "P476"
shortcut: "bvn-mi"
username: "JoramSoch"
---

**Theorem:** Let $X$ and $Y$ follow a [bivariate normal distribution](/D/bvn):

$$ \label{eq:bvn}
\left[ \begin{matrix} X \\ Y \end{matrix} \right] \sim
\mathcal{N}\left( \left[ \begin{matrix} \mu_1 \\ \mu_2 \end{matrix} \right], \left[ \begin{matrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{matrix} \right] \right) \; .
$$

Then, the [mutual information](/D/mi) of $X$ and $Y$ is

$$ \label{eq:bvn-mi}
\mathrm{I}(X,Y) = -\frac{1}{2} \ln (1-\rho^2)
$$

where $\rho$ is the [correlation](/D/corr) of $X$ and $Y$.

**Proof:** [Mutual information can be written in terms of marginal and joint differential entropy](/P/cmi-mjde):

$$ \label{eq:cmi-mjde}
\mathrm{I}(X,Y) = \mathrm{h}(X) + \mathrm{h}(Y) - \mathrm{h}(X,Y) \; .
$$

The [marginal distributions of the multivariate normal distribution are also multivariate normal](/P/mvn-marg),

$$ \label{eq:mvn-marg}
\left[ \begin{matrix} X_1 \\ X_2 \end{matrix} \right] \sim
\mathcal{N}\left( \left[ \begin{matrix} \mu_1 \\ \mu_2 \end{matrix} \right], \left[ \begin{matrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{matrix} \right] \right)
\quad \Rightarrow \quad
X_1 \sim \mathcal{N}\left( \mu_1, \Sigma_{11} \right) \; ,
$$

such that the [marginals](/D/marg) of the [bivariate normal distribution](/D/bvn) are [univariate normal distributions](/D/norm):

$$ \label{eq:bvn-marg}
\left[ \begin{matrix} X \\ Y \end{matrix} \right] \sim
\mathcal{N}\left( \left[ \begin{matrix} \mu_1 \\ \mu_2 \end{matrix} \right], \left[ \begin{matrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{matrix} \right] \right)
\quad \Rightarrow \quad
X \sim \mathcal{N}\left( \mu_1, \sigma_1^2 \right)
\quad \text{and} \quad
Y \sim \mathcal{N}\left( \mu_2, \sigma_2^2 \right) \; .
$$

The [differential entropy of the univariate normal distribution](/P/norm-dent) is

$$ \label{eq:norm-dent}
\mathrm{h}(X) = \frac{1}{2} \ln\left( 2 \pi \sigma^2 e \right)
$$

and the [differential entropy of the multivariate normal distribution](/P/mvn-dent) is

$$ \label{eq:mvn-dent}
\mathrm{h}(x) = \frac{n}{2} \ln(2\pi) + \frac{1}{2} \ln|\Sigma| + \frac{1}{2} n
$$

where $\lvert \Sigma \rvert$ is the determinant of the [covariance matrix](/D/covmat) $\Sigma$. A two-dimensional [covariance matrix can be rewritten in terms of correlations](/P/covmat-corrmat) as follows:

$$ \label{eq:Sigma}
\begin{split}
\Sigma
&= \left[ \begin{matrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{matrix} \right] \left[ \begin{matrix} 1 & \rho \\ \rho & 1 \end{matrix} \right] \left[ \begin{matrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{matrix} \right] \\
&= \left[ \begin{matrix} \sigma_1^2 & \rho \, \sigma_1 \sigma_2 \\ \rho \, \sigma_1 \sigma_2 & \sigma_2^2 \end{matrix} \right] \; .
\end{split}
$$

Combining \eqref{eq:cmi-mjde} with \eqref{eq:norm-dent} and \eqref{eq:mvn-dent}, and applying $n = 2$, we get:

$$ \label{eq:bvn-mi-qed}
\begin{split}
\mathrm{I}(X,Y)
&\overset{\eqref{eq:cmi-mjde}}{=} \mathrm{h}(X) + \mathrm{h}(Y) - \mathrm{h}(X,Y) \\
&\overset{\eqref{eq:bvn-marg}}{=} \mathrm{h}\left[ \mathcal{N}\left( \mu_1, \sigma_1^2 \right) \right] + \mathrm{h}\left[ \mathcal{N}\left( \mu_2, \sigma_2^2 \right) \right] - \mathrm{h}\left[ \mathcal{N}\left( \mu, \Sigma \right) \right] \\
&\overset{\eqref{eq:Sigma}}{=} \left[ \frac{1}{2} \ln\left( 2 \pi \sigma_1^2 e \right) \right] + \left[ \frac{1}{2} \ln\left( 2 \pi \sigma_2^2 e \right) \right] - \left[ \frac{2}{2} \ln(2\pi) + \frac{1}{2} \ln \left| \left[ \begin{matrix} \sigma_1^2 & \rho \, \sigma_1 \sigma_2 \\ \rho \, \sigma_1 \sigma_2 & \sigma_2^2 \end{matrix} \right] \right| + \frac{1}{2} \cdot 2 \right] \\
&= \left( \frac{2}{2} \ln(2\pi) + \frac{2}{2} \ln(e) - \ln(2\pi) - 1 \right) + \left( \frac{1}{2} \ln\left( \sigma_1^2 \right) + \frac{1}{2} \ln\left( \sigma_2^2 \right) - \frac{1}{2} \ln \left| \left[ \begin{matrix} \sigma_1^2 & \rho \, \sigma_1 \sigma_2 \\ \rho \, \sigma_1 \sigma_2 & \sigma_2^2 \end{matrix} \right] \right| \right) \\
&= \frac{1}{2} \left[ \ln\left( \sigma_1^2 \right) + \ln\left( \sigma_2^2 \right) - \ln\left( \sigma_1^2 \sigma_2^2 - (\rho \, \sigma_1 \sigma_2)^2 \right) \right] \\
&= \frac{1}{2} \ln \left[ \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 \sigma_2^2 - (\rho \, \sigma_1 \sigma_2)^2} \right] \\
&= \frac{1}{2} \ln \left[ \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 \sigma_2^2 (1-\rho^2)} \right] \\
&= \frac{1}{2} \ln \left[ \frac{1}{1-\rho^2} \right] \\
&= -\frac{1}{2} \ln (1-\rho^2) \; .
\end{split}
$$
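As a numerical sanity check of this result (not part of the proof; the parameter values are arbitrary), one can compare the closed form $-\frac{1}{2}\ln(1-\rho^2)$ against the entropy decomposition $\mathrm{h}(X)+\mathrm{h}(Y)-\mathrm{h}(X,Y)$:

```python
import numpy as np

def bvn_mutual_info(rho):
    """Closed form: I(X,Y) = -1/2 ln(1 - rho^2)."""
    return -0.5 * np.log(1 - rho**2)

def mi_from_entropies(s1, s2, rho):
    """I(X,Y) = h(X) + h(Y) - h(X,Y), using Gaussian differential entropies."""
    h_x = 0.5 * np.log(2 * np.pi * s1**2 * np.e)
    h_y = 0.5 * np.log(2 * np.pi * s2**2 * np.e)
    Sigma = np.array([[s1**2,       rho*s1*s2],
                      [rho*s1*s2,   s2**2]])
    # h(X,Y) = (n/2) ln(2π) + 1/2 ln|Σ| + n/2 with n = 2
    h_xy = np.log(2 * np.pi) + 0.5 * np.log(np.linalg.det(Sigma)) + 1.0
    return h_x + h_y - h_xy

i_closed = bvn_mutual_info(0.5)
i_decomp = mi_from_entropies(1.2, 0.7, 0.5)
```

The two values agree to floating-point precision, and the mutual information depends only on $\rho$, not on $\sigma_1$ or $\sigma_2$.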

P/mvn-mi.md

Lines changed: 87 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,87 @@
1+
---
2+
layout: proof
3+
mathjax: true
4+
5+
author: "Joram Soch"
6+
affiliation: "BCCN Berlin"
7+
e_mail: "joram.soch@bccn-berlin.de"
8+
date: 2024-11-01 12:36:44
9+
10+
title: "Mutual information of the multivariate normal distribution"
11+
chapter: "Probability Distributions"
12+
section: "Multivariate continuous distributions"
13+
topic: "Multivariate normal distribution"
14+
theorem: "Mutual information"
15+
16+
sources:
17+
- authors: "a06e"
18+
year: 2019
19+
title: "Mutual information between subsets of variables in the multivariate normal distribution"
20+
in: "StackExchange CrossValidated"
21+
pages: "retrieved on 2024-11-01"
22+
url: "https://stats.stackexchange.com/a/438613/270304"
23+
24+
proof_id: "P477"
25+
shortcut: "mvn-mi"
26+
username: "JoramSoch"
27+
---
28+
29+
30+
**Theorem:** Let $X \in \mathbb{R}^n$ and $Y \in \mathbb{R}^m$ be [random vectors](/D/rvec) that are [jointly multivariate normal](/D/mvn):
31+
32+
$$ \label{eq:bvn}
33+
\left[ \begin{matrix} X \\ Y \end{matrix} \right] \sim
34+
\mathcal{N}\left( \left[ \begin{matrix} \mu_1 \\ \mu_2 \end{matrix} \right], \left[ \begin{matrix} \Sigma_1 & \Sigma_{12} \\ \Sigma_{21} & \Sigma_2 \end{matrix} \right] \right) \; .
35+
$$
36+
37+
Then, the [mutual information](/D/mi) of $X$ and $Y$ is
38+
39+
$$ \label{eq:bvn-lincomb}
40+
\mathrm{I}(X,Y) = \frac{1}{2} \ln \left[ \frac{|\Sigma_1| |\Sigma_2|}{|\Sigma|} \right]
41+
$$
42+
43+
where $\mu \in \mathbb{R}^p$ and $\Sigma \in \mathbb{R}^{p \times p}$ are the [mean](/D/mean) and [covariance matrix](/D/covmat) of the [random vector](/D/rvec) $\left[ \begin{matrix} X \\\\ Y \end{matrix} \right] \in \mathbb{R}^p$, respectively, where $p = n + m$.
44+
45+
46+
**Proof:** [Mutual information can be written in terms of marginal and joint differential entropy](/P/cmi-mjde):
47+
48+
$$ \label{eq:cmi-mjde}
49+
\mathrm{I}(X,Y) = \mathrm{h}(X) + \mathrm{h}(Y) - \mathrm{h}(X,Y) \; .
50+
$$
51+
52+
The [marginal distributions of the multivariate normal distribution are also multivariate normal]
53+
54+
$$ \label{eq:mvn-marg}
55+
\left[ \begin{matrix} X_1 \\ X_2 \end{matrix} \right] \sim
56+
\mathcal{N}\left( \left[ \begin{matrix} \mu_1 \\ \mu_2 \end{matrix} \right], \left[ \begin{matrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{matrix} \right] \right)
57+
\quad \Rightarrow \quad
58+
X_1 \sim \mathcal{N}\left( \mu_1, \Sigma_{11} \right) \; ,
59+
$$
60+
61+
such that the [marginals](/D/marg) of $X$ and $Y$ are:
62+
63+
$$ \label{eq:X-Y-marg}
64+
X \sim \mathcal{N}\left( \mu_1, \Sigma_1 \right)
65+
\quad \text{and} \quad
66+
Y \sim \mathcal{N}\left( \mu_2, \Sigma_2 \right) \; .
67+
$$
68+
69+
The [differential entropy of the multivariate normal distribution](/P/mvn-dent) is
70+
71+
$$ \label{eq:mvn-dent}
72+
\mathrm{h}(x) = \frac{n}{2} \ln(2\pi) + \frac{1}{2} \ln|\Sigma| + \frac{1}{2} n
73+
$$
74+
75+
where $\lvert \Sigma \rvert$ is the determinant of $\Sigma$. Combining \eqref{eq:cmi-mjde} with \eqref{eq:mvn-dent}, we get:
76+
77+
$$ \label{eq:bvn-mi}
78+
\begin{split}
79+
\mathrm{I}(X,Y)
80+
&\overset{\eqref{eq:cmi-mjde}}{=} \mathrm{h}(X) + \mathrm{h}(Y) - \mathrm{h}(X,Y) \\
81+
&\overset{\eqref{eq:X-Y-marg}}{=} \mathrm{h}\left[ \mathcal{N}\left( \mu_1, \Sigma_1 \right) \right] + \mathrm{h}\left[ \mathcal{N}\left( \mu_2, \Sigma_2 \right) \right] - \mathrm{h}\left[ \mathcal{N}\left( \mu, \Sigma \right) \right] \\
82+
&\overset{\eqref{eq:mvn-dent}}{=} \left[ \frac{n}{2} \ln(2\pi) + \frac{1}{2} \ln|\Sigma_1| + \frac{1}{2} n \right] + \left[ \frac{m}{2} \ln(2\pi) + \frac{1}{2} \ln|\Sigma_2| + \frac{1}{2} m \right] - \left[ \frac{p}{2} \ln(2\pi) + \frac{1}{2} \ln|\Sigma| + \frac{1}{2} p \right] \\
83+
&= \left( \frac{n+m-p}{2} \ln(2\pi) + \frac{1}{2}(n+m-p) \right) + \left( \frac{1}{2} \ln|\Sigma_1| + \frac{1}{2} \ln|\Sigma_2| - \frac{1}{2} \ln|\Sigma| \right) \\
84+
&= \frac{1}{2} \left( \ln|\Sigma_1| + \ln|\Sigma_2| - \ln|\Sigma| \right) \\
85+
&= \frac{1}{2} \ln \left[ \frac{|\Sigma_1| |\Sigma_2|}{|\Sigma|} \right] \; .
86+
\end{split}
87+
$$
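This determinant formula can likewise be checked numerically (not part of the proof; the covariance matrix below is an arbitrary positive-definite choice), including that it reduces to the bivariate result $-\frac{1}{2}\ln(1-\rho^2)$ for $n = m = 1$:

```python
import numpy as np

def mvn_mutual_info(Sigma, n):
    """I(X,Y) = 1/2 ln(|Sigma_1| |Sigma_2| / |Sigma|),
    splitting the joint covariance after the first n dimensions."""
    S1, S2 = Sigma[:n, :n], Sigma[n:, n:]
    _, ld1 = np.linalg.slogdet(S1)   # log|Sigma_1|, numerically stable
    _, ld2 = np.linalg.slogdet(S2)   # log|Sigma_2|
    _, ld  = np.linalg.slogdet(Sigma)  # log|Sigma|
    return 0.5 * (ld1 + ld2 - ld)

# joint covariance of [X; Y] with X ∈ R^2, Y ∈ R^1
Sigma = np.array([[2.0, 0.3, 0.5],
                  [0.3, 1.0, 0.2],
                  [0.5, 0.2, 1.5]])
mi = mvn_mutual_info(Sigma, 2)

# special case n = m = 1: compare against -1/2 ln(1 - rho^2)
rho = 0.5
S2d = np.array([[1.0, rho],
                [rho, 1.0]])
```

Since $|\Sigma| \le |\Sigma_1||\Sigma_2|$ for positive-definite $\Sigma$, the computed mutual information is always non-negative, with equality exactly when $\Sigma_{12} = 0$.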
