Commit 99b32cb

Merge pull request #142 from JoramSoch/master
added 2 definitions and 2 proofs
2 parents a53db14 + 86c808a commit 99b32cb

5 files changed

Lines changed: 225 additions & 1 deletion

D/corr-samp.md

Lines changed: 36 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,36 @@
1+
---
2+
layout: definition
3+
mathjax: true
4+
5+
author: "Joram Soch"
6+
affiliation: "BCCN Berlin"
7+
e_mail: "joram.soch@bccn-berlin.de"
8+
date: 2021-12-14 07:23:00
9+
10+
title: "Sample correlation coefficient"
11+
chapter: "General Theorems"
12+
section: "Probability theory"
13+
topic: "Correlation"
14+
definition: "Sample correlation coefficient"
15+
16+
sources:
17+
- authors: "Wikipedia"
18+
year: 2021
19+
title: "Pearson correlation coefficient"
20+
in: "Wikipedia, the free encyclopedia"
21+
pages: "retrieved on 2021-12-14"
22+
url: "https://en.wikipedia.org/wiki/Pearson_correlation_coefficient#For_a_sample"
23+
24+
def_id: "D168"
25+
shortcut: "corr-samp"
26+
username: "JoramSoch"
27+
---
28+
29+
30+
**Definition:** Let $x = \left\lbrace x_1, \ldots, x_n \right\rbrace$ and $y = \left\lbrace y_1, \ldots, y_n \right\rbrace$ be [samples](/D/samp) from [random variables](/D/rvar) $X$ and $Y$. Then, the sample correlation coefficient of $x$ and $y$ is given by
31+
32+
$$ \label{eq:corr-samp}
33+
r_{xy} = \frac{\sum_{i=1}^n (x_i-\bar{x}) (y_i-\bar{y})}{\sqrt{\sum_{i=1}^n (x_i-\bar{x})^2} \sqrt{\sum_{i=1}^n (y_i-\bar{y})^2}}
34+
$$
35+
36+
where $\bar{x}$ and $\bar{y}$ are the [sample means](/D/mean-samp).
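
As an illustration only (not part of the committed definition), the formula in \eqref{eq:corr-samp} can be sketched in plain Python. The function name `sample_corr` and the example data are assumptions chosen for this sketch.

```python
import math

def sample_corr(x, y):
    """Sample correlation coefficient r_xy of two equal-length samples.

    Hypothetical helper for illustration; it follows the definition term by term.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length n >= 2")
    # sample means
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # numerator: sum of products of deviations from the means
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # denominator: product of the square roots of the sums of squared deviations
    den = math.sqrt(sum((xi - x_bar) ** 2 for xi in x)) \
        * math.sqrt(sum((yi - y_bar) ** 2 for yi in y))
    # note: den is zero (and r_xy undefined) if either sample is constant
    return num / den

# made-up example data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = sample_corr(x, y)
print(round(r, 4))  # 0.7746
```

For these values, the numerator is 6 and the two sums of squares are 10 and 6, so $r_{xy} = 6 / \sqrt{60} \approx 0.7746$.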

D/corrmat-samp.md

Lines changed: 48 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,48 @@
1+
---
2+
layout: definition
3+
mathjax: true
4+
5+
author: "Joram Soch"
6+
affiliation: "BCCN Berlin"
7+
e_mail: "joram.soch@bccn-berlin.de"
8+
date: 2021-12-14 07:45:00
9+
10+
title: "Sample correlation matrix"
11+
chapter: "General Theorems"
12+
section: "Probability theory"
13+
topic: "Correlation"
14+
definition: "Sample correlation matrix"
15+
16+
sources:
17+
18+
def_id: "D169"
19+
shortcut: "corrmat-samp"
20+
username: "JoramSoch"
21+
---
22+
23+
24+
**Definition:** Let $x = \left\lbrace x_1, \ldots, x_n \right\rbrace$ be a [sample](/D/samp) from a [random vector](/D/rvec) $X \in \mathbb{R}^{p \times 1}$. Then, the sample correlation matrix of $x$ is the matrix whose entries are the [sample correlation coefficients](/D/corr-samp) between pairs of entries of $x_1, \ldots, x_n$:
25+
26+
$$ \label{eq:corrmat-samp-v1}
27+
\mathrm{R}_{xx} =
28+
\begin{bmatrix}
29+
r_{x^{(1)},x^{(1)}} & \ldots & r_{x^{(1)},x^{(p)}} \\
30+
\vdots & \ddots & \vdots \\
31+
r_{x^{(p)},x^{(1)}} & \ldots & r_{x^{(p)},x^{(p)}}
32+
\end{bmatrix}
33+
$$
34+
35+
where $r_{x^{(j)},x^{(k)}}$ is the [sample correlation coefficient](/D/corr-samp) between the $j$-th and the $k$-th entries of $X$, with $x_{ij}$ denoting the $j$-th entry of $x_i$, given by
36+
37+
$$ \label{eq:corrmat-samp-v2}
38+
r_{x^{(j)},x^{(k)}} = \frac{\sum_{i=1}^n (x_{ij}-\bar{x}^{(j)}) (x_{ik}-\bar{x}^{(k)})}{\sqrt{\sum_{i=1}^n (x_{ij}-\bar{x}^{(j)})^2} \sqrt{\sum_{i=1}^n (x_{ik}-\bar{x}^{(k)})^2}}
39+
$$
40+
41+
in which $\bar{x}^{(j)}$ and $\bar{x}^{(k)}$ are the [sample means](/D/mean-samp)
42+
43+
$$ \label{eq:mean-samp}
44+
\begin{split}
45+
\bar{x}^{(j)} &= \frac{1}{n} \sum_{i=1}^n x_{ij} \\
46+
\bar{x}^{(k)} &= \frac{1}{n} \sum_{i=1}^n x_{ik} \; .
47+
\end{split}
48+
$$
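
As a rough sketch (not part of the committed definition), the $p \times p$ sample correlation matrix can be assembled in plain Python by applying \eqref{eq:corrmat-samp-v2} to every pair of columns. The function name `sample_corr_matrix`, the row-wise data layout and the example values are assumptions made for this illustration.

```python
import math

def sample_corr_matrix(data):
    """Sample correlation matrix of n observations of a p-dimensional vector.

    `data` is a list of n rows, each holding the p entries of one observation x_i.
    Hypothetical helper for illustration only.
    """
    n = len(data)
    p = len(data[0])
    # sample mean of each entry j
    means = [sum(row[j] for row in data) / n for j in range(p)]
    # deviations from the mean, stored per entry (column)
    dev = [[row[j] - means[j] for row in data] for j in range(p)]
    # square roots of the sums of squared deviations
    norms = [math.sqrt(sum(d * d for d in dev[j])) for j in range(p)]
    # entry (j, k) is the sample correlation between columns j and k
    return [
        [sum(a * b for a, b in zip(dev[j], dev[k])) / (norms[j] * norms[k])
         for k in range(p)]
        for j in range(p)
    ]

# made-up example: n = 4 observations, p = 3 entries; column 2 is twice column 1
data = [
    [1.0, 2.0, 1.0],
    [2.0, 4.0, 0.0],
    [3.0, 6.0, 2.0],
    [4.0, 8.0, 1.0],
]
R = sample_corr_matrix(data)
for row in R:
    print([round(v, 4) for v in row])
```

The resulting matrix is symmetric with ones on the diagonal, and the entry for the first two columns equals one because the second column is an exact positive multiple of the first.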

I/ToC.md

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -139,7 +139,11 @@ title: "Table of Contents"
139 139

140 140
1.10. Correlation <br>
141 141
&emsp;&ensp; 1.10.1. *[Definition](/D/corr)* <br>
142-
&emsp;&ensp; 1.10.2. *[Correlation matrix](/D/corrmat)* <br>
142+
&emsp;&ensp; 1.10.2. **[Range](/P/corr-range)** <br>
143+
&emsp;&ensp; 1.10.3. *[Sample correlation coefficient](/D/corr-samp)* <br>
144+
&emsp;&ensp; 1.10.4. **[Relationship to standard scores](/P/corr-z)** <br>
145+
&emsp;&ensp; 1.10.5. *[Correlation matrix](/D/corrmat)* <br>
146+
&emsp;&ensp; 1.10.6. *[Sample correlation matrix](/D/corrmat-samp)* <br>
143 147

144 148
1.11. Measures of central tendency <br>
145 149
&emsp;&ensp; 1.11.1. *[Median](/D/med)* <br>

P/corr-range.md

Lines changed: 75 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,75 @@
1+
---
2+
layout: proof
3+
mathjax: true
4+
5+
author: "Joram Soch"
6+
affiliation: "BCCN Berlin"
7+
e_mail: "joram.soch@bccn-berlin.de"
8+
date: 2021-12-14 02:08:00
9+
10+
title: "Correlation always falls between -1 and +1"
11+
chapter: "General Theorems"
12+
section: "Probability theory"
13+
topic: "Correlation"
14+
theorem: "Range"
15+
16+
sources:
17+
- authors: "Dor Leventer"
18+
year: 2021
19+
title: "How can I simply prove that the pearson correlation coefficient is between -1 and 1?"
20+
in: "StackExchange Mathematics"
21+
pages: "retrieved on 2021-12-14"
22+
url: "https://math.stackexchange.com/a/4260655/480910"
23+
24+
proof_id: "P300"
25+
shortcut: "corr-range"
26+
username: "JoramSoch"
27+
---
28+
29+
30+
**Theorem:** Let $X$ and $Y$ be two [random variables](/D/rvar). Then, the correlation of $X$ and $Y$ lies between $-1$ and $+1$, inclusive:
31+
32+
$$ \label{eq:corr-range}
33+
-1 \leq \mathrm{Corr}(X,Y) \leq +1 \; .
34+
$$
35+
36+
37+
**Proof:** Consider the [variance](/D/var) of the sum or difference of $X$ and $Y$, each divided by its [standard deviation](/D/std):
38+
39+
$$ \label{eq:var-XY}
40+
\mathrm{Var}\left( \frac{X}{\sigma_X} \pm \frac{Y}{\sigma_Y} \right) \; .
41+
$$
42+
43+
Because the [variance is non-negative](/P/var-nonneg), this term is larger than or equal to zero:
44+
45+
$$ \label{eq:var-XY-0}
46+
0 \leq \mathrm{Var}\left( \frac{X}{\sigma_X} \pm \frac{Y}{\sigma_Y} \right) \; .
47+
$$
48+
49+
Using the [variance of a linear combination](/P/var-lincomb), it can also be written as:
50+
51+
$$ \label{eq:var-XY-s1}
52+
\begin{split}
53+
\mathrm{Var}\left( \frac{X}{\sigma_X} \pm \frac{Y}{\sigma_Y} \right) &= \mathrm{Var}\left( \frac{X}{\sigma_X} \right) + \mathrm{Var}\left( \frac{Y}{\sigma_Y} \right) \pm 2 \, \mathrm{Cov}\left( \frac{X}{\sigma_X}, \frac{Y}{\sigma_Y} \right) \\
54+
&= \frac{1}{\sigma_X^2} \mathrm{Var}(X) + \frac{1}{\sigma_Y^2} \mathrm{Var}(Y) \pm 2 \, \frac{1}{\sigma_X \sigma_Y} \, \mathrm{Cov}(X,Y) \\
55+
&= \frac{1}{\sigma_X^2} \sigma_X^2 + \frac{1}{\sigma_Y^2} \sigma_Y^2 \pm 2 \, \frac{1}{\sigma_X \sigma_Y} \, \sigma_{XY} \; .
56+
\end{split}
57+
$$
58+
59+
Using the [relationship between covariance and correlation](/P/cov-corr), we have:
60+
61+
$$ \label{eq:var-XY-s2}
62+
\mathrm{Var}\left( \frac{X}{\sigma_X} \pm \frac{Y}{\sigma_Y} \right) = 1 + 1 \pm 2 \, \mathrm{Corr}(X,Y) \; .
63+
$$
64+
65+
Thus, the combination of \eqref{eq:var-XY-0} with \eqref{eq:var-XY-s2} yields
66+
67+
$$ \label{eq:var-XY-ineq}
68+
0 \leq 2 \pm 2 \, \mathrm{Corr}(X,Y)
69+
$$
70+
71+
which is equivalent to
72+
73+
$$ \label{eq:corr-range-qed}
74+
-1 \leq \mathrm{Corr}(X,Y) \leq +1 \; .
75+
$$
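
To spell out the last step (an added remark, not part of the committed proof), the $\pm$ in \eqref{eq:var-XY-ineq} can be read as two separate inequalities, one for each sign:

$$
\begin{split}
0 \leq 2 + 2 \, \mathrm{Corr}(X,Y) \quad &\Leftrightarrow \quad \mathrm{Corr}(X,Y) \geq -1 \\
0 \leq 2 - 2 \, \mathrm{Corr}(X,Y) \quad &\Leftrightarrow \quad \mathrm{Corr}(X,Y) \leq +1 \; .
\end{split}
$$

The plus sign thus gives the lower bound and the minus sign gives the upper bound.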

P/corr-z.md

Lines changed: 61 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,61 @@
1+
---
2+
layout: proof
3+
mathjax: true
4+
5+
author: "Joram Soch"
6+
affiliation: "BCCN Berlin"
7+
e_mail: "joram.soch@bccn-berlin.de"
8+
date: 2021-12-14 02:31:00
9+
10+
title: "Correlation coefficient in terms of standard scores"
11+
chapter: "General Theorems"
12+
section: "Probability theory"
13+
topic: "Correlation"
14+
theorem: "Relationship to standard scores"
15+
16+
sources:
17+
- authors: "Wikipedia"
18+
year: 2021
19+
title: "Pearson correlation coefficient"
20+
in: "Wikipedia, the free encyclopedia"
21+
pages: "retrieved on 2021-12-14"
22+
url: "https://en.wikipedia.org/wiki/Pearson_correlation_coefficient#For_a_sample"
23+
24+
proof_id: "P299"
25+
shortcut: "corr-z"
26+
username: "JoramSoch"
27+
---
28+
29+
30+
**Theorem:** Let $x = \left\lbrace x_1, \ldots, x_n \right\rbrace$ and $y = \left\lbrace y_1, \ldots, y_n \right\rbrace$ be [samples](/D/samp) from [random variables](/D/rvar) $X$ and $Y$. Then, the [sample correlation coefficient](/D/corr-samp) $r_{xy}$ can be expressed in terms of the [standard scores](/D/z) of $x$ and $y$:
31+
32+
$$ \label{eq:corr-z}
33+
r_{xy} = \frac{1}{n-1} \sum_{i=1}^n z_i^{(x)} \cdot z_i^{(y)} = \frac{1}{n-1} \sum_{i=1}^n \left( \frac{x_i-\bar{x}}{s_x} \right) \left( \frac{y_i-\bar{y}}{s_y} \right)
34+
$$
35+
36+
where $\bar{x}$ and $\bar{y}$ are the [sample means](/D/mean-samp) and $s_x$ and $s_y$ are the sample standard deviations, i.e. the square roots of the [sample variances](/D/var-samp) $s_x^2$ and $s_y^2$.
37+
38+
39+
**Proof:** The [sample correlation coefficient](/D/corr-samp) is defined as
40+
41+
$$ \label{eq:corr-samp}
42+
r_{xy} = \frac{\sum_{i=1}^n (x_i-\bar{x}) (y_i-\bar{y})}{\sqrt{\sum_{i=1}^n (x_i-\bar{x})^2} \sqrt{\sum_{i=1}^n (y_i-\bar{y})^2}} \; .
43+
$$
44+
45+
Using the [sample variances](/D/var-samp) of $x$ and $y$, we can write:
46+
47+
$$ \label{eq:corr-z-s1}
48+
r_{xy} = \frac{\sum_{i=1}^n (x_i-\bar{x}) (y_i-\bar{y})}{\sqrt{(n-1) s_x^2} \sqrt{(n-1) s_y^2}} \; .
49+
$$
50+
51+
Rearranging the terms, we arrive at:
52+
53+
$$ \label{eq:corr-z-s2}
54+
r_{xy} = \frac{1}{(n-1) \, s_x \, s_y} \sum_{i=1}^n (x_i-\bar{x}) (y_i-\bar{y}) \; .
55+
$$
56+
57+
Further simplifying, the result is:
58+
59+
$$ \label{eq:corr-z-s3}
60+
r_{xy} = \frac{1}{n-1} \sum_{i=1}^n \left( \frac{x_i-\bar{x}}{s_x} \right) \left( \frac{y_i-\bar{y}}{s_y} \right) \; .
61+
$$
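
As a quick numerical sanity check (an illustration, not part of the committed proof), the identity can be verified in plain Python. The example data are made up, and `statistics.stdev` is used on the assumption that $s_x$ and $s_y$ are the standard deviations with the $n-1$ denominator, as in \eqref{eq:corr-z-s1}.

```python
import math
import statistics

# made-up example data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

# sample means and sample standard deviations (n - 1 in the denominator)
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_x, s_y = statistics.stdev(x), statistics.stdev(y)

# standard scores of each observation
z_x = [(xi - x_bar) / s_x for xi in x]
z_y = [(yi - y_bar) / s_y for yi in y]

# correlation as the scaled sum of products of standard scores
r_from_z = sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)

# correlation computed directly from the definition
num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
den = math.sqrt(sum((xi - x_bar) ** 2 for xi in x)) \
    * math.sqrt(sum((yi - y_bar) ** 2 for yi in y))
r_direct = num / den

# the two values agree up to floating-point rounding
print(r_from_z, r_direct)
```

Note that the factor $1/(n-1)$ only matches if the standard scores are also computed with the $n-1$ convention; dividing by $n$ in one place and $n-1$ in the other would not reproduce $r_{xy}$.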
