Commit 11db948

Merge pull request #136 from JoramSoch/master

added 2 definitions and 10 proofs

2 parents dac0e19 + c114107

13 files changed: 1086 additions & 24 deletions

D/regline.md

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
---
layout: definition
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2021-10-27 07:30:00

title: "Regression line"
chapter: "Statistical Models"
section: "Univariate normal data"
topic: "Simple linear regression"
definition: "Regression line"

sources:
  - authors: "Wikipedia"
    year: 2021
    title: "Simple linear regression"
    in: "Wikipedia, the free encyclopedia"
    pages: "retrieved on 2021-10-27"
    url: "https://en.wikipedia.org/wiki/Simple_linear_regression#Fitting_the_regression_line"

def_id: "D164"
shortcut: "regline"
username: "JoramSoch"
---


**Definition:** Let there be a [simple linear regression with independent observations](/D/slr) using dependent variable $y$ and independent variable $x$:

$$ \label{eq:slr}
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \; \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \; .
$$

Then, given some parameters $\beta_0, \beta_1 \in \mathbb{R}$, the set

$$ \label{eq:regline}
L(\beta_0, \beta_1) = \left\lbrace (x,y) \in \mathbb{R}^2 \mid y = \beta_0 + \beta_1 x \right\rbrace
$$

is called a "regression line" and the set

$$ \label{eq:regline-ols}
L(\hat{\beta}_0, \hat{\beta}_1) = \left\lbrace (x,y) \in \mathbb{R}^2 \mid y = \hat{\beta}_0 + \hat{\beta}_1 x \right\rbrace
$$

is called the "fitted regression line", with estimated regression coefficients $\hat{\beta}_0, \hat{\beta}_1$, e.g. obtained via [ordinary least squares](/P/slr-ols).
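As a minimal illustrative sketch (not part of the original definition, all names and data hypothetical), the fitted regression line can be obtained from the closed-form OLS estimates $\hat{\beta}_1 = s_{xy}/s_x^2$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$:

```python
# Hypothetical sketch: computing the fitted regression line
# L(beta0_hat, beta1_hat) via ordinary least squares.
import numpy as np

def fit_regression_line(x, y):
    """Return (beta0_hat, beta1_hat) such that y = beta0_hat + beta1_hat * x
    is the OLS-fitted regression line."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    x_bar, y_bar = x.mean(), y.mean()
    # slope: ratio of (co)variation sums, equivalent to s_xy / s_x^2
    beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # intercept: line passes through the sample means
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat

# Points lying exactly on y = 2 + 3x are recovered exactly:
b0, b1 = fit_regression_line([0, 1, 2, 3], [2, 5, 8, 11])  # -> (2.0, 3.0)
```

Noise-free data are used here only so the recovered coefficients are unambiguous; with noisy data the same formulas return the least-squares fit.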

D/slr.md

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
---
layout: definition
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2021-10-27 07:07:00

title: "Simple linear regression"
chapter: "Statistical Models"
section: "Univariate normal data"
topic: "Simple linear regression"
definition: "Definition"

sources:
  - authors: "Wikipedia"
    year: 2021
    title: "Simple linear regression"
    in: "Wikipedia, the free encyclopedia"
    pages: "retrieved on 2021-10-27"
    url: "https://en.wikipedia.org/wiki/Simple_linear_regression#Fitting_the_regression_line"

def_id: "D163"
shortcut: "slr"
username: "JoramSoch"
---


**Definition:** Let $y$ and $x$ be two $n \times 1$ vectors.

Then, a statement asserting a linear relationship between $x$ and $y$

$$ \label{eq:slr-model}
y = \beta_0 + \beta_1 x + \varepsilon \; ,
$$

together with a statement asserting a [normal distribution](/D/mvn) for $\varepsilon$

$$ \label{eq:slr-noise}
\varepsilon \sim \mathcal{N}(0, \sigma^2 V)
$$

is called a univariate simple regression model or simply "simple linear regression".

* $y$ is called the "dependent variable", "measured data" or "signal";

* $x$ is called the "independent variable", "predictor" or "covariate";

* $V$ is called the "covariance matrix" or "covariance structure";

* $\beta_1$ is called the "slope of the [regression line](/D/regline)";

* $\beta_0$ is called the "intercept of the [regression line](/D/regline)";

* $\varepsilon$ is called "noise", "errors" or "error terms";

* $\sigma^2$ is called the "noise variance" or "error variance";

* $n$ is the number of observations.

When the covariance structure $V$ is equal to the $n \times n$ identity matrix, this is called simple linear regression with independent and identically distributed (i.i.d.) observations:

$$ \label{eq:slr-noise-iid}
V = I_n \quad \Rightarrow \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_n) \quad \Rightarrow \quad \varepsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2) \; .
$$

In this case, the linear regression model can also be written as

$$ \label{eq:slr-model-sum}
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \; \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \; .
$$

Otherwise, it is called simple linear regression with correlated observations.
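The i.i.d. case above can be sketched as a small simulation (a hypothetical illustration, not part of the definition; the function name and parameter values are arbitrary choices):

```python
# Hypothetical sketch: drawing n observations from the simple linear
# regression model y_i = beta0 + beta1 * x_i + eps_i with
# eps_i ~ N(0, sigma^2), i.e. the i.i.d. case V = I_n.
import numpy as np

def simulate_slr(x, beta0, beta1, sigma2, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float)
    # i.i.d. noise: eps ~ N(0, sigma^2 I_n)
    eps = rng.normal(0.0, np.sqrt(sigma2), size=x.shape)
    return beta0 + beta1 * x + eps

x = np.linspace(0.0, 1.0, 100)
y = simulate_slr(x, beta0=1.0, beta1=2.0, sigma2=0.25)
```

Setting `sigma2=0` reduces the simulation to the deterministic line $y = \beta_0 + \beta_1 x$; correlated observations would instead require drawing the noise vector from $\mathcal{N}(0, \sigma^2 V)$ with a non-identity $V$.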

I/Table_of_Contents.md

Lines changed: 38 additions & 24 deletions
@@ -479,30 +479,44 @@ title: "Table of Contents"
 &emsp;&ensp; 1.2.12. **[Cross-validated log model evidence](/P/ugkv-cvlme)** <br>
 &emsp;&ensp; 1.2.13. **[Cross-validated log Bayes factor](/P/ugkv-cvlbf)** <br>
 &emsp;&ensp; 1.2.14. **[Expectation of cross-validated log Bayes factor](/P/ugkv-cvlbfmean)** <br>
-
-1.3. Multiple linear regression <br>
-&emsp;&ensp; 1.3.1. *[Definition](/D/mlr)* <br>
-&emsp;&ensp; 1.3.2. **[Ordinary least squares](/P/mlr-ols)** (1) <br>
-&emsp;&ensp; 1.3.3. **[Ordinary least squares](/P/mlr-ols2)** (2) <br>
-&emsp;&ensp; 1.3.4. *[Total sum of squares](/D/tss)* <br>
-&emsp;&ensp; 1.3.5. *[Explained sum of squares](/D/ess)* <br>
-&emsp;&ensp; 1.3.6. *[Residual sum of squares](/D/rss)* <br>
-&emsp;&ensp; 1.3.7. **[Total, explained and residual sum of squares](/P/mlr-pss)** <br>
-&emsp;&ensp; 1.3.8. *[Estimation matrix](/D/emat)* <br>
-&emsp;&ensp; 1.3.9. *[Projection matrix](/D/pmat)* <br>
-&emsp;&ensp; 1.3.10. *[Residual-forming matrix](/D/rfmat)* <br>
-&emsp;&ensp; 1.3.11. **[Estimation, projection and residual-forming matrix](/P/mlr-mat)** <br>
-&emsp;&ensp; 1.3.12. **[Idempotence of projection and residual-forming matrix](/P/mlr-idem)** <br>
-&emsp;&ensp; 1.3.13. **[Weighted least squares](/P/mlr-wls)** (1) <br>
-&emsp;&ensp; 1.3.14. **[Weighted least squares](/P/mlr-wls2)** (2) <br>
-&emsp;&ensp; 1.3.15. **[Maximum likelihood estimation](/P/mlr-mle)** <br>
-
-1.4. Bayesian linear regression <br>
-&emsp;&ensp; 1.4.1. **[Conjugate prior distribution](/P/blr-prior)** <br>
-&emsp;&ensp; 1.4.2. **[Posterior distribution](/P/blr-post)** <br>
-&emsp;&ensp; 1.4.3. **[Log model evidence](/P/blr-lme)** <br>
-&emsp;&ensp; 1.4.4. **[Posterior probability of alternative hypothesis](/P/blr-pp)** <br>
-&emsp;&ensp; 1.4.5. **[Posterior credibility region excluding null hypothesis](/P/blr-pcr)** <br>
+
+1.3. Simple linear regression <br>
+&emsp;&ensp; 1.3.1. *[Definition](/D/slr)* <br>
+&emsp;&ensp; 1.3.2. *[Regression line](/D/regline)* <br>
+&emsp;&ensp; 1.3.3. **[Ordinary least squares](/P/slr-ols)** <br>
+&emsp;&ensp; 1.3.4. **[Expectation of estimates](/P/slr-olsmean)** <br>
+&emsp;&ensp; 1.3.5. **[Variance of estimates](/P/slr-olsvar)** <br>
+&emsp;&ensp; 1.3.6. **[Effects of mean-centering](/P/slr-meancent)** <br>
+&emsp;&ensp; 1.3.7. **[Regression line includes center of mass](/P/slr-comp)** <br>
+&emsp;&ensp; 1.3.8. **[Sum of residuals is zero](/P/slr-ressum)** <br>
+&emsp;&ensp; 1.3.9. **[Correlation with covariate is zero](/P/slr-rescorr)** <br>
+&emsp;&ensp; 1.3.10. **[Residual variance in terms of sample variance](/P/slr-vars)** <br>
+&emsp;&ensp; 1.3.11. **[Correlation coefficient in terms of slope estimate](/P/slr-corr)** <br>
+&emsp;&ensp; 1.3.12. **[Coefficient of determination in terms of correlation coefficient](/P/slr-rsq)** <br>
+
+1.4. Multiple linear regression <br>
+&emsp;&ensp; 1.4.1. *[Definition](/D/mlr)* <br>
+&emsp;&ensp; 1.4.2. **[Ordinary least squares](/P/mlr-ols)** (1) <br>
+&emsp;&ensp; 1.4.3. **[Ordinary least squares](/P/mlr-ols2)** (2) <br>
+&emsp;&ensp; 1.4.4. *[Total sum of squares](/D/tss)* <br>
+&emsp;&ensp; 1.4.5. *[Explained sum of squares](/D/ess)* <br>
+&emsp;&ensp; 1.4.6. *[Residual sum of squares](/D/rss)* <br>
+&emsp;&ensp; 1.4.7. **[Total, explained and residual sum of squares](/P/mlr-pss)** <br>
+&emsp;&ensp; 1.4.8. *[Estimation matrix](/D/emat)* <br>
+&emsp;&ensp; 1.4.9. *[Projection matrix](/D/pmat)* <br>
+&emsp;&ensp; 1.4.10. *[Residual-forming matrix](/D/rfmat)* <br>
+&emsp;&ensp; 1.4.11. **[Estimation, projection and residual-forming matrix](/P/mlr-mat)** <br>
+&emsp;&ensp; 1.4.12. **[Idempotence of projection and residual-forming matrix](/P/mlr-idem)** <br>
+&emsp;&ensp; 1.4.13. **[Weighted least squares](/P/mlr-wls)** (1) <br>
+&emsp;&ensp; 1.4.14. **[Weighted least squares](/P/mlr-wls2)** (2) <br>
+&emsp;&ensp; 1.4.15. **[Maximum likelihood estimation](/P/mlr-mle)** <br>
+
+1.5. Bayesian linear regression <br>
+&emsp;&ensp; 1.5.1. **[Conjugate prior distribution](/P/blr-prior)** <br>
+&emsp;&ensp; 1.5.2. **[Posterior distribution](/P/blr-post)** <br>
+&emsp;&ensp; 1.5.3. **[Log model evidence](/P/blr-lme)** <br>
+&emsp;&ensp; 1.5.4. **[Posterior probability of alternative hypothesis](/P/blr-pp)** <br>
+&emsp;&ensp; 1.5.5. **[Posterior credibility region excluding null hypothesis](/P/blr-pcr)** <br>
 
 2. Multivariate normal data

P/slr-comp.md

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2021-10-27 12:52:00

title: "The regression line goes through the center of mass point"
chapter: "Statistical Models"
section: "Univariate normal data"
topic: "Simple linear regression"
theorem: "Regression line includes center of mass"

sources:
  - authors: "Wikipedia"
    year: 2021
    title: "Simple linear regression"
    in: "Wikipedia, the free encyclopedia"
    pages: "retrieved on 2021-10-27"
    url: "https://en.wikipedia.org/wiki/Simple_linear_regression#Numerical_properties"

proof_id: "P275"
shortcut: "slr-comp"
username: "JoramSoch"
---


**Theorem:** In [simple linear regression](/D/slr), the [regression line](/D/regline) estimated using [ordinary least squares](/P/slr-ols) includes the center of mass point $M(\bar{x},\bar{y})$.

**Proof:** The [fitted regression line](/D/regline) is described by the equation

$$ \label{eq:slr-ols-regline}
y = \hat{\beta}_0 + \hat{\beta}_1 x \quad \text{where} \quad x,y \in \mathbb{R} \; .
$$

Plugging in the coordinates of $M$ and the [ordinary least squares estimate of the intercept](/P/slr-ols), $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$, we obtain

$$ \label{eq:slr-ols}
\begin{split}
\bar{y} &= \hat{\beta}_0 + \hat{\beta}_1 \bar{x} \\
\bar{y} &= \bar{y} - \hat{\beta}_1 \bar{x} + \hat{\beta}_1 \bar{x} \\
\bar{y} &= \bar{y} \; ,
\end{split}
$$

which is a true statement. Thus, the [regression line](/D/regline) goes through the center of mass point $(\bar{x},\bar{y})$, provided that [the model](/D/slr) includes an intercept term $\beta_0$.
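The theorem can be checked numerically with a short sketch (a hypothetical illustration; the data values are arbitrary): evaluating the OLS-fitted line at $\bar{x}$ returns $\bar{y}$.

```python
# Hypothetical numerical check: the OLS-fitted regression line passes
# through the center of mass point (x_bar, y_bar).
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0])
y = np.array([2.0, 3.0, 3.5, 8.0])

x_bar, y_bar = x.mean(), y.mean()
# OLS estimates: slope s_xy / s_x^2, intercept y_bar - beta1_hat * x_bar
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

# Evaluate the fitted line at x_bar: by the theorem this equals y_bar.
y_at_center = beta0_hat + beta1_hat * x_bar
```

Note that the check holds for any data set, since it only relies on the form of the intercept estimate, exactly as in the proof above.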

P/slr-corr.md

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2021-10-27 14:58:00

title: "Relationship between correlation coefficient and slope estimate in simple linear regression"
chapter: "Statistical Models"
section: "Univariate normal data"
topic: "Simple linear regression"
theorem: "Correlation coefficient in terms of slope estimate"

sources:
  - authors: "Penny, William"
    year: 2006
    title: "Relation to correlation"
    in: "Mathematics for Brain Imaging"
    pages: "ch. 1.2.3, p. 18, eq. 1.27"
    url: "https://ueapsylabs.co.uk/sites/wpenny/mbi/mbi_course.pdf"
  - authors: "Wikipedia"
    year: 2021
    title: "Simple linear regression"
    in: "Wikipedia, the free encyclopedia"
    pages: "retrieved on 2021-10-27"
    url: "https://en.wikipedia.org/wiki/Simple_linear_regression#Fitting_the_regression_line"

proof_id: "P279"
shortcut: "slr-corr"
username: "JoramSoch"
---


**Theorem:** Assume a [simple linear regression model](/D/slr) with independent observations

$$ \label{eq:slr}
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \; \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \; i = 1,\ldots,n
$$

and consider estimation using [ordinary least squares](/P/slr-ols). Then, the sample [correlation coefficient](/D/corr) and the estimated value of the [slope parameter](/D/slr) are related to each other via the sample [standard deviations](/D/std):

$$ \label{eq:slr-corr}
r_{xy} = \frac{s_x}{s_y} \, \hat{\beta}_1 \; .
$$


**Proof:** The [ordinary least squares estimate of the slope](/P/slr-ols) is given by

$$ \label{eq:slr-ols-sl}
\hat{\beta}_1 = \frac{s_{xy}}{s_x^2} \; ,
$$

where $s_{xy}$ is the sample covariance of $x$ and $y$ and $s_x^2$ is the sample variance of $x$. Using the [relationship between covariance and correlation](/D/cov-corr)

$$ \label{eq:cov-corr}
\mathrm{Cov}(X,Y) = \sigma_X \, \mathrm{Corr}(X,Y) \, \sigma_Y \; ,
$$

which also holds for the sample [correlation](/D/corr) and the [sample covariance](/D/cov-samp),

$$ \label{eq:cov-corr-samp}
s_{xy} = s_x \, r_{xy} \, s_y \; ,
$$

we get the final result:

$$ \label{eq:slr-corr-qed}
\begin{split}
\hat{\beta}_1 &= \frac{s_{xy}}{s_x^2} \\
\hat{\beta}_1 &= \frac{s_x \, r_{xy} \, s_y}{s_x^2} \\
\hat{\beta}_1 &= \frac{s_y}{s_x} \, r_{xy} \\
\Leftrightarrow \quad r_{xy} &= \frac{s_x}{s_y} \, \hat{\beta}_1 \; .
\end{split}
$$
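The identity $r_{xy} = (s_x/s_y) \, \hat{\beta}_1$ can be verified numerically with a short sketch (a hypothetical illustration; the data values are arbitrary):

```python
# Hypothetical numerical check: r_xy = (s_x / s_y) * beta1_hat, with
# beta1_hat = s_xy / s_x^2 the OLS slope and s_x, s_y sample standard
# deviations (here with the n-1 denominator, which cancels throughout).
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 1.5, 3.0, 2.5, 5.0])

n = len(x)
x_bar, y_bar = x.mean(), y.mean()
s_xy = np.sum((x - x_bar) * (y - y_bar)) / (n - 1)   # sample covariance
s_x = np.sqrt(np.sum((x - x_bar) ** 2) / (n - 1))    # sample std of x
s_y = np.sqrt(np.sum((y - y_bar) ** 2) / (n - 1))    # sample std of y

beta1_hat = s_xy / s_x ** 2                          # OLS slope estimate
r_xy = s_xy / (s_x * s_y)                            # sample correlation

# By the theorem, r_xy and (s_x / s_y) * beta1_hat agree.
```

Because the identity is purely algebraic, it holds for any data set regardless of whether the normality assumption of the model is satisfied.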
