Commit 11db948

Merge pull request #136 from JoramSoch/master

added 2 definitions and 10 proofs

2 parents dac0e19 + c114107

13 files changed: 1086 additions & 24 deletions

D/regline.md

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
---
layout: definition
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2021-10-27 07:30:00

title: "Regression line"
chapter: "Statistical Models"
section: "Univariate normal data"
topic: "Simple linear regression"
definition: "Regression line"

sources:
  - authors: "Wikipedia"
    year: 2021
    title: "Simple linear regression"
    in: "Wikipedia, the free encyclopedia"
    pages: "retrieved on 2021-10-27"
    url: "https://en.wikipedia.org/wiki/Simple_linear_regression#Fitting_the_regression_line"

def_id: "D164"
shortcut: "regline"
username: "JoramSoch"
---


**Definition:** Let there be a [simple linear regression with independent observations](/D/slr) using dependent variable $y$ and independent variable $x$:

$$ \label{eq:slr}
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \; \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \; .
$$

Then, given some parameters $\beta_0, \beta_1 \in \mathbb{R}$, the set

$$ \label{eq:regline}
L(\beta_0, \beta_1) = \left\lbrace (x,y) \in \mathbb{R}^2 \mid y = \beta_0 + \beta_1 x \right\rbrace
$$

is called a "regression line" and the set

$$ \label{eq:regline-ols}
L(\hat{\beta}_0, \hat{\beta}_1) = \left\lbrace (x,y) \in \mathbb{R}^2 \mid y = \hat{\beta}_0 + \hat{\beta}_1 x \right\rbrace
$$

is called the "fitted regression line", with estimated regression coefficients $\hat{\beta}_0, \hat{\beta}_1$, e.g. obtained via [ordinary least squares](/P/slr-ols).
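As a minimal illustrative sketch (not part of the original definition, all names and data hypothetical), the fitted regression line can be obtained from the closed-form OLS estimates $\hat{\beta}_1 = s_{xy}/s_x^2$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$:

```python
# Hypothetical sketch: computing the fitted regression line
# L(beta0_hat, beta1_hat) via ordinary least squares.
import numpy as np

def fit_regression_line(x, y):
    """Return (beta0_hat, beta1_hat) such that y = beta0_hat + beta1_hat * x
    is the OLS-fitted regression line."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    x_bar, y_bar = x.mean(), y.mean()
    # slope: ratio of (co)variation sums, equivalent to s_xy / s_x^2
    beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # intercept: line passes through the sample means
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat

# Points lying exactly on y = 2 + 3x are recovered exactly:
b0, b1 = fit_regression_line([0, 1, 2, 3], [2, 5, 8, 11])  # -> (2.0, 3.0)
```

Noise-free data are used here only so the recovered coefficients are unambiguous; with noisy data the same formulas return the least-squares fit.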

D/slr.md

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
---
layout: definition
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2021-10-27 07:07:00

title: "Simple linear regression"
chapter: "Statistical Models"
section: "Univariate normal data"
topic: "Simple linear regression"
definition: "Definition"

sources:
  - authors: "Wikipedia"
    year: 2021
    title: "Simple linear regression"
    in: "Wikipedia, the free encyclopedia"
    pages: "retrieved on 2021-10-27"
    url: "https://en.wikipedia.org/wiki/Simple_linear_regression#Fitting_the_regression_line"

def_id: "D163"
shortcut: "slr"
username: "JoramSoch"
---


**Definition:** Let $y$ and $x$ be two $n \times 1$ vectors.

Then, a statement asserting a linear relationship between $x$ and $y$

$$ \label{eq:slr-model}
y = \beta_0 + \beta_1 x + \varepsilon \; ,
$$

together with a statement asserting a [normal distribution](/D/mvn) for $\varepsilon$

$$ \label{eq:slr-noise}
\varepsilon \sim \mathcal{N}(0, \sigma^2 V)
$$

is called a univariate simple regression model or simply "simple linear regression".

* $y$ is called the "dependent variable", "measured data" or "signal";

* $x$ is called the "independent variable", "predictor" or "covariate";

* $V$ is called the "covariance matrix" or "covariance structure";

* $\beta_1$ is called the "slope of the [regression line](/D/regline)";

* $\beta_0$ is called the "intercept of the [regression line](/D/regline)";

* $\varepsilon$ is called "noise", "errors" or "error terms";

* $\sigma^2$ is called the "noise variance" or "error variance";

* $n$ is the number of observations.

When the covariance structure $V$ is equal to the $n \times n$ identity matrix, this is called simple linear regression with independent and identically distributed (i.i.d.) observations:

$$ \label{eq:slr-noise-iid}
V = I_n \quad \Rightarrow \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_n) \quad \Rightarrow \quad \varepsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2) \; .
$$

In this case, the linear regression model can also be written as

$$ \label{eq:slr-model-sum}
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \; \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \; .
$$

Otherwise, it is called simple linear regression with correlated observations.
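The i.i.d. case above can be sketched as a small simulation (a hypothetical illustration, not part of the definition; the function name and parameter values are arbitrary choices):

```python
# Hypothetical sketch: drawing n observations from the simple linear
# regression model y_i = beta0 + beta1 * x_i + eps_i with
# eps_i ~ N(0, sigma^2), i.e. the i.i.d. case V = I_n.
import numpy as np

def simulate_slr(x, beta0, beta1, sigma2, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float)
    # i.i.d. noise: eps ~ N(0, sigma^2 I_n)
    eps = rng.normal(0.0, np.sqrt(sigma2), size=x.shape)
    return beta0 + beta1 * x + eps

x = np.linspace(0.0, 1.0, 100)
y = simulate_slr(x, beta0=1.0, beta1=2.0, sigma2=0.25)
```

Setting `sigma2=0` reduces the simulation to the deterministic line $y = \beta_0 + \beta_1 x$; correlated observations would instead require drawing the noise vector from $\mathcal{N}(0, \sigma^2 V)$ with a non-identity $V$.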

I/Table_of_Contents.md

Lines changed: 38 additions & 24 deletions
@@ -479,30 +479,44 @@ title: "Table of Contents"
 &emsp;&ensp; 1.2.12. **[Cross-validated log model evidence](/P/ugkv-cvlme)** <br>
 &emsp;&ensp; 1.2.13. **[Cross-validated log Bayes factor](/P/ugkv-cvlbf)** <br>
 &emsp;&ensp; 1.2.14. **[Expectation of cross-validated log Bayes factor](/P/ugkv-cvlbfmean)** <br>
-
-1.3. Multiple linear regression <br>
-&emsp;&ensp; 1.3.1. *[Definition](/D/mlr)* <br>
-&emsp;&ensp; 1.3.2. **[Ordinary least squares](/P/mlr-ols)** (1) <br>
-&emsp;&ensp; 1.3.3. **[Ordinary least squares](/P/mlr-ols2)** (2) <br>
-&emsp;&ensp; 1.3.4. *[Total sum of squares](/D/tss)* <br>
-&emsp;&ensp; 1.3.5. *[Explained sum of squares](/D/ess)* <br>
-&emsp;&ensp; 1.3.6. *[Residual sum of squares](/D/rss)* <br>
-&emsp;&ensp; 1.3.7. **[Total, explained and residual sum of squares](/P/mlr-pss)** <br>
-&emsp;&ensp; 1.3.8. *[Estimation matrix](/D/emat)* <br>
-&emsp;&ensp; 1.3.9. *[Projection matrix](/D/pmat)* <br>
-&emsp;&ensp; 1.3.10. *[Residual-forming matrix](/D/rfmat)* <br>
-&emsp;&ensp; 1.3.11. **[Estimation, projection and residual-forming matrix](/P/mlr-mat)** <br>
-&emsp;&ensp; 1.3.12. **[Idempotence of projection and residual-forming matrix](/P/mlr-idem)** <br>
-&emsp;&ensp; 1.3.13. **[Weighted least squares](/P/mlr-wls)** (1) <br>
-&emsp;&ensp; 1.3.14. **[Weighted least squares](/P/mlr-wls2)** (2) <br>
-&emsp;&ensp; 1.3.15. **[Maximum likelihood estimation](/P/mlr-mle)** <br>
-
-1.4. Bayesian linear regression <br>
-&emsp;&ensp; 1.4.1. **[Conjugate prior distribution](/P/blr-prior)** <br>
-&emsp;&ensp; 1.4.2. **[Posterior distribution](/P/blr-post)** <br>
-&emsp;&ensp; 1.4.3. **[Log model evidence](/P/blr-lme)** <br>
-&emsp;&ensp; 1.4.4. **[Posterior probability of alternative hypothesis](/P/blr-pp)** <br>
-&emsp;&ensp; 1.4.5. **[Posterior credibility region excluding null hypothesis](/P/blr-pcr)** <br>
+
+1.3. Simple linear regression <br>
+&emsp;&ensp; 1.3.1. *[Definition](/D/slr)* <br>
+&emsp;&ensp; 1.3.2. *[Regression line](/D/regline)* <br>
+&emsp;&ensp; 1.3.3. **[Ordinary least squares](/P/slr-ols)** <br>
+&emsp;&ensp; 1.3.4. **[Expectation of estimates](/P/slr-olsmean)** <br>
+&emsp;&ensp; 1.3.5. **[Variance of estimates](/P/slr-olsvar)** <br>
+&emsp;&ensp; 1.3.6. **[Effects of mean-centering](/P/slr-meancent)** <br>
+&emsp;&ensp; 1.3.7. **[Regression line includes center of mass](/P/slr-comp)** <br>
+&emsp;&ensp; 1.3.8. **[Sum of residuals is zero](/P/slr-ressum)** <br>
+&emsp;&ensp; 1.3.9. **[Correlation with covariate is zero](/P/slr-rescorr)** <br>
+&emsp;&ensp; 1.3.10. **[Residual variance in terms of sample variance](/P/slr-vars)** <br>
+&emsp;&ensp; 1.3.11. **[Correlation coefficient in terms of slope estimate](/P/slr-corr)** <br>
+&emsp;&ensp; 1.3.12. **[Coefficient of determination in terms of correlation coefficient](/P/slr-rsq)** <br>
+
+1.4. Multiple linear regression <br>
+&emsp;&ensp; 1.4.1. *[Definition](/D/mlr)* <br>
+&emsp;&ensp; 1.4.2. **[Ordinary least squares](/P/mlr-ols)** (1) <br>
+&emsp;&ensp; 1.4.3. **[Ordinary least squares](/P/mlr-ols2)** (2) <br>
+&emsp;&ensp; 1.4.4. *[Total sum of squares](/D/tss)* <br>
+&emsp;&ensp; 1.4.5. *[Explained sum of squares](/D/ess)* <br>
+&emsp;&ensp; 1.4.6. *[Residual sum of squares](/D/rss)* <br>
+&emsp;&ensp; 1.4.7. **[Total, explained and residual sum of squares](/P/mlr-pss)** <br>
+&emsp;&ensp; 1.4.8. *[Estimation matrix](/D/emat)* <br>
+&emsp;&ensp; 1.4.9. *[Projection matrix](/D/pmat)* <br>
+&emsp;&ensp; 1.4.10. *[Residual-forming matrix](/D/rfmat)* <br>
+&emsp;&ensp; 1.4.11. **[Estimation, projection and residual-forming matrix](/P/mlr-mat)** <br>
+&emsp;&ensp; 1.4.12. **[Idempotence of projection and residual-forming matrix](/P/mlr-idem)** <br>
+&emsp;&ensp; 1.4.13. **[Weighted least squares](/P/mlr-wls)** (1) <br>
+&emsp;&ensp; 1.4.14. **[Weighted least squares](/P/mlr-wls2)** (2) <br>
+&emsp;&ensp; 1.4.15. **[Maximum likelihood estimation](/P/mlr-mle)** <br>
+
+1.5. Bayesian linear regression <br>
+&emsp;&ensp; 1.5.1. **[Conjugate prior distribution](/P/blr-prior)** <br>
+&emsp;&ensp; 1.5.2. **[Posterior distribution](/P/blr-post)** <br>
+&emsp;&ensp; 1.5.3. **[Log model evidence](/P/blr-lme)** <br>
+&emsp;&ensp; 1.5.4. **[Posterior probability of alternative hypothesis](/P/blr-pp)** <br>
+&emsp;&ensp; 1.5.5. **[Posterior credibility region excluding null hypothesis](/P/blr-pcr)** <br>
 
 2. Multivariate normal data

P/slr-comp.md

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2021-10-27 12:52:00

title: "The regression line goes through the center of mass point"
chapter: "Statistical Models"
section: "Univariate normal data"
topic: "Simple linear regression"
theorem: "Regression line includes center of mass"

sources:
  - authors: "Wikipedia"
    year: 2021
    title: "Simple linear regression"
    in: "Wikipedia, the free encyclopedia"
    pages: "retrieved on 2021-10-27"
    url: "https://en.wikipedia.org/wiki/Simple_linear_regression#Numerical_properties"

proof_id: "P275"
shortcut: "slr-comp"
username: "JoramSoch"
---


**Theorem:** In [simple linear regression](/D/slr), the [regression line](/D/regline) estimated using [ordinary least squares](/P/slr-ols) includes the center of mass point $M(\bar{x},\bar{y})$.

**Proof:** The [fitted regression line](/D/regline) is described by the equation

$$ \label{eq:slr-ols-regline}
y = \hat{\beta}_0 + \hat{\beta}_1 x \quad \text{where} \quad x,y \in \mathbb{R} \; .
$$

Plugging in the coordinates of $M$ and the [ordinary least squares estimate of the intercept](/P/slr-ols), $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$, we obtain

$$ \label{eq:slr-ols}
\begin{split}
\bar{y} &= \hat{\beta}_0 + \hat{\beta}_1 \bar{x} \\
\bar{y} &= \bar{y} - \hat{\beta}_1 \bar{x} + \hat{\beta}_1 \bar{x} \\
\bar{y} &= \bar{y} \; ,
\end{split}
$$

which is a true statement. Thus, the [regression line](/D/regline) goes through the center of mass point $(\bar{x},\bar{y})$, provided that [the model](/D/slr) includes an intercept term $\beta_0$.
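The theorem can be checked numerically with a short sketch (a hypothetical illustration; the data values are arbitrary): evaluating the OLS-fitted line at $\bar{x}$ returns $\bar{y}$.

```python
# Hypothetical numerical check: the OLS-fitted regression line passes
# through the center of mass point (x_bar, y_bar).
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0])
y = np.array([2.0, 3.0, 3.5, 8.0])

x_bar, y_bar = x.mean(), y.mean()
# OLS estimates: slope s_xy / s_x^2, intercept y_bar - beta1_hat * x_bar
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

# Evaluate the fitted line at x_bar: by the theorem this equals y_bar.
y_at_center = beta0_hat + beta1_hat * x_bar
```

Note that the check holds for any data set, since it only relies on the form of the intercept estimate, exactly as in the proof above.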

P/slr-corr.md

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2021-10-27 14:58:00

title: "Relationship between correlation coefficient and slope estimate in simple linear regression"
chapter: "Statistical Models"
section: "Univariate normal data"
topic: "Simple linear regression"
theorem: "Correlation coefficient in terms of slope estimate"

sources:
  - authors: "Penny, William"
    year: 2006
    title: "Relation to correlation"
    in: "Mathematics for Brain Imaging"
    pages: "ch. 1.2.3, p. 18, eq. 1.27"
    url: "https://ueapsylabs.co.uk/sites/wpenny/mbi/mbi_course.pdf"
  - authors: "Wikipedia"
    year: 2021
    title: "Simple linear regression"
    in: "Wikipedia, the free encyclopedia"
    pages: "retrieved on 2021-10-27"
    url: "https://en.wikipedia.org/wiki/Simple_linear_regression#Fitting_the_regression_line"

proof_id: "P279"
shortcut: "slr-corr"
username: "JoramSoch"
---


**Theorem:** Assume a [simple linear regression model](/D/slr) with independent observations

$$ \label{eq:slr}
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \; \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \; i = 1,\ldots,n
$$

and consider estimation using [ordinary least squares](/P/slr-ols). Then, the sample [correlation coefficient](/D/corr) and the estimated value of the [slope parameter](/D/slr) are related to each other via the sample [standard deviations](/D/std):

$$ \label{eq:slr-corr}
r_{xy} = \frac{s_x}{s_y} \, \hat{\beta}_1 \; .
$$


**Proof:** The [ordinary least squares estimate of the slope](/P/slr-ols) is given by

$$ \label{eq:slr-ols-sl}
\hat{\beta}_1 = \frac{s_{xy}}{s_x^2} \; ,
$$

where $s_{xy}$ is the sample covariance of $x$ and $y$ and $s_x^2$ is the sample variance of $x$. Using the [relationship between covariance and correlation](/D/cov-corr)

$$ \label{eq:cov-corr}
\mathrm{Cov}(X,Y) = \sigma_X \, \mathrm{Corr}(X,Y) \, \sigma_Y \; ,
$$

which also holds for the sample [correlation](/D/corr) and the [sample covariance](/D/cov-samp),

$$ \label{eq:cov-corr-samp}
s_{xy} = s_x \, r_{xy} \, s_y \; ,
$$

we get the final result:

$$ \label{eq:slr-corr-qed}
\begin{split}
\hat{\beta}_1 &= \frac{s_{xy}}{s_x^2} \\
\hat{\beta}_1 &= \frac{s_x \, r_{xy} \, s_y}{s_x^2} \\
\hat{\beta}_1 &= \frac{s_y}{s_x} \, r_{xy} \\
\Leftrightarrow \quad r_{xy} &= \frac{s_x}{s_y} \, \hat{\beta}_1 \; .
\end{split}
$$
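The identity $r_{xy} = (s_x/s_y) \, \hat{\beta}_1$ can be verified numerically with a short sketch (a hypothetical illustration; the data values are arbitrary):

```python
# Hypothetical numerical check: r_xy = (s_x / s_y) * beta1_hat, with
# beta1_hat = s_xy / s_x^2 the OLS slope and s_x, s_y sample standard
# deviations (here with the n-1 denominator, which cancels throughout).
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 1.5, 3.0, 2.5, 5.0])

n = len(x)
x_bar, y_bar = x.mean(), y.mean()
s_xy = np.sum((x - x_bar) * (y - y_bar)) / (n - 1)   # sample covariance
s_x = np.sqrt(np.sum((x - x_bar) ** 2) / (n - 1))    # sample std of x
s_y = np.sqrt(np.sum((y - y_bar) ** 2) / (n - 1))    # sample std of y

beta1_hat = s_xy / s_x ** 2                          # OLS slope estimate
r_xy = s_xy / (s_x * s_y)                            # sample correlation

# By the theorem, r_xy and (s_x / s_y) * beta1_hat agree.
```

Because the identity is purely algebraic, it holds for any data set regardless of whether the normality assumption of the model is satisfied.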
