added 5 proofs

JoramSoch · web-flow · commit 1267a4fe8524 · 2021-11-09T15:36:25.000+01:00
diff --git a/P/slr-mat.md b/P/slr-mat.md
@@ -0,0 +1,108 @@
+---
+layout: proof
+mathjax: true
+
+author: "Joram Soch"
+affiliation: "BCCN Berlin"
+e_mail: "joram.soch@bccn-berlin.de"
+date: 2021-11-09 15:19:00
+
+title: "Transformation matrices for simple linear regression"
+chapter: "Statistical Models"
+section: "Univariate normal data"
+topic: "Simple linear regression"
+theorem: "Transformation matrices"
+
+sources:
+
+proof_id: "P285"
+shortcut: "slr-mat"
+username: "JoramSoch"
+---
+
+
+**Theorem:** Under [ordinary least squares](/P/slr-ols) for [simple linear regression](/D/slr), the [estimation](/D/emat), [projection](/D/pmat) and [residual-forming](/D/rfmat) matrices are given by
+
+$$ \label{eq:slr-mat}
+\begin{split}
+E &= \frac{1}{(n-1)\,s_x^2} \left[ \begin{matrix} (x^\mathrm{T} x/n) \, 1_n^\mathrm{T} - \bar{x} \, x^\mathrm{T} \\ - \bar{x} \, 1_n^\mathrm{T} + x^\mathrm{T} \end{matrix} \right] \\
+P &= \frac{1}{(n-1)\,s_x^2} \left[ \begin{matrix} (x^\mathrm{T} x/n) - 2 \bar{x} x_1 + x_1^2 & \cdots & (x^\mathrm{T} x/n) - \bar{x} (x_1 + x_n) + x_1 x_n \\ \vdots & \ddots & \vdots \\ (x^\mathrm{T} x/n) - \bar{x} (x_1 + x_n) + x_1 x_n & \cdots & (x^\mathrm{T} x/n) - 2 \bar{x} x_n + x_n^2 \end{matrix} \right] \\
+R &= \frac{1}{(n-1)\,s_x^2} \left[ \begin{matrix} (n-1) (x^\mathrm{T} x/n) + \bar{x} (2 x_1 - n\bar{x}) - x_1^2 & \cdots & -(x^\mathrm{T} x/n) + \bar{x} (x_1 + x_n) - x_1 x_n \\ \vdots & \ddots & \vdots \\ -(x^\mathrm{T} x/n) + \bar{x} (x_1 + x_n) - x_1 x_n & \cdots &  (n-1) (x^\mathrm{T} x/n) + \bar{x} (2 x_n - n\bar{x}) - x_n^2 \end{matrix} \right]
+\end{split}
+$$
+
+where $1_n$ is an $n \times 1$ vector of ones, $x$ is the $n \times 1$ single predictor variable, $\bar{x}$ is the [sample mean](/D/mean-samp) of $x$ and $s_x^2$ is the [sample variance](/D/var-samp) of $x$.
+
+
+**Proof:** [Simple linear regression is a special case of multiple linear regression](/P/slr-mlr) with
+
+$$ \label{eq:slr-mlr}
+X = \left[ 1_n, \, x \right] \quad \text{and} \quad \beta = \left[ \begin{matrix} \beta_0 \\ \beta_1 \end{matrix} \right] \; ,
+$$
+
+such that the simple linear regression model can also be written as
+
+$$ \label{eq:mlr}
+y = X\beta + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \sigma^2 I_n) \; .
+$$
+
+Moreover, we [note the following equality](/P/slr-olsdist):
+
+$$ \label{eq:b-est-cov-den}
+x^\mathrm{T} x - n\bar{x}^2 = (n-1) \, s_x^2 \; .
+$$
+
+<br>
+1) The [estimation matrix is given by](/P/mlr-mat)
+
+$$ \label{eq:E}
+E = (X^\mathrm{T} X)^{-1} X^\mathrm{T}
+$$
+
+which is a $2 \times n$ matrix and can be reformulated as follows:
+
+$$ \label{eq:E-qed}
+\begin{split}
+E &= (X^\mathrm{T} X)^{-1} X^\mathrm{T} \\
+&= \left( \left[ \begin{matrix} 1_n^\mathrm{T} \\ x^\mathrm{T} \end{matrix} \right] \left[ 1_n, \, x \right] \right)^{-1} \left[ \begin{matrix} 1_n^\mathrm{T} \\ x^\mathrm{T} \end{matrix} \right] \\
+&= \left( \left[ \begin{matrix} n & n\bar{x} \\ n\bar{x} & x^\mathrm{T} x \end{matrix} \right] \right)^{-1} \left[ \begin{matrix} 1_n^\mathrm{T} \\ x^\mathrm{T} \end{matrix} \right] \\
+&= \frac{1}{n x^\mathrm{T} x - (n\bar{x})^2} \left[ \begin{matrix} x^\mathrm{T} x & -n\bar{x} \\ -n\bar{x} & n \end{matrix} \right] \left[ \begin{matrix} 1_n^\mathrm{T} \\ x^\mathrm{T} \end{matrix} \right] \\
+&= \frac{1}{x^\mathrm{T} x - n\bar{x}^2} \left[ \begin{matrix} x^\mathrm{T} x/n & -\bar{x} \\ -\bar{x} & 1 \end{matrix} \right] \left[ \begin{matrix} 1_n^\mathrm{T} \\ x^\mathrm{T} \end{matrix} \right] \\
+&= \frac{1}{(n-1)\,s_x^2} \left[ \begin{matrix} (x^\mathrm{T} x/n) \, 1_n^\mathrm{T} - \bar{x} \, x^\mathrm{T} \\ - \bar{x} \, 1_n^\mathrm{T} + x^\mathrm{T} \end{matrix} \right] \; .
+\end{split}
+$$
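+
+As a quick numerical illustration (the data vector is a hypothetical choice and not part of the original proof), take $n = 3$ and $x = [1, 2, 3]^\mathrm{T}$, such that $\bar{x} = 2$, $x^\mathrm{T} x = 14$ and $(n-1)\,s_x^2 = 2$. Under these assumptions, the estimation matrix evaluates to
+
+$$ \label{eq:E-ex}
+E = \frac{1}{2} \left[ \begin{matrix} 14/3 - 2 & 14/3 - 4 & 14/3 - 6 \\ -1 & 0 & 1 \end{matrix} \right] = \left[ \begin{matrix} 4/3 & 1/3 & -2/3 \\ -1/2 & 0 & 1/2 \end{matrix} \right] \; .
+$$
+
+For instance, applying this matrix to $y = x$ gives $\hat{\beta} = E y = [0, 1]^\mathrm{T}$, i.e. an intercept of zero and a slope of one, as one would expect.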
+
+<br>
+2) The [projection matrix is given by](/P/mlr-mat)
+
+$$ \label{eq:P}
+P = X (X^\mathrm{T} X)^{-1} X^\mathrm{T} = X \, E
+$$
+
+which is an $n \times n$ matrix and can be reformulated as follows:
+
+$$ \label{eq:P-qed}
+\begin{split}
+P &= X \, E = \left[ \begin{matrix} 1_n & x \end{matrix} \right] \left[ \begin{matrix} e_1 \\ e_2 \end{matrix} \right] \\
+&= \frac{1}{(n-1)\,s_x^2} \left[ \begin{matrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{matrix} \right] \left[ \begin{matrix} (x^\mathrm{T} x/n) - \bar{x} x_1 & \cdots & (x^\mathrm{T} x/n) - \bar{x} x_n \\ -\bar{x} + x_1 & \cdots & -\bar{x} + x_n \end{matrix} \right] \\
+&= \frac{1}{(n-1)\,s_x^2} \left[ \begin{matrix} (x^\mathrm{T} x/n) - 2 \bar{x} x_1 + x_1^2 & \cdots & (x^\mathrm{T} x/n) - \bar{x} (x_1 + x_n) + x_1 x_n \\ \vdots & \ddots & \vdots \\ (x^\mathrm{T} x/n) - \bar{x} (x_1 + x_n) + x_1 x_n & \cdots & (x^\mathrm{T} x/n) - 2 \bar{x} x_n + x_n^2 \end{matrix} \right] \; .
+\end{split}
+$$
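+
+As a small worked example (assuming the hypothetical data vector $x = [1, 2, 3]^\mathrm{T}$, which is not part of the original proof), we have $\bar{x} = 2$, $x^\mathrm{T} x/n = 14/3$ and $(n-1)\,s_x^2 = 2$, so each entry is $p_{ij} = \left( 14/3 - 2 (x_i + x_j) + x_i x_j \right) / 2$ and the projection matrix becomes
+
+$$ \label{eq:P-ex}
+P = \frac{1}{6} \left[ \begin{matrix} 5 & 2 & -1 \\ 2 & 2 & 2 \\ -1 & 2 & 5 \end{matrix} \right] \; .
+$$
+
+In this example, $P$ is symmetric, each row sums to one (so that $P 1_n = 1_n$) and the trace equals $2$, the number of columns of $X$.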
+
+<br>
+3) The [residual-forming matrix is given by](/P/mlr-mat)
+
+$$ \label{eq:R}
+R = I_n - X (X^\mathrm{T} X)^{-1} X^\mathrm{T} = I_n - P
+$$
+
+which is also an $n \times n$ matrix and can be reformulated as follows:
+
+$$ \label{eq:R-qed}
+\begin{split}
+R &= I_n - P = \left[ \begin{matrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{matrix} \right] - \left[ \begin{matrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{matrix} \right] \\
+&= \frac{1}{(n-1)\,s_x^2} \left[ \begin{matrix} x^\mathrm{T} x - n\bar{x}^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & x^\mathrm{T} x - n\bar{x}^2 \end{matrix} \right] \\
+&- \frac{1}{(n-1)\,s_x^2} \left[ \begin{matrix} (x^\mathrm{T} x/n) - 2 \bar{x} x_1 + x_1^2 & \cdots & (x^\mathrm{T} x/n) - \bar{x} (x_1 + x_n) + x_1 x_n \\ \vdots & \ddots & \vdots \\ (x^\mathrm{T} x/n) - \bar{x} (x_1 + x_n) + x_1 x_n & \cdots & (x^\mathrm{T} x/n) - 2 \bar{x} x_n + x_n^2 \end{matrix} \right] \\
+&= \frac{1}{(n-1)\,s_x^2} \left[ \begin{matrix} (n-1) (x^\mathrm{T} x/n) + \bar{x} (2 x_1 - n\bar{x}) - x_1^2 & \cdots & -(x^\mathrm{T} x/n) + \bar{x} (x_1 + x_n) - x_1 x_n \\ \vdots & \ddots & \vdots \\ -(x^\mathrm{T} x/n) + \bar{x} (x_1 + x_n) - x_1 x_n & \cdots &  (n-1) (x^\mathrm{T} x/n) + \bar{x} (2 x_n - n\bar{x}) - x_n^2 \end{matrix} \right] \; .
+\end{split}
+$$
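+
+As a sanity check under assumed data (the hypothetical vector $x = [1, 2, 3]^\mathrm{T}$, with $\bar{x} = 2$, $x^\mathrm{T} x/n = 14/3$ and $(n-1)\,s_x^2 = 2$; not part of the original proof), the first diagonal entry is $r_{11} = \left( 2 \cdot 14/3 + 2 \, (2 - 6) - 1 \right) / 2 = 1/6$ and the full residual-forming matrix becomes
+
+$$ \label{eq:R-ex}
+R = \frac{1}{6} \left[ \begin{matrix} 1 & -2 & 1 \\ -2 & 4 & -2 \\ 1 & -2 & 1 \end{matrix} \right] \; .
+$$
+
+In this example, $R \, 1_n = 0$ and $R \, x = 0$, so both columns of $X$ are mapped to zero, and the trace equals $n - 2 = 1$.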
diff --git a/P/slr-mlr.md b/P/slr-mlr.md
@@ -0,0 +1,58 @@
+---
+layout: proof
+mathjax: true
+
+author: "Joram Soch"
+affiliation: "BCCN Berlin"
+e_mail: "joram.soch@bccn-berlin.de"
+date: 2021-11-09 07:57:00
+
+title: "Simple linear regression is a special case of multiple linear regression"
+chapter: "Statistical Models"
+section: "Univariate normal data"
+topic: "Simple linear regression"
+theorem: "Special case of multiple linear regression"
+
+sources:
+
+proof_id: "P281"
+shortcut: "slr-mlr"
+username: "JoramSoch"
+---
+
+
+**Theorem:** [Simple linear regression](/D/slr) is a special case of [multiple linear regression](/D/mlr) with design matrix $X$ and regression coefficients $\beta$
+
+$$ \label{eq:slr-mlr}
+X = \left[ 1_n, \, x \right] \quad \text{and} \quad \beta = \left[ \begin{matrix} \beta_0 \\ \beta_1 \end{matrix} \right]
+$$
+
+where $1_n$ is an $n \times 1$ vector of ones, $x$ is the $n \times 1$ single predictor variable, $\beta_0$ is the intercept and $\beta_1$ is the slope of the [regression line](/D/regline).
+
+
+**Proof:** Without loss of generality, consider the [simple linear regression case with uncorrelated errors](/D/slr):
+
+$$ \label{eq:slr}
+y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \; \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \; .
+$$
+
+In matrix notation and using the [multivariate normal distribution](/D/mvn), this can also be written as
+
+$$ \label{eq:slr-mlr-s1}
+\begin{split}
+y &= \beta_0 1_n + \beta_1 x + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \sigma^2 I_n) \\
+y &= \left[ 1_n, \, x \right] \left[ \begin{matrix} \beta_0 \\ \beta_1 \end{matrix} \right] + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \sigma^2 I_n) \; .
+\end{split}
+$$
+
+Comparing with the [multiple linear regression equations for uncorrelated errors](/D/mlr), we finally note:
+
+$$ \label{eq:slr-mlr-s3}
+y = X\beta + \varepsilon \quad \text{with} \quad X = \left[ 1_n, \, x \right] \quad \text{and} \quad \beta = \left[ \begin{matrix} \beta_0 \\ \beta_1 \end{matrix} \right] \; .
+$$
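+
+To make the correspondence explicit, the matrix equation can be written out row by row (this expansion is added for illustration and is not part of the original proof):
+
+$$ \label{eq:slr-mlr-ex}
+\left[ \begin{matrix} y_1 \\ \vdots \\ y_n \end{matrix} \right] = \left[ \begin{matrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{matrix} \right] \left[ \begin{matrix} \beta_0 \\ \beta_1 \end{matrix} \right] + \left[ \begin{matrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{matrix} \right] = \left[ \begin{matrix} \beta_0 + \beta_1 x_1 + \varepsilon_1 \\ \vdots \\ \beta_0 + \beta_1 x_n + \varepsilon_n \end{matrix} \right] \; ,
+$$
+
+such that the $i$-th row recovers the scalar model \eqref{eq:slr}.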
+
+In the [case of correlated observations](/D/slr), the [error distribution changes to](/D/mlr):
+
+$$ \label{eq:mlr-noise}
+\varepsilon \sim \mathcal{N}(0, \sigma^2 V) \; .
+$$
diff --git a/P/slr-olsdist.md b/P/slr-olsdist.md
@@ -0,0 +1,108 @@
+---
+layout: proof
+mathjax: true
+
+author: "Joram Soch"
+affiliation: "BCCN Berlin"
+e_mail: "joram.soch@bccn-berlin.de"
+date: 2021-11-09 09:09:00
+
+title: "Distribution of parameter estimates for simple linear regression"
+chapter: "Statistical Models"
+section: "Univariate normal data"
+topic: "Simple linear regression"
+theorem: "Distribution of estimates"
+
+sources:
+  - authors: "Wikipedia"
+    year: 2021
+    title: "Proofs involving ordinary least squares"
+    in: "Wikipedia, the free encyclopedia"
+    pages: "retrieved on 2021-11-09"
+    url: "https://en.wikipedia.org/wiki/Proofs_involving_ordinary_least_squares#Unbiasedness_and_variance_of_%7F'%22%60UNIQ--postMath-00000037-QINU%60%22'%7F"
+
+proof_id: "P282"
+shortcut: "slr-olsdist"
+username: "JoramSoch"
+---
+
+
+**Theorem:** Assume a [simple linear regression model](/D/slr) with independent observations
+
+$$ \label{eq:slr}
+y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \; \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
+$$
+
+and consider estimation using [ordinary least squares](/P/slr-ols). Then, the estimated parameters are [normally distributed](/D/mvn) as
+
+$$ \label{eq:slr-olsdist}
+\left[ \begin{matrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{matrix} \right] \sim \mathcal{N}\left( \left[ \begin{matrix} \beta_0 \\ \beta_1 \end{matrix} \right], \, \frac{\sigma^2}{(n-1) \, s_x^2} \cdot \left[ \begin{matrix} x^\mathrm{T}x/n & -\bar{x} \\ -\bar{x} & 1 \end{matrix} \right] \right)
+$$
+
+where $s_x^2$ is the [sample variance](/D/var-samp) of $x$.
+
+
+**Proof:** [Simple linear regression is a special case of multiple linear regression](/P/slr-mlr) with
+
+$$ \label{eq:slr-mlr}
+X = \left[ 1_n, \, x \right] \quad \text{and} \quad \beta = \left[ \begin{matrix} \beta_0 \\ \beta_1 \end{matrix} \right] \; ,
+$$
+
+such that \eqref{eq:slr} can also be written as
+
+$$ \label{eq:mlr}
+y = X\beta + \varepsilon, \; \varepsilon \sim \mathcal{N}(0, \sigma^2 I_n)
+$$
+
+and [ordinary least squares estimates](/P/mlr-ols) are given by
+
+$$ \label{eq:mlr-ols}
+\hat{\beta} = (X^\mathrm{T} X)^{-1} X^\mathrm{T} y \; .
+$$
+
+From \eqref{eq:mlr} and the [linear transformation theorem for the multivariate normal distribution](/P/mvn-ltt), it follows that
+
+$$ \label{eq:y-dist}
+y \sim \mathcal{N}\left( X\beta, \, \sigma^2 I_n \right) \; .
+$$
+
+From \eqref{eq:mlr-ols}, in combination with \eqref{eq:y-dist} and the [transformation theorem](/P/mvn-ltt), it follows that
+
+$$ \label{eq:b-est-dist}
+\begin{split}
+\hat{\beta} &\sim \mathcal{N}\left( (X^\mathrm{T} X)^{-1} X^\mathrm{T} X\beta, \, \sigma^2 (X^\mathrm{T} X)^{-1} X^\mathrm{T} I_n X (X^\mathrm{T} X)^{-1} \right) \\
+&\sim \mathcal{N}\left( \beta, \, \sigma^2 (X^\mathrm{T} X)^{-1} \right) \; .
+\end{split}
+$$
+
+Applying \eqref{eq:slr-mlr}, the [covariance matrix](/D/mvn) can be further developed as follows:
+
+$$ \label{eq:b-est-cov}
+\begin{split}
+\sigma^2 (X^\mathrm{T} X)^{-1} &= \sigma^2 \left( \left[ \begin{matrix} 1_n^\mathrm{T} \\ x^\mathrm{T} \end{matrix} \right] \left[ 1_n, \, x \right] \right)^{-1} \\
+&= \sigma^2 \left( \left[ \begin{matrix} n & n\bar{x} \\ n\bar{x} & x^\mathrm{T} x \end{matrix} \right] \right)^{-1} \\
+&= \frac{\sigma^2}{n x^\mathrm{T} x - (n\bar{x})^2} \left[ \begin{matrix} x^\mathrm{T} x & -n\bar{x} \\ -n\bar{x} & n \end{matrix} \right] \\
+&= \frac{\sigma^2}{x^\mathrm{T} x - n\bar{x}^2} \left[ \begin{matrix} x^\mathrm{T} x/n & -\bar{x} \\ -\bar{x} & 1 \end{matrix} \right] \; .
+\end{split}
+$$
+
+Note that the denominator in the first factor is equal to
+
+$$ \label{eq:b-est-cov-den}
+\begin{split}
+x^\mathrm{T} x - n\bar{x}^2 &= x^\mathrm{T} x - 2 n\bar{x}^2 + n\bar{x}^2 \\
+&= \sum_{i=1}^{n} x_i^2 - 2 n \bar{x} \frac{1}{n} \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} \bar{x}^2 \\
+&= \sum_{i=1}^{n} x_i^2 - 2 \sum_{i=1}^{n} x_i \bar{x} + \sum_{i=1}^{n} \bar{x}^2 \\
+&= \sum_{i=1}^{n} \left( x_i^2 - 2 x_i \bar{x} + \bar{x}^2 \right) \\
+&= \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2 \\
+&= (n-1) \, s_x^2 \; .
+\end{split}
+$$
+
+Thus, combining \eqref{eq:b-est-dist}, \eqref{eq:b-est-cov} and \eqref{eq:b-est-cov-den}, we have
+
+$$ \label{eq:slr-olsdist-qed}
+\hat{\beta} \sim \mathcal{N}\left( \beta, \, \frac{\sigma^2}{(n-1) \, s_x^2} \cdot \left[ \begin{matrix} x^\mathrm{T}x/n & -\bar{x} \\ -\bar{x} & 1 \end{matrix} \right] \right)
+$$
+
+which is equivalent to equation \eqref{eq:slr-olsdist}.
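+
+As a numerical illustration (assuming the hypothetical data vector $x = [1, 2, 3]^\mathrm{T}$, which is not part of the original proof), we have $n = 3$, $\bar{x} = 2$ and $x^\mathrm{T} x = 14$, so the denominator is $14 - 3 \cdot 2^2 = 2 = (1-2)^2 + (2-2)^2 + (3-2)^2$ and the covariance matrix becomes
+
+$$ \label{eq:slr-olsdist-ex}
+\frac{\sigma^2}{2} \left[ \begin{matrix} 14/3 & -2 \\ -2 & 1 \end{matrix} \right] = \sigma^2 \left[ \begin{matrix} 7/3 & -1 \\ -1 & 1/2 \end{matrix} \right] \; .
+$$
+
+This agrees with inverting $X^\mathrm{T} X = \left[ \begin{matrix} 3 & 6 \\ 6 & 14 \end{matrix} \right]$ directly, whose determinant is $6$ and whose inverse is $\frac{1}{6} \left[ \begin{matrix} 14 & -6 \\ -6 & 3 \end{matrix} \right]$.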
diff --git a/P/slr-proj.md b/P/slr-proj.md
@@ -0,0 +1,103 @@
+---
+layout: proof
+mathjax: true
+
+author: "Joram Soch"
+affiliation: "BCCN Berlin"
+e_mail: "joram.soch@bccn-berlin.de"
+date: 2021-11-09 10:16:00
+
+title: "Projection of a data point to the regression line"
+chapter: "Statistical Models"
+section: "Univariate normal data"
+topic: "Simple linear regression"
+theorem: "Projection of data point to regression line"
+
+sources:
+  - authors: "Penny, William"
+    year: 2006
+    title: "Projects"
+    in: "Mathematics for Brain Imaging"
+    pages: "ch. 1.4.10, pp. 34-35, eqs. 1.87/1.88"
+    url: "https://ueapsylabs.co.uk/sites/wpenny/mbi/mbi_course.pdf"
+
+proof_id: "P283"
+shortcut: "slr-proj"
+username: "JoramSoch"
+---
+
+
+**Theorem:** Consider [simple linear regression](/D/slr) and an [estimated regression line](/D/regline) specified by
+
+$$ \label{eq:slr-regline}
+y = \hat{\beta}_0 + \hat{\beta}_1 x \quad \text{where} \quad x,y \in \mathbb{R} \; .
+$$
+
+For any given data point $O(x_o \vert y_o)$, the point on the regression line $P(x_p \vert y_p)$ that is closest to this data point is given by:
+
+$$ \label{eq:slr-proj}
+P\left(w \mid \hat{\beta}_0 + \hat{\beta}_1 w\right) \quad \text{with} \quad w = \frac{x_o + (y_o - \hat{\beta}_0) \hat{\beta}_1}{1 + \hat{\beta}_1^2} \; .
+$$
+
+
+**Proof:** The intersection point of the regression line with the y-axis is
+
+$$ \label{eq:S}
+S(0 \vert \hat{\beta}_0) \; .
+$$
+
+Let $a$ be a vector describing the direction of the regression line, let $b$ be the vector pointing from $S$ to $O$ and let $p$ be the vector pointing from $S$ to $P$.
+
+Because $\hat{\beta}_1$ is the slope of the regression line, we have
+
+$$ \label{eq:a}
+a = \left( \begin{matrix} 1 \\ \hat{\beta}_1 \end{matrix} \right) \; .
+$$
+
+Moreover, with the points $O$ and $S$, we have
+
+$$ \label{eq:b}
+b = \left( \begin{matrix} x_o \\ y_o \end{matrix} \right) - \left( \begin{matrix} 0 \\ \hat{\beta}_0 \end{matrix} \right) = \left( \begin{matrix} x_o \\ y_o - \hat{\beta}_0 \end{matrix} \right) \; .
+$$
+
+Because $P$ is located on the regression line, $p$ is collinear with $a$ and thus a scalar multiple of this vector:
+
+$$ \label{eq:p}
+p = w \cdot a \; .
+$$
+
+Moreover, because $P$ is the point on the regression line that is closest to $O$, the vector $b-p$ is orthogonal to $a$, such that the inner product of these two vectors is equal to zero:
+
+$$ \label{eq:a-b-p-orth}
+a^\mathrm{T} (b-p) = 0 \; .
+$$
+
+Rearranging this equation gives
+
+$$ \label{eq:w}
+\begin{split}
+a^\mathrm{T} (b-p) &= 0 \\
+a^\mathrm{T} (b - w \cdot a) &= 0 \\
+a^\mathrm{T} b - w \cdot a^\mathrm{T} a &= 0 \\
+w \cdot a^\mathrm{T} a &= a^\mathrm{T} b \\
+w &= \frac{a^\mathrm{T} b}{a^\mathrm{T} a} \; .
+\end{split}
+$$
+
+With \eqref{eq:a} and \eqref{eq:b}, $w$ can be calculated as
+
+$$ \label{eq:w-qed}
+\begin{split}
+w &= \frac{a^\mathrm{T} b}{a^\mathrm{T} a} \\
+&= \frac{\left( \begin{matrix} 1 \\ \hat{\beta}_1 \end{matrix} \right)^\mathrm{T} \left( \begin{matrix} x_o \\ y_o - \hat{\beta}_0 \end{matrix} \right)}{\left( \begin{matrix} 1 \\ \hat{\beta}_1 \end{matrix} \right)^\mathrm{T} \left( \begin{matrix} 1 \\ \hat{\beta}_1 \end{matrix} \right)} \\
+&= \frac{x_o + (y_o - \hat{\beta}_0) \hat{\beta}_1}{1 + \hat{\beta}_1^2} \; .
+\end{split}
+$$
+
+Finally, with the point $S$ \eqref{eq:S} and the vector $p$ \eqref{eq:p}, the coordinates of $P$ are obtained as
+
+$$ \label{eq:P-qed}
+\left( \begin{matrix} x_p \\ y_p \end{matrix} \right) = \left( \begin{matrix} 0 \\ \hat{\beta}_0 \end{matrix} \right) + w \cdot \left( \begin{matrix} 1 \\ \hat{\beta}_1 \end{matrix} \right) = \left( \begin{matrix} w \\ \hat{\beta}_0 + \hat{\beta}_1 w \end{matrix} \right) \; .
+$$
+
+Together, \eqref{eq:P-qed} and \eqref{eq:w-qed} constitute the proof of \eqref{eq:slr-proj}.
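+
+As a worked example (the numbers are hypothetical and not part of the original proof), assume the estimated regression line $y = 1 + 2x$, i.e. $\hat{\beta}_0 = 1$ and $\hat{\beta}_1 = 2$, and the data point $O(3 \vert 2)$. Then,
+
+$$ \label{eq:slr-proj-ex}
+w = \frac{3 + (2 - 1) \cdot 2}{1 + 2^2} = 1 \quad \text{and} \quad P\left(1 \mid 1 + 2 \cdot 1\right) = P(1 \vert 3) \; .
+$$
+
+With $a = (1, 2)^\mathrm{T}$, $b = (3, 1)^\mathrm{T}$ and $p = (1, 2)^\mathrm{T}$, the difference $b - p = (2, -1)^\mathrm{T}$ satisfies $a^\mathrm{T} (b-p) = 2 - 2 = 0$, so the orthogonality condition holds in this example.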
diff --git a/P/slr-sss.md b/P/slr-sss.md