---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2021-11-16 08:34:00

title: "Maximum likelihood estimation for simple linear regression"
chapter: "Statistical Models"
section: "Univariate normal data"
topic: "Simple linear regression"
theorem: "Maximum likelihood estimation"

sources:

proof_id: "P287"
shortcut: "slr-mle"
username: "JoramSoch"
---


**Theorem:** Given a [simple linear regression model](/D/slr) with independent observations

$$ \label{eq:slr}
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \; \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \; i = 1,\ldots,n \; ,
$$

the [maximum likelihood estimates](/D/mle) of $\beta_0$, $\beta_1$ and $\sigma^2$ are given by

$$ \label{eq:slr-mle}
\begin{split}
\hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} \\
\hat{\beta}_1 &= \frac{s_{xy}}{s_x^2} \\
\hat{\sigma}^2 &= \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2
\end{split}
$$

where $\bar{x}$ and $\bar{y}$ are the [sample means](/D/mean-samp), $s_x^2$ is the [sample variance](/D/var-samp) of $x$ and $s_{xy}$ is the [sample covariance](/D/cov-samp) between $x$ and $y$.

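These closed-form estimates can be checked numerically. The following is a minimal sketch (not part of the proof) using NumPy on simulated data; the variable names and the simulated parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=n)  # simulated: beta0 = 2, beta1 = 3, sigma = 0.5

s_xy = np.cov(x, y, ddof=1)[0, 1]     # sample covariance of x and y
s_x2 = np.var(x, ddof=1)              # sample variance of x
b1 = s_xy / s_x2                      # beta1-hat = s_xy / s_x^2
b0 = y.mean() - b1 * x.mean()         # beta0-hat = y-bar - beta1-hat * x-bar
s2 = np.mean((y - b0 - b1 * x) ** 2)  # sigma^2-hat uses 1/n, not 1/(n-1)

# cross-check slope and intercept against NumPy's least-squares fit,
# since the MLEs of beta0 and beta1 coincide with the OLS estimates
b1_ref, b0_ref = np.polyfit(x, y, deg=1)
assert np.allclose([b0, b1], [b0_ref, b1_ref])
```

Note that $\hat{\sigma}^2$ divides by $n$ rather than $n-1$, so it is a biased (though consistent) estimator of $\sigma^2$.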

**Proof:** With the [probability density function of the normal distribution](/P/norm-pdf) and [probability under independence](/D/ind), the linear regression equation \eqref{eq:slr} implies the following [likelihood function](/D/lf)

$$ \label{eq:slr-lf}
\begin{split}
p(y|\beta_0,\beta_1,\sigma^2) &= \prod_{i=1}^n p(y_i|\beta_0,\beta_1,\sigma^2) \\
&= \prod_{i=1}^n \mathcal{N}(y_i; \beta_0 + \beta_1 x_i, \sigma^2) \\
&= \prod_{i=1}^n \frac{1}{\sqrt{2 \pi \sigma^2}} \cdot \exp \left[ -\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2 \sigma^2} \right] \\
&= \frac{1}{\sqrt{(2 \pi \sigma^2)^n}} \cdot \exp\left[ -\frac{1}{2 \sigma^2} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 \right]
\end{split}
$$

and the [log-likelihood function](/D/llf)

$$ \label{eq:slr-ll}
\begin{split}
\mathrm{LL}(\beta_0,\beta_1,\sigma^2) &= \log p(y|\beta_0,\beta_1,\sigma^2) \\
&= -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log (\sigma^2) -\frac{1}{2 \sigma^2} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 \; .
\end{split}
$$

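As a sanity check (not part of the proof), this closed-form log-likelihood can be compared against a sum of normal log-densities from SciPy; the data and the evaluation point below are arbitrary illustrative choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(size=50)
b0, b1, s2 = 1.0, 2.0, 1.0  # arbitrary evaluation point (need not be the MLE)

n = len(y)
resid = y - b0 - b1 * x
# closed-form log-likelihood from the derivation above
ll = -n/2 * np.log(2*np.pi) - n/2 * np.log(s2) - np.sum(resid**2) / (2*s2)

# reference: sum over i of log N(y_i; b0 + b1*x_i, s2)
ll_ref = stats.norm.logpdf(y, loc=b0 + b1*x, scale=np.sqrt(s2)).sum()
assert np.isclose(ll, ll_ref)
```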
<br>
The derivative of the log-likelihood function \eqref{eq:slr-ll} with respect to $\beta_0$ is

$$ \label{eq:dLL-dbeta0}
\frac{\mathrm{d}\mathrm{LL}(\beta_0,\beta_1,\sigma^2)}{\mathrm{d}\beta_0} = \frac{1}{\sigma^2} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)
$$

and setting this derivative to zero gives the MLE for $\beta_0$:

$$ \label{eq:beta0-mle}
\begin{split}
\frac{\mathrm{d}\mathrm{LL}(\hat{\beta}_0,\hat{\beta}_1,\hat{\sigma}^2)}{\mathrm{d}\beta_0} &= 0 \\
0 &= \frac{1}{\hat{\sigma}^2} \sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) \\
0 &= \sum_{i=1}^n y_i - n \hat{\beta}_0 - \hat{\beta}_1 \sum_{i=1}^n x_i \\
\hat{\beta}_0 &= \frac{1}{n} \sum_{i=1}^n y_i - \hat{\beta}_1 \frac{1}{n} \sum_{i=1}^n x_i \\
\hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} \; .
\end{split}
$$


<br>
The derivative of the log-likelihood function \eqref{eq:slr-ll} at $\hat{\beta}_0$ with respect to $\beta_1$ is

$$ \label{eq:dLL-dbeta1}
\frac{\mathrm{d}\mathrm{LL}(\hat{\beta}_0,\beta_1,\sigma^2)}{\mathrm{d}\beta_1} = \frac{1}{\sigma^2} \sum_{i=1}^n (x_i y_i - \hat{\beta}_0 x_i - \beta_1 x_i^2)
$$

and setting this derivative to zero gives the MLE for $\beta_1$:

$$ \label{eq:beta1-mle}
\begin{split}
\frac{\mathrm{d}\mathrm{LL}(\hat{\beta}_0,\hat{\beta}_1,\hat{\sigma}^2)}{\mathrm{d}\beta_1} &= 0 \\
0 &= \frac{1}{\hat{\sigma}^2} \sum_{i=1}^n (x_i y_i - \hat{\beta}_0 x_i - \hat{\beta}_1 x_i^2) \\
0 &= \sum_{i=1}^n x_i y_i - \hat{\beta}_0 \sum_{i=1}^n x_i - \hat{\beta}_1 \sum_{i=1}^n x_i^2 \\
0 &\overset{\eqref{eq:beta0-mle}}{=} \sum_{i=1}^n x_i y_i - (\bar{y} - \hat{\beta}_1 \bar{x}) \sum_{i=1}^n x_i - \hat{\beta}_1 \sum_{i=1}^n x_i^2 \\
0 &= \sum_{i=1}^n x_i y_i - \bar{y} \sum_{i=1}^n x_i + \hat{\beta}_1 \bar{x} \sum_{i=1}^n x_i - \hat{\beta}_1 \sum_{i=1}^n x_i^2 \\
0 &= \sum_{i=1}^n x_i y_i - n \bar{x} \bar{y} + \hat{\beta}_1 n \bar{x}^2 - \hat{\beta}_1 \sum_{i=1}^n x_i^2 \\
\hat{\beta}_1 &= \frac{\sum_{i=1}^n x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^n x_i^2 - n \bar{x}^2} \\
\hat{\beta}_1 &= \frac{\sum_{i=1}^n (x_i - \bar{x}) (y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} \\
\hat{\beta}_1 &= \frac{s_{xy}}{s_x^2} \; .
\end{split}
$$

Here, the penultimate step uses the identities $\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^n x_i y_i - n \bar{x} \bar{y}$ and $\sum_{i=1}^n (x_i - \bar{x})^2 = \sum_{i=1}^n x_i^2 - n \bar{x}^2$.
| 104 | + |
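The equivalence of the raw-sum, centered and sample-moment forms of $\hat{\beta}_1$ can be illustrated numerically; this is a minimal sketch on arbitrary simulated data, not part of the proof:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = rng.normal(size=100)
n = len(x)
xbar, ybar = x.mean(), y.mean()

# raw-sum form: (sum x_i y_i - n xbar ybar) / (sum x_i^2 - n xbar^2)
b1_raw = (np.sum(x * y) - n * xbar * ybar) / (np.sum(x**2) - n * xbar**2)
# centered form: sum (x_i - xbar)(y_i - ybar) / sum (x_i - xbar)^2
b1_centered = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar)**2)
# sample-moment form: s_xy / s_x^2 (the ddof in numerator and denominator cancels)
b1_moment = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

assert np.allclose([b1_raw, b1_centered], b1_moment)
```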
<br>
The derivative of the log-likelihood function \eqref{eq:slr-ll} at $(\hat{\beta}_0,\hat{\beta}_1)$ with respect to $\sigma^2$ is

$$ \label{eq:dLL-ds2}
\frac{\mathrm{d}\mathrm{LL}(\hat{\beta}_0,\hat{\beta}_1,\sigma^2)}{\mathrm{d}\sigma^2} = - \frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2} \sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2
$$

and setting this derivative to zero gives the MLE for $\sigma^2$:

$$ \label{eq:s2-mle}
\begin{split}
\frac{\mathrm{d}\mathrm{LL}(\hat{\beta}_0,\hat{\beta}_1,\hat{\sigma}^2)}{\mathrm{d}\sigma^2} &= 0 \\
0 &= - \frac{n}{2\hat{\sigma}^2} + \frac{1}{2(\hat{\sigma}^2)^2} \sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 \\
\frac{n}{2\hat{\sigma}^2} &= \frac{1}{2(\hat{\sigma}^2)^2} \sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 \\
\hat{\sigma}^2 &= \frac{1}{n} \sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 \; .
\end{split}
$$

<br>
Together, \eqref{eq:beta0-mle}, \eqref{eq:beta1-mle} and \eqref{eq:s2-mle} constitute the MLE for simple linear regression.
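As a final numerical illustration (a sketch under illustrative simulated data, not part of the proof), a generic numerical maximizer of the log-likelihood should land on the same values as the closed-form estimates; $\sigma^2$ is parametrized as $\exp(\log \sigma^2)$ to keep it positive during optimization:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 0.5 + 1.5 * x + rng.normal(scale=0.8, size=200)
n = len(y)

def neg_ll(theta):
    """Negative log-likelihood of the simple linear regression model."""
    b0, b1, log_s2 = theta  # sigma^2 = exp(log_s2) > 0 by construction
    r = y - b0 - b1 * x
    return n/2 * np.log(2*np.pi) + n/2 * log_s2 + np.sum(r**2) / (2*np.exp(log_s2))

# closed-form MLEs from the theorem
b1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0_hat = y.mean() - b1_hat * x.mean()
s2_hat = np.mean((y - b0_hat - b1_hat * x) ** 2)

# generic numerical maximization should recover the same values
res = minimize(neg_ll, x0=np.zeros(3))
assert np.allclose(res.x[:2], [b0_hat, b1_hat], atol=1e-3)
assert np.isclose(np.exp(res.x[2]), s2_hat, atol=1e-3)
```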