| 1 | +--- |
| 2 | +layout: proof |
| 3 | +mathjax: true |
| 4 | + |
| 5 | +author: "Joram Soch" |
| 6 | +affiliation: "BCCN Berlin" |
| 7 | +e_mail: "joram.soch@bccn-berlin.de" |
| 8 | +date: 2022-11-15 16:22:00 |
| 9 | + |
| 10 | +title: "Reparametrization for one-way analysis of variance" |
| 11 | +chapter: "Statistical Models" |
| 12 | +section: "Univariate normal data" |
| 13 | +topic: "Analysis of variance" |
| 14 | +theorem: "Reparametrization for one-way ANOVA" |
| 15 | + |
| 16 | +sources: |
| 17 | + - authors: "Wikipedia" |
| 18 | + year: 2022 |
| 19 | + title: "Analysis of variance" |
| 20 | + in: "Wikipedia, the free encyclopedia" |
| 21 | + pages: "retrieved on 2022-11-15" |
| 22 | + url: "https://en.wikipedia.org/wiki/Analysis_of_variance#For_a_single_factor" |
| 23 | + |
| 24 | +proof_id: "P375" |
| 25 | +shortcut: "anova1-repara" |
| 26 | +username: "JoramSoch" |
| 27 | +--- |
| 28 | + |
| 29 | + |
| 30 | +**Theorem:** The [one-way analysis of variance](/D/anova1) model |
| 31 | + |
| 32 | +$$ \label{eq:anova1} |
| 33 | +y_{ij} = \mu_i + \varepsilon_{ij}, \; \varepsilon_{ij} \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2) |
| 34 | +$$ |
| 35 | + |
| 36 | +can be rewritten using parameters $\mu$ and $\delta_i$ instead of $\mu_i$ |
| 37 | + |
| 38 | +$$ \label{eq:anova1-repara} |
| 39 | +y_{ij} = \mu + \delta_i + \varepsilon_{ij}, \; \varepsilon_{ij} \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2) |
| 40 | +$$ |
| 41 | + |
| 42 | +with the constraint |
| 43 | + |
| 44 | +$$ \label{eq:anova1-constr} |
| 45 | +\sum_{i=1}^{k} \frac{n_i}{n} \delta_i = 0 \; , |
| 46 | +$$ |
| 47 | + |
| 48 | +in which case |
| 49 | + |
| 50 | +1) the model parameters are related to each other as |
| 51 | + |
| 52 | +$$ \label{eq:anova1-repara-c1} |
| 53 | +\delta_i = \mu_i - \mu, \; i = 1, \ldots, k \; ; |
| 54 | +$$ |
| 55 | + |
| 56 | +2) the [ordinary least squares estimates](/P/anova1-ols) are given by |
| 57 | + |
| 58 | +$$ \label{eq:anova1-repara-c2} |
| 59 | +\hat{\delta}_i = \bar{y}_i - \bar{y} = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij} - \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij} \; ; |
| 60 | +$$ |
| 61 | + |
| 62 | +3) the following [sum of squares](/P/anova1-pss) is [chi-square distributed](/D/chi2) |
| 63 | + |
| 64 | +$$ \label{eq:anova1-repara-c3} |
| 65 | +\frac{1}{\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( \hat{\delta}_i - \delta_i \right)^2 \sim \chi^2(k-1) \; ; |
| 66 | +$$ |
| 67 | + |
| 68 | +4) and the following [test statistic](/D/tstat) is [F-distributed](/D/f) |
| 69 | + |
| 70 | +$$ \label{eq:anova1-repara-c4} |
| 71 | +F = \frac{\frac{1}{k-1} \sum_{i=1}^{k} n_i \hat{\delta}_i^2}{\frac{1}{n-k} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2} \sim \mathrm{F}(k-1, n-k) |
| 72 | +$$ |
| 73 | + |
| 74 | +under the [null hypothesis for the main effect](/D/anova1-f) |
| 75 | + |
| 76 | +$$ \label{eq:anova1-repara-c4-h0} |
| 77 | +H_0: \; \delta_1 = \ldots = \delta_k = 0 \; . |
| 78 | +$$ |
| 79 | + |
| 80 | + |
| 81 | +**Proof:** |
| 82 | + |
| 83 | +1) Equating \eqref{eq:anova1} with \eqref{eq:anova1-repara}, we get: |
| 84 | + |
| 85 | +$$ \label{eq:anova1-repara-c1-qed} |
| 86 | +\begin{split} |
| 87 | +y_{ij} = \mu + \delta_i + \varepsilon_{ij} &= \mu_i + \varepsilon_{ij} = y_{ij} \\ |
| 88 | +\mu + \delta_i &= \mu_i \\ |
| 89 | +\delta_i &= \mu_i - \mu \; . |
| 90 | +\end{split} |
| 91 | +$$ |
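
As a short worked step, the constraint \eqref{eq:anova1-constr} also determines what $\mu$ is. Substituting $\delta_i = \mu_i - \mu$ and assuming, as the double sums in the theorem suggest, that $n = \sum_{i=1}^{k} n_i$ is the total sample size:

$$
\begin{split}
\sum_{i=1}^{k} \frac{n_i}{n} (\mu_i - \mu) &= 0 \\
\sum_{i=1}^{k} \frac{n_i}{n} \mu_i &= \mu \sum_{i=1}^{k} \frac{n_i}{n} = \mu \; .
\end{split}
$$

Under that assumption, $\mu$ is the sample-size-weighted average of the group means $\mu_i$.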
| 92 | + |
| 93 | +2) Equation \eqref{eq:anova1-repara} is a special case of the [two-way analysis of variance](/D/anova2) with (i) just one factor $A$ and (ii) no interaction term. Thus, the OLS estimates are identical to [those of two-way ANOVA](/P/anova2-ols), i.e. given by |
| 94 | + |
| 95 | +$$ \label{eq:anova1-repara-c2-qed} |
| 96 | +\begin{split} |
| 97 | +\hat{\mu} &= \bar{y}_{\bullet \bullet} \hphantom{\bar{y}_{i \bullet} - } = \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij} \\ |
| 98 | +\hat{\delta}_i &= \bar{y}_{i \bullet} - \bar{y}_{\bullet \bullet} = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij} - \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij} \; . |
| 99 | +\end{split} |
| 100 | +$$ |
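
The estimates above can be checked numerically. The following is a minimal sketch, not part of the original proof: the data in `groups` and the helper name `ols_estimates` are made up for illustration. It computes $\hat{\mu}$ and $\hat{\delta}_i$ and confirms that the estimates satisfy the weighted constraint \eqref{eq:anova1-constr}.

```python
from statistics import fmean

# Hypothetical data: k = 3 groups with unequal sizes n_i = 3, 4, 2
groups = [
    [4.1, 5.0, 4.6],
    [6.2, 5.8, 6.5, 6.1],
    [3.9, 4.4],
]

def ols_estimates(groups):
    """Return (mu_hat, delta_hat) for the reparametrized one-way ANOVA model."""
    all_values = [y for g in groups for y in g]
    mu_hat = fmean(all_values)                        # grand mean
    delta_hat = [fmean(g) - mu_hat for g in groups]   # group mean minus grand mean
    return mu_hat, delta_hat

mu_hat, delta_hat = ols_estimates(groups)

# Weighted constraint: sum_i (n_i / n) * delta_hat_i should be zero
n = sum(len(g) for g in groups)
constraint = sum(len(g) / n * d for g, d in zip(groups, delta_hat))

print(mu_hat, delta_hat, constraint)
```

The printed constraint value should be zero up to floating-point rounding, because the grand mean is itself the $n_i/n$-weighted average of the group means.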
| 101 | + |
| 102 | +3) Let $U_{ij} = (y_{ij} - \mu - \delta_i)/\sigma$, [such that](/P/norm-snorm) $U_{ij} \sim \mathcal{N}(0, 1)$ and consider the sum of all squared [random variables](/D/rvar) $U_{ij}$: |
| 103 | + |
| 104 | +$$ \label{eq:anova1-repara-c3-s1} |
| 105 | +\begin{split} |
| 106 | +\sum_{i=1}^{k} \sum_{j=1}^{n_i} U_{ij}^2 &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( \frac{y_{ij} - \mu - \delta_i}{\sigma} \right)^2 \\ |
| 107 | +&= \frac{1}{\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left[ (y_{ij} - \bar{y}_i) + ([\bar{y}_i - \bar{y}] - \delta_i) + (\bar{y} - \mu) \right]^2 \; . |
| 108 | +\end{split} |
| 109 | +$$ |
| 110 | + |
| 111 | +Because all cross terms sum to zero, this square of sums [can be developed, using a number of intermediate steps,](/P/anova1-f) into a sum of squares: |
| 112 | + |
| 113 | +$$ \label{eq:anova1-repara-c3-s2} |
| 114 | +\begin{split} |
| 115 | +\sum_{i=1}^{k} \sum_{j=1}^{n_i} U_{ij}^2 &= \frac{1}{\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left[ (y_{ij} - \bar{y}_i)^2 + ([\bar{y}_i - \bar{y}] - \delta_i)^2 + (\bar{y} - \mu)^2 \right] \\ |
| 116 | +&= \frac{1}{\sigma^2} \left[ \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{k} \sum_{j=1}^{n_i} ([\bar{y}_i - \bar{y}] - \delta_i)^2 + \sum_{i=1}^{k} \sum_{j=1}^{n_i} (\bar{y} - \mu)^2 \right] \; . |
| 117 | +\end{split} |
| 118 | +$$ |
| 119 | + |
| 120 | +To this sum, [Cochran's theorem for one-way analysis of variance can be applied](/P/anova1-f), yielding the distributions: |
| 121 | + |
| 122 | +$$ \label{eq:anova1-repara-c3-qed} |
| 123 | +\begin{split} |
| 124 | +\frac{1}{\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 &\sim \chi^2(n-k) \\ |
| 125 | +\frac{1}{\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} ([\bar{y}_i - \bar{y}] - \delta_i)^2 \overset{\eqref{eq:anova1-repara-c2-qed}}{=} \frac{1}{\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (\hat{\delta}_i - \delta_i)^2 &\sim \chi^2(k-1) \; . |
| 126 | +\end{split} |
| 127 | +$$ |
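
The second distribution can be sanity-checked by simulation. This is a rough Monte Carlo sketch under assumed values: the group sizes, $\mu$, $\sigma$ and $\delta_i$ below are arbitrary choices that satisfy the constraint \eqref{eq:anova1-constr}, and the function name `scaled_sum_of_squares` is hypothetical. Since a $\chi^2(k-1)$ variable has mean $k-1$, the average of the simulated sums should be close to $k-1$ even though the $\delta_i$ are non-zero.

```python
import random
from statistics import fmean

def scaled_sum_of_squares(mu, delta, sizes, sigma, rng):
    """Simulate one data set; return (1/sigma^2) * sum_i n_i * (delta_hat_i - delta_i)^2."""
    groups = [[rng.gauss(mu + d, sigma) for _ in range(n_i)]
              for d, n_i in zip(delta, sizes)]
    grand_mean = fmean(y for g in groups for y in g)
    total = 0.0
    for g, d, n_i in zip(groups, delta, sizes):
        delta_hat = fmean(g) - grand_mean
        # the inner sum over j contributes n_i identical terms
        total += n_i * (delta_hat - d) ** 2
    return total / sigma ** 2

# Assumed illustration values: 3*2.0 + 4*1.0 + 5*(-2.0) = 0, so the constraint holds
sizes = [3, 4, 5]
delta = [2.0, 1.0, -2.0]
rng = random.Random(0)  # fixed seed for reproducibility

draws = [scaled_sum_of_squares(10.0, delta, sizes, 1.5, rng) for _ in range(20000)]
mean_draw = fmean(draws)
print(mean_draw, "should be close to", len(sizes) - 1)
```

Agreement of the sample mean with $k-1$ is only a necessary check, not a proof of the distributional claim.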
| 128 | + |
| 129 | +4) The ratio of two independent [chi-square distributed](/D/chi2) [random variables](/D/rvar), each divided by its [degrees of freedom](/D/dof), is [defined to be F-distributed](/D/f), so that |
| 130 | + |
| 131 | +$$ \label{eq:anova1-repara-c4-s1} |
| 132 | +\begin{split} |
| 133 | +F &= \frac{\left( \frac{1}{\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (\hat{\delta}_i - \delta_i)^2 \right)/(k-1)}{\left( \frac{1}{\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 \right)/(n-k)} \\ |
| 134 | +&= \frac{\frac{1}{k-1} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (\hat{\delta}_i - \delta_i)^2}{\frac{1}{n-k} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2} \\ |
| 135 | +&= \frac{\frac{1}{k-1} \sum_{i=1}^{k} n_i (\hat{\delta}_i - \delta_i)^2}{\frac{1}{n-k} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2} \\ |
| 136 | +&= \frac{\frac{1}{k-1} \sum_{i=1}^{k} n_i \hat{\delta}_i^2}{\frac{1}{n-k} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2} |
| 137 | +\end{split} |
| 138 | +$$ |
| 139 | + |
| 140 | +follows the F-distribution |
| 141 | + |
| 142 | +$$ \label{eq:anova1-repara-c4-qed} |
| 143 | +F \sim \mathrm{F}(k-1, n-k) |
| 144 | +$$ |
| 145 | + |
| 146 | +under the null hypothesis \eqref{eq:anova1-repara-c4-h0}. |
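
For illustration, the F-statistic in \eqref{eq:anova1-repara-c4} can be computed directly from the estimates $\hat{\delta}_i$. The sketch below uses invented toy data and a hypothetical function name `f_statistic`; it is one straightforward way to evaluate the formula, not a reference implementation.

```python
from statistics import fmean

def f_statistic(groups):
    """F = [sum_i n_i * delta_hat_i^2 / (k-1)] / [sum_ij (y_ij - ybar_i)^2 / (n-k)]."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean(y for g in groups for y in g)
    group_means = [fmean(g) for g in groups]
    delta_hat = [m - grand_mean for m in group_means]
    # numerator: sum of squares explained by the factor
    between = sum(len(g) * d ** 2 for g, d in zip(groups, delta_hat))
    # denominator: residual sum of squares around the group means
    within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical toy data: group means 2, 3, 6 and grand mean 11/3
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
F = f_statistic(groups)
print(F)  # about 13: (26 / 2) divided by (6 / 6)
```

Under the null hypothesis \eqref{eq:anova1-repara-c4-h0}, this value would be compared against the $\mathrm{F}(k-1, n-k) = \mathrm{F}(2, 6)$ distribution.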