
Commit 1cb44a6

added 6 proofs
1 parent fa54c6b commit 1cb44a6

6 files changed

Lines changed: 789 additions & 0 deletions


P/anova1-fols.md

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-11-15 17:35:00

title: "F-statistic for main effect in terms of ordinary least squares estimates in one-way analysis of variance"
chapter: "Statistical Models"
section: "Univariate normal data"
topic: "Analysis of variance"
theorem: "F-statistic in terms of OLS estimates"

sources:

proof_id: "P377"
shortcut: "anova1-fols"
username: "JoramSoch"
---


**Theorem:** Given the [one-way analysis of variance](/D/anova1) assumption

$$ \label{eq:anova1}
y_{ij} = \mu_i + \varepsilon_{ij}, \; \varepsilon_{ij} \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2),
$$

1) the [F-statistic for the main effect](/P/anova1-f) can be expressed in terms of [ordinary least squares parameter estimates](/P/anova1-ols) as

$$ \label{eq:anova1-fols-v1}
F = \frac{\frac{1}{k-1} \sum_{i=1}^{k} n_i (\hat{\mu}_i - \bar{y})^2}{\frac{1}{n-k} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \hat{\mu}_i)^2}
$$

2) or, when using the [reparametrized version of one-way ANOVA](/P/anova1-repara), the F-statistic can be expressed as

$$ \label{eq:anova1-fols-v2}
F = \frac{\frac{1}{k-1} \sum_{i=1}^{k} n_i \hat{\delta}_i^2}{\frac{1}{n-k} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \hat{\mu} - \hat{\delta}_i)^2} \; .
$$


**Proof:** The [F-statistic for the main effect in one-way ANOVA](/P/anova1-f) is given in terms of the [sample means](/D/mean-samp) as

$$ \label{eq:anova1-f}
F = \frac{\frac{1}{k-1} \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2}{\frac{1}{n-k} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}
$$

where $\bar{y}_i$ is the average of all values $y_{ij}$ from category $i$ and $\bar{y}$ is the grand mean of all values $y_{ij}$ from all categories $i = 1, \ldots, k$.
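For illustration, the F-statistic in \eqref{eq:anova1-f} can be computed directly from the group means $\bar{y}_i$ and the grand mean $\bar{y}$. The following is a minimal sketch assuming NumPy; the helper `f_from_sample_means` and the example data are hypothetical and not part of the original proof.

```python
import numpy as np

def f_from_sample_means(groups):
    """Hypothetical helper: F-statistic for the main effect, eq. (anova1-f).

    `groups` is a list of 1-D sequences, one per category i = 1, ..., k.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_i = np.array([g.size for g in groups])
    n = n_i.sum()
    y_bar_i = np.array([g.mean() for g in groups])      # group means
    y_bar = np.concatenate(groups).mean()                # grand mean
    ss_between = np.sum(n_i * (y_bar_i - y_bar) ** 2)    # numerator sum
    ss_within = sum(np.sum((g - m) ** 2) for g, m in zip(groups, y_bar_i))
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical example data: k = 3 categories of unequal size
groups = [[4.0, 5.0, 6.0], [7.0, 9.0], [1.0, 2.0, 3.0, 2.0]]
F = f_from_sample_means(groups)
print(F)
```

For these made-up data ($k = 3$, $n = 9$), the numerator sum is $50$ on $k-1 = 2$ degrees of freedom and the denominator sum is $6$ on $n-k = 6$ degrees of freedom, so $F = 25$ up to floating-point rounding.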

1) The [ordinary least squares estimates for one-way ANOVA](/P/anova1-ols) are

$$ \label{eq:anova1-ols}
\hat{\mu}_i = \bar{y}_i \; ,
$$

such that

$$ \label{eq:anova1-fols-v1-qed}
F = \frac{\frac{1}{k-1} \sum_{i=1}^{k} n_i (\hat{\mu}_i - \bar{y})^2}{\frac{1}{n-k} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \hat{\mu}_i)^2} \; .
$$
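Step 1 can be checked numerically: solving the least-squares problem for the cell-means model with a one-hot design matrix should return the group means, and plugging these into \eqref{eq:anova1-fols-v1} should reproduce the sample-means form. This is a sketch assuming NumPy's `np.linalg.lstsq`; the data and the 0-based label encoding are assumptions made for the example.

```python
import numpy as np

# Hypothetical example data: category label i (0-based) and observation y_ij per row
labels = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
y = np.array([4.0, 5.0, 6.0, 7.0, 9.0, 1.0, 2.0, 3.0, 2.0])
n, k = y.size, labels.max() + 1
n_i = np.bincount(labels)

# Cell-means design matrix: X[m, i] = 1 if observation m belongs to category i
X = np.zeros((n, k))
X[np.arange(n), labels] = 1.0

# Ordinary least squares estimates of mu_1, ..., mu_k
mu_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_bar_i = np.array([y[labels == i].mean() for i in range(k)])  # group means

# F-statistic in the form of eq. (anova1-fols-v1)
num = np.sum(n_i * (mu_hat - y.mean()) ** 2) / (k - 1)
den = np.sum((y - mu_hat[labels]) ** 2) / (n - k)
F_v1 = num / den
print(mu_hat, y_bar_i, F_v1)
```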

2) The [OLS estimates for reparametrized one-way ANOVA](/P/anova1-repara) are

$$ \label{eq:anova1-repara-ols}
\begin{split}
\hat{\mu} &= \bar{y} \\
\hat{\delta}_i &= \bar{y}_i - \bar{y} \; ,
\end{split}
$$

such that, with $\hat{\delta}_i = \bar{y}_i - \bar{y}$ in the numerator and $\hat{\mu} + \hat{\delta}_i = \bar{y}_i$ in the denominator,

$$ \label{eq:anova1-fols-v2-qed}
F = \frac{\frac{1}{k-1} \sum_{i=1}^{k} n_i \hat{\delta}_i^2}{\frac{1}{n-k} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \hat{\mu} - \hat{\delta}_i)^2} \; .
$$
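Similarly for step 2, the sketch below (assuming NumPy and made-up data) forms $\hat{\mu} = \bar{y}$ and $\hat{\delta}_i = \bar{y}_i - \bar{y}$ and evaluates \eqref{eq:anova1-fols-v2} next to \eqref{eq:anova1-f}; the two values should agree up to floating-point rounding.

```python
import numpy as np

# Hypothetical example data as a flat array with 0-based category labels
labels = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
y = np.array([4.0, 5.0, 6.0, 7.0, 9.0, 1.0, 2.0, 3.0, 2.0])
n, k = y.size, labels.max() + 1
n_i = np.bincount(labels)
y_bar_i = np.bincount(labels, weights=y) / n_i     # group means

# OLS estimates of the reparametrized model, eq. (anova1-repara-ols)
mu_hat = y.mean()                                   # \hat{\mu} = \bar{y}
delta_hat = y_bar_i - mu_hat                        # \hat{\delta}_i = \bar{y}_i - \bar{y}

# Version 2, eq. (anova1-fols-v2)
F_v2 = (np.sum(n_i * delta_hat ** 2) / (k - 1)) / (
    np.sum((y - mu_hat - delta_hat[labels]) ** 2) / (n - k))

# Sample-means version, eq. (anova1-f), for comparison
F_means = (np.sum(n_i * (y_bar_i - y.mean()) ** 2) / (k - 1)) / (
    np.sum((y - y_bar_i[labels]) ** 2) / (n - k))
print(F_v2, F_means)
```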

P/anova1-pss.md

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-11-15 16:59:00

title: "Partition of sums of squares in one-way analysis of variance"
chapter: "Statistical Models"
section: "Univariate normal data"
topic: "Analysis of variance"
theorem: "Sums of squares in one-way ANOVA"

sources:
  - authors: "Wikipedia"
    year: 2022
    title: "Analysis of variance"
    in: "Wikipedia, the free encyclopedia"
    pages: "retrieved on 2022-11-15"
    url: "https://en.wikipedia.org/wiki/Analysis_of_variance#Partitioning_of_the_sum_of_squares"

proof_id: "P376"
shortcut: "anova1-pss"
username: "JoramSoch"
---


**Theorem:** Given [one-way analysis of variance](/D/anova1),

$$ \label{eq:anova1}
y_{ij} = \mu_i + \varepsilon_{ij}, \; \varepsilon_{ij} \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)
$$

the sums of squares can be partitioned as follows

$$ \label{eq:anova1-pss}
\mathrm{SS}_\mathrm{tot} = \mathrm{SS}_\mathrm{treat} + \mathrm{SS}_\mathrm{res}
$$

where $\mathrm{SS}_\mathrm{tot}$ is the [total sum of squares](/D/tss), $\mathrm{SS}_\mathrm{treat}$ is the [treatment sum of squares](/D/trss) (equivalent to [explained sum of squares](/D/ess)) and $\mathrm{SS}_\mathrm{res}$ is the [residual sum of squares](/D/rss).


**Proof:** The [total sum of squares](/D/tss) for [one-way ANOVA](/D/anova1) is given by

$$ \label{eq:anova1-tss}
\mathrm{SS}_\mathrm{tot} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2
$$

where $\bar{y}$ is the mean across all values $y_{ij}$. Adding and subtracting the group means $\bar{y}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij}$, this can be rewritten as

$$ \label{eq:anova1-pss-s1}
\begin{split}
\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2 &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left[ (y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{y}) \right]^2 \\
&= \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left[ (y_{ij} - \bar{y}_i)^2 + (\bar{y}_i - \bar{y})^2 + 2 (y_{ij} - \bar{y}_i) (\bar{y}_i - \bar{y}) \right] \\
&= \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{k} \sum_{j=1}^{n_i} (\bar{y}_i - \bar{y})^2 + 2 \sum_{i=1}^{k} (\bar{y}_i - \bar{y}) \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i) \; .
\end{split}
$$

Note that, for each $i$, the following sum is zero

$$ \label{eq:anova1-pss-s2}
\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i) = \sum_{j=1}^{n_i} y_{ij} - n_i \cdot \bar{y}_i = \sum_{j=1}^{n_i} y_{ij} - n_i \cdot \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij} = 0 \; ,
$$

so that the sum in \eqref{eq:anova1-pss-s1} reduces to

$$ \label{eq:anova1-pss-s3}
\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (\bar{y}_i - \bar{y})^2 + \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 \; .
$$
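The vanishing of the cross term can be seen numerically. The sketch below (assuming NumPy, with randomly generated, hypothetical unbalanced data) evaluates each inner sum $\sum_{j} (y_{ij} - \bar{y}_i)$ and the full cross term of \eqref{eq:anova1-pss-s1}; both are zero up to floating-point rounding, whatever the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unbalanced data: k = 4 groups with different sizes n_i
means, sizes = [0.0, 1.0, 2.5, -1.0], [5, 8, 3, 6]
groups = [rng.normal(loc=m, scale=1.0, size=s) for m, s in zip(means, sizes)]
y_bar = np.concatenate(groups).mean()              # grand mean

# Inner sums sum_j (y_ij - ybar_i), one per group, as in eq. (anova1-pss-s2)
inner = np.array([np.sum(g - g.mean()) for g in groups])

# Cross term of eq. (anova1-pss-s1): 2 sum_i (ybar_i - ybar) sum_j (y_ij - ybar_i)
cross = 2 * sum((g.mean() - y_bar) * s for g, s in zip(groups, inner))
print(inner, cross)
```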

With the [treatment sum of squares](/D/trss) for [one-way ANOVA](/D/anova1)

$$ \label{eq:anova1-trss}
\mathrm{SS}_\mathrm{treat} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (\bar{y}_i - \bar{y})^2
$$

and the [residual sum of squares](/D/rss) for [one-way ANOVA](/D/anova1)

$$ \label{eq:anova1-rss}
\mathrm{SS}_\mathrm{res} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 \; ,
$$

we finally have:

$$ \label{eq:anova1-pss-qed}
\mathrm{SS}_\mathrm{tot} = \mathrm{SS}_\mathrm{treat} + \mathrm{SS}_\mathrm{res} \; .
$$
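A small numerical check of \eqref{eq:anova1-pss-qed}, as a sketch assuming NumPy; the function `sums_of_squares` and the data are hypothetical. Since the summand of \eqref{eq:anova1-trss} does not depend on $j$, the treatment sum equals $\sum_{i} n_i (\bar{y}_i - \bar{y})^2$, which is how it is computed here.

```python
import numpy as np

def sums_of_squares(groups):
    """Hypothetical helper: return (SS_tot, SS_treat, SS_res) for one-way ANOVA data."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    y_all = np.concatenate(groups)
    y_bar = y_all.mean()                                           # grand mean
    ss_tot = np.sum((y_all - y_bar) ** 2)                          # eq. (anova1-tss)
    ss_treat = sum(g.size * (g.mean() - y_bar) ** 2 for g in groups)  # eq. (anova1-trss)
    ss_res = sum(np.sum((g - g.mean()) ** 2) for g in groups)      # eq. (anova1-rss)
    return ss_tot, ss_treat, ss_res

groups = [[4.0, 5.0, 6.0], [7.0, 9.0], [1.0, 2.0, 3.0, 2.0]]
ss_tot, ss_treat, ss_res = sums_of_squares(groups)
print(ss_tot, ss_treat, ss_res)
```

For these data, $\mathrm{SS}_\mathrm{tot} = 56$, $\mathrm{SS}_\mathrm{treat} = 50$ and $\mathrm{SS}_\mathrm{res} = 6$, up to floating-point rounding.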

P/anova1-repara.md

Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2022-11-15 16:22:00

title: "Reparametrization for one-way analysis of variance"
chapter: "Statistical Models"
section: "Univariate normal data"
topic: "Analysis of variance"
theorem: "Reparametrization for one-way ANOVA"

sources:
  - authors: "Wikipedia"
    year: 2022
    title: "Analysis of variance"
    in: "Wikipedia, the free encyclopedia"
    pages: "retrieved on 2022-11-15"
    url: "https://en.wikipedia.org/wiki/Analysis_of_variance#For_a_single_factor"

proof_id: "P375"
shortcut: "anova1-repara"
username: "JoramSoch"
---


**Theorem:** The [one-way analysis of variance](/D/anova1) model

$$ \label{eq:anova1}
y_{ij} = \mu_i + \varepsilon_{ij}, \; \varepsilon_{ij} \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)
$$

can be rewritten using parameters $\mu$ and $\delta_i$ instead of $\mu_i$

$$ \label{eq:anova1-repara}
y_{ij} = \mu + \delta_i + \varepsilon_{ij}, \; \varepsilon_{ij} \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)
$$

with the constraint

$$ \label{eq:anova1-constr}
\sum_{i=1}^{k} \frac{n_i}{n} \delta_i = 0 \; ,
$$

in which case

1) the model parameters are related to each other as

$$ \label{eq:anova1-repara-c1}
\delta_i = \mu_i - \mu, \; i = 1, \ldots, k \; ;
$$

2) the [ordinary least squares estimates](/P/anova1-ols) are given by

$$ \label{eq:anova1-repara-c2}
\hat{\delta}_i = \bar{y}_i - \bar{y} = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij} - \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij} \; ;
$$

3) the following [sum of squares](/P/anova1-pss) is [chi-square distributed](/D/chi2)

$$ \label{eq:anova1-repara-c3}
\frac{1}{\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( \hat{\delta}_i - \delta_i \right)^2 \sim \chi^2(k-1) \; ;
$$

4) and the following [test statistic](/D/tstat) is [F-distributed](/D/f)

$$ \label{eq:anova1-repara-c4}
F = \frac{\frac{1}{k-1} \sum_{i=1}^{k} n_i \hat{\delta}_i^2}{\frac{1}{n-k} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2} \sim \mathrm{F}(k-1, n-k)
$$

under the [null hypothesis for the main effect](/D/anova1-f)

$$ \label{eq:anova1-repara-c4-h0}
H_0: \; \delta_1 = \ldots = \delta_k = 0 \; .
$$
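Combining \eqref{eq:anova1-repara-c1} with the constraint \eqref{eq:anova1-constr} gives $\mu = \sum_{i=1}^{k} \frac{n_i}{n} \mu_i$, i.e. $\mu$ is the sample-size-weighted average of the cell means. The sketch below (assuming NumPy; the helper `to_effects` and the numbers are hypothetical) maps cell means to $(\mu, \delta_i)$ and confirms that the constraint holds.

```python
import numpy as np

def to_effects(mu_i, n_i):
    """Hypothetical helper: map cell means mu_i to (mu, delta_i) with sum_i (n_i/n) delta_i = 0."""
    mu_i = np.asarray(mu_i, dtype=float)
    w = np.asarray(n_i, dtype=float) / np.sum(n_i)   # weights n_i / n
    mu = np.sum(w * mu_i)                            # weighted mean implied by the constraint
    delta = mu_i - mu                                 # delta_i = mu_i - mu
    return mu, delta

# Hypothetical cell means and group sizes
mu_i, n_i = [10.0, 12.0, 7.0], [2, 3, 5]
mu, delta = to_effects(mu_i, n_i)
print(mu, delta, np.sum(np.asarray(n_i) * delta))
```

With $\mu_i = (10, 12, 7)$ and $n_i = (2, 3, 5)$, this gives $\mu = 9.1$ and $\delta = (0.9, 2.9, -2.1)$, up to floating-point rounding.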


**Proof:**

1) Equating \eqref{eq:anova1} with \eqref{eq:anova1-repara}, we get:

$$ \label{eq:anova1-repara-c1-qed}
\begin{split}
y_{ij} = \mu + \delta_i + \varepsilon_{ij} &= \mu_i + \varepsilon_{ij} = y_{ij} \\
\mu + \delta_i &= \mu_i \\
\delta_i &= \mu_i - \mu \; .
\end{split}
$$

2) Equation \eqref{eq:anova1-repara} is a special case of the [two-way analysis of variance](/D/anova2) with (i) just one factor $A$ and (ii) no interaction term. Thus, the OLS estimates are identical to [those of two-way ANOVA](/P/anova2-ols), i.e. they are given by

$$ \label{eq:anova1-repara-c2-qed}
\begin{split}
\hat{\mu} &= \bar{y}_{\bullet \bullet} \hphantom{\bar{y}_{i \bullet} - } = \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij} \\
\hat{\delta}_i &= \bar{y}_{i \bullet} - \bar{y}_{\bullet \bullet} = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij} - \frac{1}{n} \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij} \; .
\end{split}
$$
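As an independent check of step 2 that does not go through the two-way ANOVA result, one can minimize the residual sum of squares over $(\mu, \delta_1, \ldots, \delta_k)$ subject to \eqref{eq:anova1-constr} directly. The sketch below swaps in a generic equality-constrained least-squares solve via the Lagrange (KKT) linear system, assuming NumPy and made-up data; it is not the route taken in the proof. The constraint is written as $\sum_i n_i \delta_i = 0$, which is \eqref{eq:anova1-constr} multiplied by $n$.

```python
import numpy as np

# Hypothetical example data with 0-based category labels
labels = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
y = np.array([4.0, 5.0, 6.0, 7.0, 9.0, 1.0, 2.0, 3.0, 2.0])
n, k = y.size, labels.max() + 1
n_i = np.bincount(labels)

# Design for y_ij = mu + delta_i: intercept column plus one column per category.
# X is rank-deficient because the category columns sum to the intercept column.
X = np.zeros((n, k + 1))
X[:, 0] = 1.0
X[np.arange(n), labels + 1] = 1.0

# Constraint c^T theta = 0 for theta = (mu, delta_1, ..., delta_k), i.e. sum_i n_i delta_i = 0
c = np.concatenate(([0.0], n_i.astype(float)))

# Lagrange (KKT) system: [[X^T X, c], [c^T, 0]] [theta; lam] = [X^T y; 0]
kkt = np.zeros((k + 2, k + 2))
kkt[:k + 1, :k + 1] = X.T @ X
kkt[:k + 1, k + 1] = c
kkt[k + 1, :k + 1] = c
rhs = np.concatenate((X.T @ y, [0.0]))
theta = np.linalg.solve(kkt, rhs)
mu_hat, delta_hat = theta[0], theta[1:k + 1]

# Closed-form estimates from eq. (anova1-repara-c2-qed) for comparison
y_bar_i = np.bincount(labels, weights=y) / n_i
print(mu_hat, delta_hat)
print(y.mean(), y_bar_i - y.mean())
```

Because $X$ alone does not identify $\mu$ and $\delta_i$, plain least squares has infinitely many solutions; the constraint row is what makes the KKT matrix invertible here.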

3) Let $U_{ij} = (y_{ij} - \mu - \delta_i)/\sigma$, [such that](/P/norm-snorm) $U_{ij} \sim \mathcal{N}(0, 1)$, and consider the sum of all squared [random variables](/D/rvar) $U_{ij}$:

$$ \label{eq:anova1-repara-c3-s1}
\begin{split}
\sum_{i=1}^{k} \sum_{j=1}^{n_i} U_{ij}^2 &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( \frac{y_{ij} - \mu - \delta_i}{\sigma} \right)^2 \\
&= \frac{1}{\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left[ (y_{ij} - \bar{y}_i) + ([\bar{y}_i - \bar{y}] - \delta_i) + (\bar{y} - \mu) \right]^2 \; .
\end{split}
$$

Because $\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i) = 0$ for each $i$ and, by the constraint \eqref{eq:anova1-constr}, $\sum_{i=1}^{k} n_i ([\bar{y}_i - \bar{y}] - \delta_i) = 0$, all cross terms vanish, so that this square of sums can, [using a number of intermediate steps](/P/anova1-f), be developed into a sum of squares:

$$ \label{eq:anova1-repara-c3-s2}
\begin{split}
\sum_{i=1}^{k} \sum_{j=1}^{n_i} U_{ij}^2 &= \frac{1}{\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left[ (y_{ij} - \bar{y}_i)^2 + ([\bar{y}_i - \bar{y}] - \delta_i)^2 + (\bar{y} - \mu)^2 \right] \\
&= \frac{1}{\sigma^2} \left[ \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{k} \sum_{j=1}^{n_i} ([\bar{y}_i - \bar{y}] - \delta_i)^2 + \sum_{i=1}^{k} \sum_{j=1}^{n_i} (\bar{y} - \mu)^2 \right] \; .
\end{split}
$$

To this sum, [Cochran's theorem for one-way analysis of variance can be applied](/P/anova1-f), yielding the distributions:

$$ \label{eq:anova1-repara-c3-qed}
\begin{split}
\frac{1}{\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 &\sim \chi^2(n-k) \\
\frac{1}{\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} ([\bar{y}_i - \bar{y}] - \delta_i)^2 \overset{\eqref{eq:anova1-repara-c2-qed}}{=} \frac{1}{\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (\hat{\delta}_i - \delta_i)^2 &\sim \chi^2(k-1) \; .
\end{split}
$$
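Step 3 can be illustrated by simulation, assuming NumPy and hypothetical values for $n_i$, $\mu$, $\delta_i$ and $\sigma$ (with $\delta_i$ shifted so that \eqref{eq:anova1-constr} holds). For each simulated data set, the three scaled sums of \eqref{eq:anova1-repara-c3-s2} add up to $\sum_{i,j} U_{ij}^2$; over many data sets, their averages should be close to the chi-square means $n-k$ and $k-1$, and to $1$ for the last term, since $n (\bar{y} - \mu)^2 / \sigma^2 \sim \chi^2(1)$ under the constraint.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical design and parameters; delta is shifted so that sum_i n_i delta_i = 0
n_i = np.array([4, 6, 10])
k, n = n_i.size, n_i.sum()
labels = np.repeat(np.arange(k), n_i)               # category of each observation
G = np.eye(k)[labels]                               # n x k one-hot membership matrix
mu, sigma = 2.0, 1.5
delta = np.array([1.0, -0.5, 0.0])
delta = delta - np.sum(n_i * delta) / n

def scaled_sums(Y):
    """Three terms of eq. (anova1-repara-c3-s2) divided by sigma^2, one row per data set."""
    y_bar_i = (Y @ G) / n_i                          # group means, reps x k
    y_bar = Y.mean(axis=1, keepdims=True)            # grand means, reps x 1
    s_res = np.sum((Y - y_bar_i[:, labels]) ** 2, axis=1)
    s_treat = np.sum(n_i * ((y_bar_i - y_bar) - delta) ** 2, axis=1)
    s_mean = n * (y_bar[:, 0] - mu) ** 2
    return np.stack([s_res, s_treat, s_mean], axis=1) / sigma**2

reps = 50000
Y = mu + delta[labels] + sigma * rng.standard_normal((reps, n))
terms = scaled_sums(Y)                               # reps x 3
u_sq = np.sum(((Y - mu - delta[labels]) / sigma) ** 2, axis=1)

print(np.max(np.abs(u_sq - terms.sum(axis=1))))     # decomposition holds up to rounding
print(terms.mean(axis=0), [n - k, k - 1, 1])         # compare with chi-square means
```

Matching means is only a consistency check on the simulation, not a substitute for Cochran's theorem.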

4) The ratio of two independent [chi-square distributed](/D/chi2) [random variables](/D/rvar), each divided by its [degrees of freedom](/D/dof), is [defined to be F-distributed](/D/f), so that

$$ \label{eq:anova1-repara-c4-s1}
\begin{split}
F &= \frac{\left( \frac{1}{\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (\hat{\delta}_i - \delta_i)^2 \right)/(k-1)}{\left( \frac{1}{\sigma^2} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 \right)/(n-k)} \\
&= \frac{\frac{1}{k-1} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (\hat{\delta}_i - \delta_i)^2}{\frac{1}{n-k} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2} \\
&= \frac{\frac{1}{k-1} \sum_{i=1}^{k} n_i (\hat{\delta}_i - \delta_i)^2}{\frac{1}{n-k} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2} \\
&\overset{\eqref{eq:anova1-repara-c4-h0}}{=} \frac{\frac{1}{k-1} \sum_{i=1}^{k} n_i \hat{\delta}_i^2}{\frac{1}{n-k} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}
\end{split}
$$

follows the F-distribution

$$ \label{eq:anova1-repara-c4-qed}
F \sim \mathrm{F}(k-1, n-k)
$$

under the null hypothesis \eqref{eq:anova1-repara-c4-h0}.
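A Monte Carlo sketch of step 4 under $H_0$, assuming NumPy and a hypothetical unbalanced design: it computes $F$ as in \eqref{eq:anova1-repara-c4} from the estimates $\hat{\delta}_i$ for many simulated data sets and compares the average with the mean of the $\mathrm{F}(k-1, n-k)$ distribution, which is $(n-k)/(n-k-2)$ for $n-k > 2$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical unbalanced design; data are simulated under H0 (all delta_i = 0)
n_i = np.array([5, 7, 9, 4])
k, n = n_i.size, n_i.sum()
labels = np.repeat(np.arange(k), n_i)
G = np.eye(k)[labels]                                # n x k one-hot membership matrix
mu, sigma, reps = 3.0, 2.0, 50000

Y = mu + sigma * rng.standard_normal((reps, n))     # one simulated data set per row
y_bar_i = (Y @ G) / n_i                              # group means, reps x k
delta_hat = y_bar_i - Y.mean(axis=1, keepdims=True)  # OLS estimates of delta_i

# F-statistic of eq. (anova1-repara-c4), one value per simulated data set
num = np.sum(n_i * delta_hat ** 2, axis=1) / (k - 1)
den = np.sum((Y - y_bar_i[:, labels]) ** 2, axis=1) / (n - k)
f_values = num / den

# The mean of F(d1, d2) is d2 / (d2 - 2) for d2 > 2
d2 = n - k
print(f_values.mean(), d2 / (d2 - 2))
```

With $n = 25$ and $k = 4$, the reference value is $21/19 \approx 1.105$.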
