e-Statistics

Analysis of Variance

A model assumes the group mean $ \alpha_i$ for each level $ i = 1,\ldots,k$. Here $ \alpha_1$ often represents a baseline (typically the mean of the control group), and the contrast $ \alpha_i-\alpha_1$ is known as the $ i$-th factor effect.

The data can be arranged in the form of grouped data, "grouped by" the column of the categorical variable indicating the factor level. Statistical inference begins with the calculation of the sample mean $ \displaystyle
\bar{X}_{i\cdot} = \frac{1}{n_i} \sum_{j=1}^{n_i} X_{ij}$ within each group for every factor level $ i = 1,\ldots,k$, which is the point estimate of $ \alpha_i$. It is also useful to obtain the sample standard deviation within each factor level, that is, the square root of $ \displaystyle\frac{1}{n_i-1} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i\cdot})^2$.
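These within-group summaries can be computed directly. The sketch below uses a small hypothetical data set with three factor levels (the level names and values are invented for illustration):

```python
from statistics import mean, stdev

# Hypothetical grouped data: k = 3 factor levels with unequal group sizes n_i
groups = {
    "control": [5.1, 4.8, 5.5, 5.0],
    "A":       [6.2, 6.0, 5.7],
    "B":       [4.4, 4.9, 4.6, 4.7, 4.5],
}

for level, x in groups.items():
    xbar = mean(x)   # sample mean within the group: point estimate of alpha_i
    s = stdev(x)     # sample SD within the level (divisor n_i - 1)
    print(f"{level}: n_i = {len(x)}, mean = {xbar:.3f}, sd = {s:.3f}")
```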

The overall sample mean

$\displaystyle \bar{X}_{\cdot\cdot}
= \frac{1}{n} \sum_{i=1}^k \: \sum_{j=1}^{n_i} X_{ij}
= \frac{1}{n} \sum_{i=1}^k n_i \bar{X}_{i\cdot}
$
is obtained for the total sample of size $ n = n_1 + \cdots + n_k$. We then proceed to compute the analysis of variance table (AOV table), which summarizes the degrees of freedom (df), the sums of squares (SS), and the mean squares (MS).
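The second equality above says that the overall mean is the size-weighted average of the group means. A quick check, reusing the hypothetical data from before:

```python
from statistics import mean

# Same hypothetical grouped data as above
groups = {
    "control": [5.1, 4.8, 5.5, 5.0],
    "A":       [6.2, 6.0, 5.7],
    "B":       [4.4, 4.9, 4.6, 4.7, 4.5],
}

n = sum(len(x) for x in groups.values())          # n = n_1 + ... + n_k

# Weighted average of the group means: (1/n) * sum of n_i * Xbar_i.
grand = sum(len(x) * mean(x) for x in groups.values()) / n

# Pooling all n observations gives the same overall mean.
pooled = mean([v for x in groups.values() for v in x])
print(f"n = {n}, overall mean = {grand:.4f}")
```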

AOV model. We assume (i) the same variance $ \sigma^2$ across groups, and (ii) independent normal random variables

$\displaystyle X_{ij} = \alpha_i + \epsilon_{ij}
$

with $ \epsilon_{ij}\sim N(0, \sigma^2)$ for each level $ i = 1,\ldots,k$ and each individual $ j = 1,\ldots,n_i$. Then the mean squares $ MS_{\mbox{error}}$ within groups is the mean square error (MSE), and serves as the estimate of $ \sigma^2$.
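The claim that the MSE estimates $ \sigma^2$ can be illustrated by simulating from the model $ X_{ij} = \alpha_i + \epsilon_{ij}$. The group means, $ \sigma$, and group sizes below are arbitrary choices for the sketch:

```python
import random
from statistics import mean

random.seed(1)

# Hypothetical setting: k = 3 levels with means alpha_i and common sigma
alpha = [5.0, 6.0, 4.5]
sigma = 0.5
n_i = 200           # large groups so the MSE settles near sigma^2
k = len(alpha)
n = k * n_i

# X_ij = alpha_i + eps_ij with eps_ij ~ N(0, sigma^2)
data = [[a + random.gauss(0.0, sigma) for _ in range(n_i)] for a in alpha]

ss_error = sum((x - mean(group)) ** 2 for group in data for x in group)
mse = ss_error / (n - k)    # MS_error = SS_error / (n - k)
print(f"MSE = {mse:.3f}  (true sigma^2 = {sigma**2})")
```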

  1. $ \displaystyle
SS_{\mbox{group}} = \sum_{i=1}^k
n_i (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2$ is the sum of squares between groups, having $ df_{\mbox{group}} = k - 1$ degrees of freedom. Thus, the mean squares is given by

    $\displaystyle MS_{\mbox{group}} = \displaystyle\frac{SS_{\mbox{group}}}{k-1}
$

  2. $ \displaystyle
SS_{\mbox{error}} = \sum_{i=1}^k \: \sum_{j=1}^{n_i}
(X_{ij} - \bar{X}_{i\cdot})^2$ is the sum of squares within groups, having $ df_{\mbox{error}} = n - k$ degrees of freedom. Thus, the mean squares is given by

    $\displaystyle MS_{\mbox{error}} = \displaystyle\frac{SS_{\mbox{error}}}{n-k}
$

  3. $ \displaystyle
SS_{\mbox{total}} = \sum_{i=1}^k \: \sum_{j=1}^{n_i}
(X_{ij} - \bar{X}_{\cdot\cdot})^2$ is the total sum of squares, having $ df_{\mbox{total}} = n - 1$ degrees of freedom. It can be decomposed into

    $\displaystyle SS_{\mbox{total}} = SS_{\mbox{group}} + SS_{\mbox{error}}
$
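The three sums of squares and their mean squares can be assembled into the AOV table by hand. The sketch below does so for the hypothetical grouped data used earlier, and verifies the decomposition $ SS_{\mbox{total}} = SS_{\mbox{group}} + SS_{\mbox{error}}$:

```python
from statistics import mean

# Same hypothetical grouped data as above
groups = {
    "control": [5.1, 4.8, 5.5, 5.0],
    "A":       [6.2, 6.0, 5.7],
    "B":       [4.4, 4.9, 4.6, 4.7, 4.5],
}
k = len(groups)
n = sum(len(x) for x in groups.values())
grand = mean([v for x in groups.values() for v in x])   # overall mean

# Sums of squares
ss_group = sum(len(x) * (mean(x) - grand) ** 2 for x in groups.values())
ss_error = sum((v - mean(x)) ** 2 for x in groups.values() for v in x)
ss_total = sum((v - grand) ** 2 for x in groups.values() for v in x)

# Mean squares
ms_group = ss_group / (k - 1)   # df_group = k - 1
ms_error = ss_error / (n - k)   # df_error = n - k; the MSE, estimate of sigma^2

print(f"group:  df = {k - 1}, SS = {ss_group:.4f}, MS = {ms_group:.4f}")
print(f"error:  df = {n - k}, SS = {ss_error:.4f}, MS = {ms_error:.4f}")
print(f"total:  df = {n - 1}, SS = {ss_total:.4f}")
```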

Hypothesis test. The hypothesis test to detect "some effect" of the factor levels is formulated as

$\displaystyle H_0:\: \alpha_1 = \cdots = \alpha_k$    versus $\displaystyle \quad
H_A:\: \alpha_i \neq \alpha_j$    for some pair $ \{i,j\}$.

Under the null hypothesis $ H_0$ the test statistic

$ F = \frac{MS_{\mbox{group}}}{MS_{\mbox{error}}}$

has an F-distribution with

$ (k-1, n-k)$

degrees of freedom. By $ F_{\alpha,k-1,n-k}$ we denote the critical point satisfying $ P(X > F_{\alpha,k-1,n-k}) = \alpha$, where $ X$ is the F-distributed random variable. In the hypothesis test we reject $ H_0$ with significance level $ \alpha$ when the observed value $ F = x$ satisfies $ x > F_{\alpha,k-1,n-k}$. Equivalently, we can compute the p-value $ p^* = P(X > x)$ and reject $ H_0$ when $ p^* < \alpha$.
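The full F test can be carried out on the hypothetical data from before. This sketch assumes SciPy is available for the F-distribution tail probability and critical point; with the data used here the group means differ visibly, so $ H_0$ should be rejected at level $ \alpha = 0.05$:

```python
from statistics import mean
from scipy.stats import f   # SciPy assumed available for the F-distribution

# Same hypothetical grouped data as above
groups = {
    "control": [5.1, 4.8, 5.5, 5.0],
    "A":       [6.2, 6.0, 5.7],
    "B":       [4.4, 4.9, 4.6, 4.7, 4.5],
}
k = len(groups)
n = sum(len(x) for x in groups.values())
grand = mean([v for x in groups.values() for v in x])

ms_group = sum(len(x) * (mean(x) - grand) ** 2 for x in groups.values()) / (k - 1)
ms_error = sum((v - mean(x)) ** 2 for x in groups.values() for v in x) / (n - k)

F_obs = ms_group / ms_error
p_value = f.sf(F_obs, k - 1, n - k)     # p* = P(X > F_obs)
crit = f.ppf(0.95, k - 1, n - k)        # critical point F_{0.05, k-1, n-k}

print(f"F = {F_obs:.2f}, F_crit = {crit:.2f}, p-value = {p_value:.4g}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

Note that the two decision rules agree: $ x > F_{\alpha,k-1,n-k}$ holds exactly when $ p^* < \alpha$.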


© TTU Mathematics