e-Statistics

Test for Independent Groups

Data are collected from two groups, say “Group 1” and “Group 2,” to investigate how the two groups differ in terms of their respective population means $ \mu_1$ and $ \mu_2$.

Data analysis begins by summarizing the data: we obtain the sample means $ \bar{X}$ and $ \bar{Y}$ and the sample standard deviations $ S_1$ and $ S_2$ from "Group 1" and "Group 2," with respective sample sizes $n$ and $m$.

Data may be arranged in two separate columns, each of which contains the data for one group: the first column holds "Group 1" and the second column "Group 2."

Alternatively, data may be arranged in a one-way layout. Here one column (as identified at "Summary statistics") contains all of the data, and it is "grouped by" a column holding a categorical variable that identifies "Group 1" and "Group 2."
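The two arrangements carry the same information. As a minimal sketch (the data values and the helper names `split_by_group` and `summarize` are made up for illustration and are not part of this page), a one-way layout can be split into the two groups and summarized as follows:

```python
from statistics import mean, stdev

# One-way layout: each row is (group label, value). Values are illustrative only.
rows = [
    ("Group 1", 5.1), ("Group 2", 4.2), ("Group 1", 4.8),
    ("Group 2", 3.9), ("Group 1", 5.5), ("Group 2", 4.5),
]

def split_by_group(rows):
    """Collect the values of a one-way layout into one list per group label."""
    groups = {}
    for label, value in rows:
        groups.setdefault(label, []).append(value)
    return groups

def summarize(values):
    """Return (sample size, sample mean, sample standard deviation)."""
    return len(values), mean(values), stdev(values)

groups = split_by_group(rows)
n, xbar, s1 = summarize(groups["Group 1"])  # n, X-bar, S_1
m, ybar, s2 = summarize(groups["Group 2"])  # m, Y-bar, S_2
```

Here `stdev` is the sample standard deviation (divisor one less than the sample size), which matches the role of $ S_1$ and $ S_2$ above.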

The hypothesis test is specified by the choice of alternative hypothesis

$ H_A:\hspace{0.05in}\mu_1 < \mu_2$, $ \mu_1 > \mu_2$, or $ \mu_1 \neq \mu_2$.

The test statistic $ T = \dfrac{\bar{X} - \bar{Y}}{S_{\bar{X}-\bar{Y}}}$ is likely to be observed near zero under the null hypothesis $ H_0: \mu_1 = \mu_2$. Evidence against $ H_0$ appears as negatively extreme values (left-tailed region), positively extreme values (right-tailed region), or either extreme (two-sided region; see t-distribution) when the alternative hypothesis $ H_A$ is respectively “ $ \mu_1 < \mu_2$,” “ $ \mu_1 > \mu_2$,” or “ $ \mu_1 \neq \mu_2$.” Such an extreme observation is expressed by a p-value smaller than the significance level $ \alpha$, which suggests evidence in support of the alternative hypothesis $ H_A$.
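
The three tail regions can be written out explicitly. Writing $ t$ for the observed value of $ T$ and $ T_{df}$ for a random variable following the t-distribution with df degrees of freedom (notation introduced here only for illustration), the p-value is

$ \displaystyle
p = \begin{cases}
P(T_{df} \le t) & \text{if } H_A:\ \mu_1 < \mu_2 \\
P(T_{df} \ge t) & \text{if } H_A:\ \mu_1 > \mu_2 \\
2\,P(T_{df} \ge \vert t\vert) & \text{if } H_A:\ \mu_1 \neq \mu_2
\end{cases}$

and $ H_0$ is rejected in favor of $ H_A$ when $ p < \alpha$.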

When it is reasonable to assume that the two population variances $ \sigma_1^2$ and $ \sigma_2^2$ of Groups 1 and 2 are equal, the standard error (SE) is given by $S_{\bar{X}-\bar{Y}} = \sqrt{\frac{1}{n} + \frac{1}{m}}\, S_p$ via the pooled sample variance $S_{p}^2 = \frac{(n-1)S_1^2 + (m-1)S_2^2}{n+m-2}$. In the pooled t-test,

$d = \dfrac{\bar{X} - \bar{Y}}{S_{p}}$

is called Cohen's d, and it provides an estimate of the standardized mean difference.
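
The pooled quantities can be computed directly from the summary statistics. The sketch below is illustrative only: the function name `pooled_t` and the input numbers are assumptions, and the degrees of freedom $ n+m-2$ are the standard choice for the pooled t-test.

```python
from math import sqrt

def pooled_t(xbar, ybar, s1, s2, n, m):
    """Pooled two-sample t statistic, df, Cohen's d, and SE from summary statistics."""
    # Pooled sample variance S_p^2 and its square root S_p
    sp2 = ((n - 1) * s1**2 + (m - 1) * s2**2) / (n + m - 2)
    sp = sqrt(sp2)
    se = sqrt(1 / n + 1 / m) * sp   # standard error of X-bar minus Y-bar
    t = (xbar - ybar) / se          # test statistic T
    d = (xbar - ybar) / sp          # Cohen's d
    df = n + m - 2                  # degrees of freedom of the pooled t-test
    return t, df, d, se

# Illustrative summary statistics (not taken from any real data set)
t, df, d, se = pooled_t(xbar=10.0, ybar=8.0, s1=2.0, s2=2.0, n=8, m=8)
```

With these inputs the pooled variance is 4, so $ S_p = 2$, the SE is 1, $ T = 2$, Cohen's d is 1, and df is 14.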

A general procedure is applicable when we cannot assume that the variances are equal. Here the SE of $ (\bar{X}-\bar{Y})$ is given by $S_{\bar{X}-\bar{Y}} = \sqrt{\frac{S_1^2}{n} + \frac{S_2^2}{m}}$ with the respective sample variances $ S_1^2$ and $ S_2^2$.
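
A minimal sketch of the unequal-variance computation follows. The page does not state which degrees of freedom accompany this SE; the sketch assumes the commonly used Welch-Satterthwaite approximation, and the function name `welch_t` and the input numbers are made up for illustration.

```python
from math import sqrt

def welch_t(xbar, ybar, s1, s2, n, m):
    """Unequal-variance t statistic, approximate df, and SE from summary statistics."""
    v1 = s1**2 / n                  # estimated variance of X-bar
    v2 = s2**2 / m                  # estimated variance of Y-bar
    se = sqrt(v1 + v2)              # standard error of X-bar minus Y-bar
    t = (xbar - ybar) / se          # test statistic T
    # Assumption: Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n - 1) + v2**2 / (m - 1))
    return t, df, se

# Illustrative summary statistics (not taken from any real data set)
t, df, se = welch_t(xbar=10.0, ybar=8.0, s1=3.0, s2=4.0, n=9, m=16)
```

Under this approximation df is generally not an integer; here it equals 480/23, roughly 20.87.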

Once the SE and the degrees of freedom df for the t-distribution are obtained from the t-test above, we can construct the confidence interval for the population mean difference $ \mu_1 - \mu_2$:

$ \displaystyle
\left(\bar{X} - \bar{Y}
- t_{\alpha/2,df} S_{\bar{X}-\bar{Y}},\:
\bar{X} - \bar{Y}
+ t_{\alpha/2,df} S _{\bar{X}-\bar{Y}}
\right)$


Here the choices of confidence level $ (1-\alpha)$ are 90%, 95%, or 99%.
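
The interval can be sketched in a few lines once a critical value is available. In this illustration the function name `mean_diff_ci` is hypothetical and the critical value $ t_{\alpha/2,df}$ is passed in by the caller, for example from a t-table; 2.145 is the tabulated value for a 95% interval with df = 14.

```python
def mean_diff_ci(xbar, ybar, se, t_crit):
    """Confidence interval for mu_1 - mu_2, given the SE and the critical value."""
    diff = xbar - ybar
    margin = t_crit * se            # t_{alpha/2, df} times the standard error
    return diff - margin, diff + margin

# Illustrative inputs: SE = 1.0 with df = 14, and t_{0.025,14} = 2.145 from a t-table
lower, upper = mean_diff_ci(xbar=10.0, ybar=8.0, se=1.0, t_crit=2.145)
```

For these inputs the interval is approximately (-0.145, 4.145). It contains zero, so the two-sided test at $ \alpha = 0.05$ would not reject $ H_0$ for the same data.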


© TTU Mathematics