Test for Homogeneity
In a study of two characteristics, say “A” and “B,” researchers want to know whether the two characteristics are associated or independent. Such a study yields n paired observations of categorical data, which are summarized in a contingency table.
The first column of the contingency table lists the categorical values (or levels) of characteristic “A”. The remaining columns correspond to the categorical values (or responses) of characteristic “B” and give the cell frequencies $O_{ij}$, the number of observations having the $i$-th level of A and the $j$-th response of B.
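As an illustration, the sketch below tallies raw paired observations into such a table. It assumes the data arrive as a Python list of (A, B) pairs; the function name `contingency_table` and the sample data are hypothetical. Rows follow the sorted levels of A and columns the sorted responses of B, so `table[i][j]` plays the role of $O_{ij}$.

```python
from collections import Counter

def contingency_table(pairs):
    """Tally paired categorical observations (a, b) into a contingency table.

    Returns (row_levels, col_levels, table), where table[i][j] is the number
    of observations with the i-th level of A and the j-th response of B.
    """
    counts = Counter(pairs)  # missing (a, b) combinations count as 0
    # Sort the distinct values so the table layout is reproducible.
    row_levels = sorted({a for a, _ in pairs})
    col_levels = sorted({b for _, b in pairs})
    table = [[counts[(a, b)] for b in col_levels] for a in row_levels]
    return row_levels, col_levels, table

# Hypothetical paired observations: (level of A, response of B).
pairs = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b1"), ("a1", "b1")]
rows, cols, table = contingency_table(pairs)
for level, row in zip(rows, table):
    print(level, row)  # prints "a1 [2, 1]" then "a2 [2, 0]"
```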
The contingency table can be visualized by the mosaic plot below. The area of each tile in the mosaic plot is proportional to the number of observations with a given response of B within a given level of A. Thus homogeneity is indicated when the tiles for each response of B have similar relative sizes across the different levels of A, that is, when B is split in similar proportions within every level of A.
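The proportions that determine how the tiles within each level of A are divided can be computed directly. This is a minimal sketch with a hypothetical helper name and hypothetical counts; it assumes every row (level of A) has at least one observation, since otherwise its proportions are undefined.

```python
def conditional_proportions(table):
    """For each row (level of A), return the share of each response of B.

    Rows with similar shares correspond to tiles of similar relative size
    across the levels of A, which is what homogeneity looks like.
    """
    proportions = []
    for row in table:
        total = sum(row)  # assumed > 0 (level of A has observations)
        proportions.append([count / total for count in row])
    return proportions

# Hypothetical counts: rows are levels of A, columns are responses of B.
table = [[20, 30, 50],
         [10, 15, 25]]
for row in conditional_proportions(table):
    print([round(p, 2) for p in row])  # prints [0.2, 0.3, 0.5] for both rows
```

The two rows have different totals (100 and 50), yet B is split 20%/30%/50% within each, which is the pattern a homogeneous mosaic plot shows.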
The null hypothesis states that “the two characteristics are independent,” or equivalently that the distribution of B is the same (homogeneous) across all levels of A.
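Let $R_i$ and $C_j$ denote the total counts for the $i$-th value of A and the $j$-th value of B, respectively (i.e., the row and column sums of the contingency table). Under the null hypothesis, the expected frequencies for the contingency table are given by
$$E_{ij} = \frac{R_i\,C_j}{n},$$
where $n$ denotes the total cell count. (Under independence the probability of cell $(i,j)$ is estimated by $(R_i/n)(C_j/n)$, and multiplying by $n$ gives $E_{ij}$.)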
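A direct translation of $E_{ij} = R_i C_j / n$, as a minimal sketch; the function name `expected_frequencies` and the 2 x 3 table are hypothetical.

```python
def expected_frequencies(table):
    """Expected counts E_ij = R_i * C_j / n under the null hypothesis."""
    row_totals = [sum(row) for row in table]        # R_i
    col_totals = [sum(col) for col in zip(*table)]  # C_j
    n = sum(row_totals)                             # total cell count
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical 2 x 3 table of observed counts.
observed = [[30, 20, 50],
            [20, 30, 50]]
for row in expected_frequencies(observed):
    print(row)  # prints [25.0, 25.0, 50.0] for both rows
```

Here $R_1 = R_2 = 100$, $C = (50, 50, 100)$ and $n = 200$, so each row receives the same expected counts.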
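Then the chi-square statistic is
$$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},$$
where the sum runs over all cells, with $O_{ij}$ the observed and $E_{ij}$ the expected frequency in row $i$, column $j$.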
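The statistic can be accumulated cell by cell. The sketch below recomputes the expected counts internally so that it stands on its own; the helper name `chi_square_statistic` and the data are hypothetical.

```python
def chi_square_statistic(observed):
    """Pearson chi-square: sum over all cells of (O_ij - E_ij)^2 / E_ij."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count E_ij
            stat += (o - e) ** 2 / e
    return stat

# Hypothetical 2 x 3 table of observed counts.
observed = [[30, 20, 50],
            [20, 30, 50]]
print(chi_square_statistic(observed))  # prints 4.0
```

For this table the expected counts are 25, 25, 50 in each row; four cells deviate by 5 from 25 and contribute $25/25 = 1$ each, while the last column matches exactly, giving $\chi^2 = 4$.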
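Let $r$ and $c$ denote the number of categorical values in the rows and the columns, respectively. Then we compare the statistic with the chi-square distribution with $(r-1)(c-1)$ degrees of freedom (once the row and column totals are fixed, only $(r-1)(c-1)$ cell counts can vary freely), and construct the critical region
$$\chi^2 > \chi^2_{\alpha,\,(r-1)(c-1)},$$
where $\chi^2_{\alpha,\,(r-1)(c-1)}$ is the upper-$\alpha$ critical value of that distribution at significance level $\alpha$, to determine whether the null hypothesis can be rejected.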
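To form the critical region without a statistics library, the sketch below evaluates the chi-square upper-tail probability with the standard recurrence $Q(x; k+2) = Q(x; k) + (x/2)^{k/2} e^{-x/2} / \Gamma(k/2 + 1)$ and inverts it by bisection. This numerical method is a swapped-in choice, not something the text prescribes; the helper names are hypothetical, `df` is assumed to be a positive integer (as $(r-1)(c-1)$ is for $r, c \ge 2$), and `alpha` is assumed to lie strictly between 0 and 1. With SciPy available, `scipy.stats.chi2.ppf(1 - alpha, df)` gives the same critical value.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Start from the closed form for df = 2 (even df) or df = 1 (odd df) ...
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    # ... then step up two degrees of freedom at a time:
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q

def chi2_critical_value(alpha, df):
    """Upper-alpha critical value: the x with P(X >= x) = alpha, by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical 2 x 3 table (r = 2 levels of A, c = 3 responses of B).
observed = [[30, 20, 50],
            [20, 30, 50]]
r, c = len(observed), len(observed[0])
df = (r - 1) * (c - 1)
critical = chi2_critical_value(0.05, df)
print(df, round(critical, 3))  # prints: 2 5.991
```

For this hypothetical table the statistic works out to 4.0, which is below 5.991, so it does not fall in the critical region and the null hypothesis is not rejected at $\alpha = 0.05$.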
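Equivalently, we reject the null hypothesis (that is, we find evidence of dependence, or association, between the two characteristics) if the p-value, the probability under the null hypothesis that a chi-square random variable with the same degrees of freedom is at least as large as the observed statistic, is significant (that is, $p < \alpha$ for the chosen significance level $\alpha$).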
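Putting the pieces together, the sketch below returns the statistic, the degrees of freedom, the p-value, and the decision $p < \alpha$ for an $r \times c$ table. It uses a stdlib-only tail-probability recurrence (a swapped-in numerical method, not prescribed by the text); the function names and data are hypothetical, and all expected counts are assumed to be positive.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed form for df = 2 or df = 1, then the recurrence in steps of 2.
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    while k < df:
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q

def chi_square_test(observed, alpha=0.05):
    """Return (statistic, df, p_value, reject) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for row, r_tot in zip(observed, row_totals):
        for o, c_tot in zip(row, col_totals):
            e = r_tot * c_tot / n  # expected count E_ij
            statistic += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    p_value = chi2_sf(statistic, df)
    return statistic, df, p_value, p_value < alpha

# Hypothetical 2 x 3 table of observed counts.
observed = [[30, 20, 50],
            [20, 30, 50]]
statistic, df, p_value, reject = chi_square_test(observed)
print(statistic, df, round(p_value, 4), reject)  # prints: 4.0 2 0.1353 False
```

Here $p = e^{-2} \approx 0.135$, because the upper tail of a chi-square distribution with 2 degrees of freedom is $e^{-x/2}$; since $p > 0.05$, the null hypothesis is not rejected. `scipy.stats.chi2_contingency` performs the same test, but for tables with one degree of freedom (2 x 2) it applies Yates' continuity correction by default unless `correction=False` is passed, so its p-value can differ from this uncorrected version.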
© TTU Mathematics