e-Statistics

Goodness of Fit

In the pea-breeding experiment, Mendel's theory predicts the probabilities of occurrence for each type of progeny, say “round yellow”, “wrinkled yellow”, “round green”, and “wrinkled green.” Here we want to test whether the data from $n$ observations are consistent with his theory, that is, to assess the goodness of fit.

The model probabilities

$\displaystyle p_1,\ldots,p_k
$

are specified (usually in the Probability or Percentage column) for $k$ categories or "groups." Each of the $n$ subjects is classified into exactly one of the $k$ groups, and the expected number $E_i$ of subjects in the $i$-th group is calculated from the model probabilities by

$\displaystyle E_i = n \times p_i,
\quad
i = 1,\ldots,k.
$
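As a minimal sketch of this step, the snippet below computes the expected counts $E_i = n \times p_i$ in Python. The 9:3:3:1 ratio and the total $n = 556$ are assumptions chosen for illustration (they are the figures commonly quoted for Mendel's dihybrid cross and do not appear in the text above); the variable names are likewise hypothetical.

```python
import math

# Assumed model: the 9:3:3:1 ratio (an illustration, not given in the text).
categories = ["round yellow", "wrinkled yellow", "round green", "wrinkled green"]
model_probs = [9 / 16, 3 / 16, 3 / 16, 1 / 16]
n = 556  # assumed total number of observations

# The model probabilities must sum to 1.
assert math.isclose(sum(model_probs), 1.0)

# Expected count in each group: E_i = n * p_i.
expected = [n * p for p in model_probs]

for name, e in zip(categories, expected):
    print(f"{name:16s} E = {e:.2f}")
```

Because the probabilities sum to 1, the expected counts sum to $n$, which is a quick sanity check on the model column.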

The observed numbers of subjects

$\displaystyle X_1, \ldots, X_k
$

add up to the total size $n = X_1 + \cdots + X_k$ of the data. Goodness of fit to the model can then be assessed by comparing the observed frequencies with the expected ones. The null hypothesis is “the model is valid,” and the discrepancy between the data and the model is measured by Pearson's chi-square statistic

$\displaystyle \chi^2 = \sum_{i=1}^k \frac{(X_i - E_i)^2}{E_i}.$
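The statistic can be computed directly from the observed counts and the model probabilities. The following is a sketch only: `pearson_chi_square` is a hypothetical helper name, and the counts 315, 101, 108, 32 with the 9:3:3:1 ratio are assumed illustrative data (the figures commonly quoted for Mendel's experiment), not values taken from this page.

```python
def pearson_chi_square(observed, model_probs):
    """Return Pearson's chi-square statistic for observed counts X_i
    and model probabilities p_i."""
    if len(observed) != len(model_probs):
        raise ValueError("observed and model_probs must have the same length")
    n = sum(observed)  # total size n = X_1 + ... + X_k
    stat = 0.0
    for x, p in zip(observed, model_probs):
        e = n * p  # expected count E_i = n * p_i
        stat += (x - e) ** 2 / e
    return stat


# Assumed illustrative data (not from the text above).
observed = [315, 101, 108, 32]
model_probs = [9 / 16, 3 / 16, 3 / 16, 1 / 16]

chi_sq = pearson_chi_square(observed, model_probs)
print(f"chi-square = {chi_sq:.3f}")
```

For these counts the statistic is about 0.47; a value near zero means the observed frequencies lie close to the expected ones.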

Under the null hypothesis that the model probabilities are correct, the distribution of Pearson's chi-square statistic $\chi^2$ is approximated by the chi-square distribution with $df = k-1$ degrees of freedom. Therefore, we reject the null hypothesis at significance level $\alpha$ if the observed test statistic $\chi^2$ is larger than the critical point $\chi^2_{\alpha,df}$, which casts doubt on the validity of the model. Or equivalently, by computing the $p$-value

$\displaystyle p^* = P(X > \chi^2)$

with a random variable $X$ having the chi-square distribution with $(k-1)$ degrees of freedom, we reject the null hypothesis if $p^* < \alpha$.
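The whole procedure can be sketched in Python using only the standard library. The names `chi2_sf` and `goodness_of_fit_test` are hypothetical, and the data (counts 315, 101, 108, 32 under an assumed 9:3:3:1 model) are illustrative rather than taken from this page. The upper-tail probability is evaluated with the closed form available for integer degrees of freedom, built up from the recurrence $Q(a+1,h) = Q(a,h) + h^a e^{-h}/\Gamma(a+1)$ for the regularized upper incomplete gamma function.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for X chi-square distributed with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)              # Q(1, h) = exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))   # Q(1/2, h) = erfc(sqrt(h))
    # Step a up to df/2 with Q(a+1, h) = Q(a, h) + h^a e^{-h} / Gamma(a+1).
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q


def goodness_of_fit_test(observed, model_probs, alpha=0.05):
    """Return (chi-square statistic, df, p-value, reject?)."""
    n = sum(observed)
    chi_sq = sum((x - n * p) ** 2 / (n * p)
                 for x, p in zip(observed, model_probs))
    df = len(observed) - 1
    p_value = chi2_sf(chi_sq, df)
    return chi_sq, df, p_value, p_value < alpha


# Assumed illustrative data (not from the text above).
chi_sq, df, p_value, reject = goodness_of_fit_test(
    [315, 101, 108, 32], [9 / 16, 3 / 16, 3 / 16, 1 / 16])
print(f"chi-square = {chi_sq:.3f}, df = {df}, p-value = {p_value:.3f}")
print("reject the model" if reject else "do not reject the model")
```

With these counts the $p$-value is roughly 0.93, far above $\alpha = 0.05$, so the model is not rejected. Where SciPy is available, `scipy.stats.chisquare` performs the same test and can be used to cross-check the result.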


© TTU Mathematics