Probability and Statistics

Capture-Recapture Problem

Let X be the number of recaptured subjects. The estimate of the population size N is derived from the expected value

$\displaystyle E[X] = \frac{m\times r}{N}$
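The expectation above can be checked by simulation. The following is a minimal sketch, assuming that m is the number of marked subjects, r is the size of the second (recapture) sample, and X follows a hypergeometric distribution; the values N = 200, m = 40, r = 30 and the number of replications are hypothetical, chosen only for illustration.

```r
# Hypothetical values for illustration only (not taken from the text)
set.seed(1)
N = 200   # true population size (assumed)
m = 40    # number of marked subjects (assumed)
r = 30    # size of the recapture sample (assumed)

# rhyper(nn, m, n, k): nn replications, m marked, n unmarked, k drawn
sample = rhyper(10000, m, N - m, r)

cat("\n simulated mean of X =", mean(sample))
cat("\n m * r / N           =", m * r / N)
```

With enough replications the two printed values should be close, which is what motivates replacing $E[X]$ by the observed $X$.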

Assuming for a moment that $E[X]\approx X$, we find that the best estimate of the population size is $\lfloor (m\times r) / X\rfloor$. Here $\lfloor a \rfloor$ is the floor of $a$, that is, the greatest integer less than or equal to $a$ (for example, $\lfloor 3.8\rfloor = 3$). However, it is possible that $X = 0$ in the estimate $(m\times r) / X$. In order to avoid division by zero, you may add a small positive value to $X$.

# Method 1: add a small positive constant to X so the denominator is never zero
ee = 0.25
estimate = floor(m * r / (sample + ee))
hist(estimate, freq=FALSE, col='red')
cat("\n [Method 1] sample mean =", mean(estimate))
cat("\n [Method 1] sample var =", var(estimate))

Another possibility is to remove all the zeros "$X = 0$", and then evaluate the formula.

# Method 2: drop the samples with X = 0 before applying the formula
estimate.b = floor(m * r / sample[sample > 0])
hist(estimate.b, freq=FALSE, col='red')
cat("\n [Method 2] sample mean =", mean(estimate.b))
cat("\n [Method 2] sample var =", var(estimate.b))
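How often the case $X = 0$ actually occurs can be computed directly. This is a small sketch under the assumption that X is hypergeometric; the values of N, m and r are hypothetical and serve only as an illustration.

```r
# Hypothetical values for illustration only (not taken from the text)
N = 200   # true population size (assumed)
m = 40    # number of marked subjects (assumed)
r = 30    # size of the recapture sample (assumed)

# dhyper(x, m, n, k): probability of x marked subjects among k drawn,
# with m marked and n unmarked subjects in the population
p0 = dhyper(0, m, N - m, r)
cat("\n P(X = 0) =", p0)
```

When this probability is small, Method 1 and Method 2 differ only on a few samples; when m or r is small relative to N, zeros become frequent and the choice of workaround matters more.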

Either of the results above indicates that the distribution of the estimate is highly skewed, and that the sample mean of the estimate is biased toward larger values. A slight modification, $\lfloor ((m+1)\times(r+1)) / (X+1)\rfloor$, of the above estimate gives a distribution that is less skewed.

# Method 3: modified estimate; the denominator X + 1 is never zero
estimate.c = floor((m + 1) * (r + 1) / (sample + 1))
hist(estimate.c, freq=FALSE, col='red')
cat("\n [Method 3] sample mean =", mean(estimate.c))
cat("\n [Method 3] sample var =", var(estimate.c))
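To compare the three methods side by side, the sample mean of each estimate can be set against the true population size. The sketch below is self-contained and assumes a hypergeometric X with hypothetical values of N, m and r; the names x, est1, est2, est3 and bias are introduced here for illustration only.

```r
# Hypothetical values for illustration only (not taken from the text)
set.seed(2)
N = 200   # true population size (assumed)
m = 40    # number of marked subjects (assumed)
r = 30    # size of the recapture sample (assumed)
x = rhyper(10000, m, N - m, r)   # simulated values of X

est1 = floor(m * r / (x + 0.25))               # Method 1: add a small constant
est2 = floor(m * r / x[x > 0])                 # Method 2: remove the zeros
est3 = floor((m + 1) * (r + 1) / (x + 1))      # Method 3: modified estimate

# Difference between the sample mean of each estimate and the true N
bias = c(method1 = mean(est1), method2 = mean(est2), method3 = mean(est3)) - N
print(round(bias, 1))
```

Under these assumed values, the first two entries should come out clearly positive, while the third should be much closer to zero.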

Sample R code: you can download capture.R and run it.