This dataset is the output from a simulation of the following thought experiment:
  1. Think of a whole number from one to six.
  2. Roll a standard, cubical die repeatedly until your chosen number appears three times.
  3. Record the number of rolls required.
  4. Repeat Steps 1-3 for a total of 1,000 data points.
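
The four steps above can be sketched in a few lines of Python. This is only an illustrative re-creation, not the program that produced the dataset: the function names, the seed, and the use of Python's `random` module are all assumptions.

```python
import random

def rolls_until_hits(rng, target, hits_needed=3):
    """Roll a fair six-sided die until `target` has appeared `hits_needed` times.

    Returns the total number of rolls required (Steps 2 and 3).
    """
    rolls = 0
    hits = 0
    while hits < hits_needed:
        rolls += 1
        if rng.randint(1, 6) == target:
            hits += 1
    return rolls

def simulate(n_points=1000, seed=0):
    """Repeat the experiment `n_points` times (Step 4)."""
    rng = random.Random(seed)  # hypothetical seed, chosen only for repeatability
    data = []
    for _ in range(n_points):
        target = rng.randint(1, 6)  # Step 1: think of a number from one to six
        data.append(rolls_until_hits(rng, target))
    return data

data = simulate()
print(len(data), min(data), max(data), sum(data) / len(data))
```

No data point can be smaller than 3, and with a fair die the sample mean should land near 3 / (1/6) = 18.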

All of the observed data points fell in the range [3, 57] (see histogram), although a few bins within that range were empty.

Assuming that the die is fair, the 1,000 random variates will be distributed according to a Negative Binomial distribution with a success probability of 1/6 and three required successes.

Model

NegativeBinomial(A,B) = Bin ((y − 1), (B − 1)) A^B (1 − A)^(y − B)

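where y is the number of rolls required (y ≥ B), A is the probability of success on a single roll, B is the number of successes required, and Bin (N, k) is the number of combinations of N things taken k at a time.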
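
As a quick check of the formula, the following sketch evaluates it directly. The function name `neg_binomial_pmf` is an assumption made for illustration; `math.comb` plays the role of Bin.

```python
from math import comb

def neg_binomial_pmf(y, A, B):
    """Probability that the B-th success occurs on roll y.

    A is the per-roll success probability; B is the number of successes required.
    """
    if y < B:
        return 0.0  # fewer than B rolls cannot contain B successes
    return comb(y - 1, B - 1) * A**B * (1 - A) ** (y - B)

A, B = 1 / 6, 3

# Smallest possible outcome: three successes in the first three rolls.
p_min = neg_binomial_pmf(3, A, B)

# The probabilities should sum to (very nearly) one, and the mean should be B / A.
ys = range(B, 1000)
total = sum(neg_binomial_pmf(y, A, B) for y in ys)
mean = sum(y * neg_binomial_pmf(y, A, B) for y in ys)
print(p_min, total, mean)
```

Here `p_min` equals (1/6)^3 = 1/216, and the mean of the distribution is B / A = 18 rolls.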

Parameters

In optimizing this model (using the maximum-likelihood criterion), parameter B is necessarily constant and was set equal to its known value (3). Parameter A was considered unknown. Note that the observed ML value for A (0.1691) is very close to its theoretical value (1/6). With a much smaller sample, the estimate would likely have been off by more than this; with a much larger sample, it would likely have been closer still.
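
With B fixed, the maximum-likelihood estimate of A has a simple closed form: setting the derivative of the log-likelihood to zero gives A = n B / (sum of the y values), that is, B divided by the sample mean. The sketch below illustrates this standard result. It is not a description of how Regress+ performs the optimization, the function names are assumptions, and the five-point sample is made up purely for illustration (it is not taken from the dataset).

```python
from math import comb, log

def log_likelihood(data, A, B=3):
    """Log-likelihood of the Negative Binomial model for the given sample."""
    return sum(
        log(comb(y - 1, B - 1)) + B * log(A) + (y - B) * log(1 - A)
        for y in data
    )

def ml_estimate_A(data, B=3):
    """Closed-form ML estimate of A when B is known: n * B / sum(y)."""
    return len(data) * B / sum(data)

# Made-up illustrative sample (NOT the 1,000-point dataset described here).
sample = [12, 20, 18, 9, 31]
A_hat = ml_estimate_A(sample)

# The log-likelihood should be lower on either side of the estimate.
print(A_hat)
print(log_likelihood(sample, A_hat - 0.01),
      log_likelihood(sample, A_hat),
      log_likelihood(sample, A_hat + 0.01))
```

By this formula, the reported ML value of 0.1691 corresponds to a sample mean of roughly 3 / 0.1691, or about 17.7 rolls.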

For this example, the observed value of Chi-squared is 65.235, which is quite acceptable [46th percentile]. Regress+ can also optimize the model by minimizing Chi-squared instead of maximizing the likelihood. With this new criterion, parameter A was found to be 0.1652 [Chi-squared = 65.088, 44th percentile].
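
A minimum-Chi-squared fit can be sketched as follows. Several things here are assumptions rather than a description of Regress+: one bin per integer outcome plus a single pooled tail bin, the Pearson form of the statistic, a plain grid search over A, and freshly simulated stand-in data (so the numbers printed will not match those quoted above).

```python
import random
from collections import Counter
from math import comb

B = 3  # known number of required successes

def pmf(y, A):
    """Negative Binomial probability that the B-th success occurs on roll y."""
    return comb(y - 1, B - 1) * A**B * (1 - A) ** (y - B)

def chi_squared(data, A):
    """Pearson Chi-squared with one bin per integer value (assumed binning)."""
    n = len(data)
    observed = Counter(data)
    covered = 0.0
    chi2 = 0.0
    for y in range(B, max(data) + 1):
        p = pmf(y, A)
        covered += p
        expected = n * p
        chi2 += (observed.get(y, 0) - expected) ** 2 / expected
    # Pooled tail bin beyond the largest observation: observed count is zero,
    # so its contribution (0 - E)^2 / E reduces to the expected count E.
    chi2 += n * max(0.0, 1.0 - covered)
    return chi2

def minimize_chi_squared(data, lo=0.05, hi=0.50, steps=900):
    """Grid search for the value of A that minimizes Chi-squared."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda A: chi_squared(data, A))

def draw(rng, A=1 / 6):
    """Stand-in data: count Bernoulli(A) trials until B successes occur."""
    rolls = hits = 0
    while hits < B:
        rolls += 1
        if rng.random() < A:
            hits += 1
    return rolls

rng = random.Random(0)  # hypothetical seed
sample = [draw(rng) for _ in range(1000)]
A_best = minimize_chi_squared(sample)
print(A_best, chi_squared(sample, A_best))
```

Because sparsely populated bins in the tail weigh heavily in the Pearson statistic, this criterion will generally give a slightly different estimate from maximum likelihood, as the two values of A quoted above illustrate.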

The bootstrapped percentiles cited above are a function of the model, parameters, sample size, and optimization criterion.