When the r parameter is an integer, the negative binomial pdf is
$$y=f(x|r,p)=\binom{r+x-1}{x}p^{r}q^{x}I_{(0,1,\ldots)}(x)$$
where q = 1 – p. When r is not an integer, the binomial coefficient in the definition of the pdf is replaced by the equivalent expression
$$\frac{\Gamma (r+x)}{\Gamma (r)\Gamma (x+1)}$$
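The gamma-function form can be checked numerically. The following is a minimal sketch, assuming illustrative values r = 2.5 and p = 0.4 (not taken from this page), that evaluates the pdf on the log scale with gammaln and compares the result with nbinpdf.

```matlab
% Evaluate the negative binomial pdf for a non-integer r using the
% gamma-function form of the coefficient, on the log scale for stability.
r = 2.5;          % illustrative non-integer r (assumed value)
p = 0.4;          % illustrative success probability (assumed value)
q = 1 - p;
x = 0:10;

% log of Gamma(r+x) / (Gamma(r) * Gamma(x+1))
logCoef = gammaln(r + x) - gammaln(r) - gammaln(x + 1);
fManual = exp(logCoef + r*log(p) + x*log(q));

% Compare with the toolbox function; the difference should be at
% rounding-error level.
fToolbox = nbinpdf(x, r, p);
maxAbsDiff = max(abs(fManual - fToolbox))
```

Working with gammaln rather than gamma avoids overflow when r + x is large.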
In its simplest form (when r is an integer), the negative binomial distribution models the number of failures x before a specified number of successes is reached in a series of independent, identical trials. Its parameters are the probability of success in a single trial, p, and the number of successes, r. A special case of the negative binomial distribution, when r = 1, is the geometric distribution, which models the number of failures before the first success.
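With r = 1 the binomial coefficient equals 1, so the pdf reduces to p q^x. A short sketch of this special case, assuming an illustrative p = 0.3, compares nbinpdf with geopdf and with the closed form.

```matlab
% The negative binomial pdf with r = 1 should match the geometric pdf.
p = 0.3;                       % illustrative success probability (assumed value)
x = 0:8;

fNegBin = nbinpdf(x, 1, p);    % negative binomial with r = 1
fGeo    = geopdf(x, p);        % geometric: failures before the first success
fClosed = p * (1 - p).^x;      % closed form p*q^x

% Largest discrepancy among the three computations.
maxAbsDiff = max(abs([fNegBin - fGeo, fNegBin - fClosed]))
```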
More generally, r can take on non-integer values. This form of the negative binomial distribution has no interpretation in terms of repeated trials, but, like the Poisson distribution, it is useful in modeling count data. The negative binomial distribution is more general than the Poisson distribution because it has a variance that is greater than its mean, making it suitable for count data that do not meet the assumptions of the Poisson distribution. In the limit, as r increases to infinity with the mean held fixed, the negative binomial distribution approaches the Poisson distribution.
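Both properties can be seen numerically. The sketch below assumes a common mean lambda = 3 (an illustrative value) and chooses p = r/(r + lambda) so that the negative binomial mean r(1 - p)/p stays equal to lambda as r grows. It uses nbinstat for the moments and poisspdf for the comparison.

```matlab
% Variance-to-mean ratio and distance from the Poisson pdf as r grows,
% with the mean held fixed at lambda.
lambda  = 3;                   % common mean (assumed value)
x       = 0:20;
rValues = [1 10 100 1000];

ratio   = zeros(size(rValues));
maxDiff = zeros(size(rValues));
for k = 1:numel(rValues)
    r = rValues(k);
    p = r / (r + lambda);      % keeps the mean r*(1-p)/p equal to lambda
    [m, v] = nbinstat(r, p);   % mean and variance of the distribution
    ratio(k)   = v / m;        % equals 1/p, so it always exceeds 1
    maxDiff(k) = max(abs(nbinpdf(x, r, p) - poisspdf(x, lambda)));
end

% Columns: r, variance/mean, largest pdf difference from Poisson(lambda).
disp([rValues' ratio' maxDiff'])
```

The ratio column approaches 1 and the difference column shrinks as r increases, which is the sense in which the Poisson distribution is a limiting case.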
Suppose you are collecting data on the number of auto accidents on a busy highway, and would like to be able to model the number of accidents per day. Because these are count data, and because there are a very large number of cars and a small probability of an accident for any specific car, you might think to use the Poisson distribution. However, the probability of having an accident is likely to vary from day to day as the weather and amount of traffic change, and so the assumptions needed for the Poisson distribution are not met. In particular, the variance of this type of count data sometimes exceeds the mean by a large amount. The data below exhibit this effect: most days have few or no accidents, and a few days have a large number.
accident = [2 3 4 2 3 1 12 8 14 31 23 1 10 7 0];
m = mean(accident)
v = var(accident)
m =
    8.0667
v =
   79.3524
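For Poisson data the variance and mean would be roughly equal, so their ratio gives a quick overdispersion check. The sketch below also computes method-of-moments estimates of r and p from the standard moment formulas mean = r(1 - p)/p and variance = r(1 - p)/p^2. These formulas and the names dispersion, pMoM, and rMoM are additions for illustration; this is only a rough cross-check, not the maximum likelihood method.

```matlab
accident = [2 3 4 2 3 1 12 8 14 31 23 1 10 7 0];
m = mean(accident);
v = var(accident);

% Variance-to-mean ratio; a value near 1 would be consistent with Poisson.
dispersion = v / m

% Method-of-moments estimates, valid only when v > m.
% Dividing the mean formula by the variance formula gives p = m/v;
% substituting back gives r = m^2/(v - m).
pMoM = m / v
rMoM = m^2 / (v - m)
```

For these data the ratio is about 9.8, and the moment estimates are about 0.91 for r and 0.10 for p.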
The negative binomial distribution is more general than the Poisson, and is often suitable for count data when the Poisson is not. The function nbinfit returns the maximum likelihood estimates (MLEs) and confidence intervals for the parameters of the negative binomial distribution. Here are the results from fitting the accident data:
[phat,pci] = nbinfit(accident)
phat =
    1.0060    0.1109
pci =
    0.2152    0.0171
    1.7968    0.2046
It is difficult to give a physical interpretation in this case to the individual parameters. However, the estimated parameters can be used in a model for the number of daily accidents. For example, a plot of the estimated cumulative distribution function shows that while there is an estimated 10% chance of no accidents on a given day, there is also about a 10% chance that there will be 20 or more accidents.
plot(0:50,nbincdf(0:50,phat(1),phat(2)),'.-');
xlabel('Accidents per Day')
ylabel('Cumulative Probability')
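The two probabilities quoted for the fitted model can also be computed directly rather than read off the plot. This sketch hard-codes the estimates reported by nbinfit so that it runs on its own; the names pNone and pTwentyPlus are illustrative.

```matlab
% Estimates reported by nbinfit for the accident data: [r p].
phat = [1.0060 0.1109];

% Probability of no accidents on a given day, P(X = 0).
pNone = nbinpdf(0, phat(1), phat(2))

% Probability of 20 or more accidents, P(X >= 20) = 1 - P(X <= 19).
pTwentyPlus = 1 - nbincdf(19, phat(1), phat(2))
```

Both values should come out close to 0.1.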
Compute and plot the pdf for four different values of the parameter r: 0.1, 1, 3, and 6. In each case, the probability of success p is 0.5.
x = 0:10;
plot(x,nbinpdf(x,.1,.5),'s-', ...
     x,nbinpdf(x,1,.5),'o-', ...
     x,nbinpdf(x,3,.5),'d-', ...
     x,nbinpdf(x,6,.5),'^-');
legend({'r = .1' 'r = 1' 'r = 3' 'r = 6'})
xlabel('x')
ylabel('f(x|r,p)')
The plot shows that the negative binomial distribution can take on a variety of shapes, ranging from very skewed to nearly symmetric, depending on the value of r.
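The change in shape can be quantified with the skewness coefficient. The sketch below computes it numerically from the pdf for the same four values of r and compares it with the standard closed form (2 - p)/sqrt(r(1 - p)), which is not stated on this page and is included here as a cross-check. The support is truncated at x = 200, where the remaining tail mass is negligible for p = 0.5.

```matlab
% Skewness of the negative binomial distribution for several r, p = 0.5.
p = 0.5;
rValues = [0.1 1 3 6];
x = 0:200;                     % truncated support; tail beyond is negligible

skew = zeros(size(rValues));
for k = 1:numel(rValues)
    f     = nbinpdf(x, rValues(k), p);
    mu    = sum(x .* f);                        % mean
    sigma = sqrt(sum((x - mu).^2 .* f));        % standard deviation
    skew(k) = sum(((x - mu) ./ sigma).^3 .* f); % third standardized moment
end
skew

% Standard closed form for comparison (assumption: not given in the source).
closedForm = (2 - p) ./ sqrt(rValues * (1 - p))
```

The skewness falls steadily as r increases, consistent with the pdf becoming more symmetric.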