Binomial Distribution

Overview

The binomial distribution models the total number of successes in repeated trials from an infinite population under the following conditions:

  • Only two outcomes are possible on each of n trials.

  • The probability of success for each trial is constant.

  • All trials are independent of each other.
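
To make the three conditions concrete, the following sketch simulates n independent two-outcome trials with a constant success probability and counts the successes. The variable names and the values n = 20 and p = 0.3 are illustrative assumptions, not values taken from this topic.

```matlab
rng default;                 % for reproducibility
n = 20;                      % number of trials (illustrative value)
p = 0.3;                     % constant probability of success (illustrative value)
trials = rand(1,n) < p;      % each entry is one independent trial: true = success
x = sum(trials)              % total number of successes, one binomial(n,p) draw
```

Each element of trials has only two possible outcomes, uses the same p, and is generated independently of the others, so the count x should behave like a single value returned by binornd(n,p).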

Parameters

The binomial distribution uses the following parameters.

Parameter    Description               Support
n            Number of trials          positive integer
p            Probability of success    0 ≤ p ≤ 1

Probability Density Function

The probability density function (pdf) is

$$ f(x \mid n,p) = \binom{n}{x} p^{x} (1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n, $$

where x is the number of successes in n trials of a Bernoulli process with probability of success p.
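
As a quick check of the formula, the sketch below evaluates the binomial coefficient, the success term, and the failure term directly and compares the result with binopdf. The values n = 10, p = 0.5, and x = 3 and the variable names are illustrative assumptions.

```matlab
n = 10;                                        % number of trials (illustrative value)
p = 0.5;                                       % probability of success (illustrative value)
x = 3;                                         % number of successes to evaluate
fManual = nchoosek(n,x) * p^x * (1-p)^(n-x);   % direct evaluation of the pdf formula
fToolbox = binopdf(x,n,p);                     % same quantity from binopdf
abs(fManual - fToolbox)                        % difference should be near zero
```

For these values the binomial coefficient is 120, so the probability is 120/1024, or about 0.1172.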

Descriptive Statistics

The mean is

$$ \text{mean} = np. $$

The variance is

$$ \text{var} = np(1-p). $$
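
The two formulas can be checked numerically. This sketch, which assumes the illustrative values n = 100 and p = 0.9, compares the output of binostat with the mean and variance obtained by summing over the pdf.

```matlab
n = 100;                             % number of trials (illustrative value)
p = 0.9;                             % probability of success (illustrative value)
[m,v] = binostat(n,p)                % mean n*p and variance n*p*(1-p)
x = 0:n;                             % every possible number of successes
f = binopdf(x,n,p);                  % probability of each value of x
mSum = sum(x .* f)                   % mean computed from the definition
vSum = sum((x - mSum).^2 .* f)       % variance computed from the definition
```

Both approaches should agree, up to floating-point rounding, with np = 90 and np(1-p) = 9.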

Relationship to Other Distributions

The binomial distribution is a generalization of the Bernoulli distribution, allowing for a number of trials n greater than 1. The binomial distribution generalizes to the multinomial distribution when there are more than two possible outcomes for each trial.
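
Both relationships can be illustrated with a short sketch. The values n = 10, p = 0.3, and x = 4 are illustrative assumptions. With n = 1 the binomial pdf reduces to the Bernoulli probabilities, and a multinomial distribution with exactly two outcome categories gives the same probability as the binomial pdf.

```matlab
n = 10;  p = 0.3;  x = 4;                 % illustrative values
bern = binopdf([0 1],1,p)                 % n = 1 gives the Bernoulli probabilities [1-p p]
fBino = binopdf(x,n,p);                   % binomial probability of x successes
fMulti = mnpdf([x n-x],[p 1-p]);          % multinomial with two outcome categories
abs(fBino - fMulti)                       % difference should be near zero
```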

Example

Suppose you are collecting data from a widget manufacturing process, and you record the number of widgets within specification in each batch of 100. You might be interested in the probability that an individual widget is within specification. Parameter estimation is the process of determining the parameter, p, of the binomial distribution that fits this data best in some sense.

One popular criterion of goodness is to maximize the likelihood function. The likelihood has the same form as the binomial pdf above, but the roles of the variables are reversed. For the pdf, the parameters (n and p) are known constants and the variable is x. For the likelihood, the sample values (the x's) are already observed, so they are the fixed constants, and the variables are the unknown parameters. Maximum likelihood estimation (MLE) involves calculating the value of p that gives the highest likelihood for the particular set of data.
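
The idea of treating the data as fixed and p as the variable can be sketched with a simple grid search. The observation x = 85 successes in n = 100 trials, the grid spacing, and the variable names are illustrative assumptions. This is not how binofit computes its estimate. It only shows that the likelihood peaks at x/n, the closed-form MLE for a single binomial observation.

```matlab
n = 100;                              % trials per batch
x = 85;                               % observed successes (illustrative value)
pGrid = 0.01:0.01:0.99;               % candidate values of p (grid spacing is an assumption)
logL = log(binopdf(x,n,pGrid));       % log-likelihood of each candidate p, with x held fixed
[~,idx] = max(logL);                  % position of the largest log-likelihood
pHatGrid = pGrid(idx)                 % grid maximizer
pHatClosed = x/n                      % closed-form MLE for comparison
```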

The function binofit returns the MLEs and confidence intervals for the parameters of the binomial distribution. Here is an example using random numbers from the binomial distribution with n = 100 and p = 0.9.

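rng default;                   % for reproducibility
r = binornd(100,0.9)           % one batch: number of successes in 100 trials with p = 0.9
[phat, pci] = binofit(r,100)   % MLE of p and its 95% confidence interval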
r =

    85


phat =

    0.8500


pci =

    0.7647    0.9135

The MLE for parameter p is 0.8500, compared to the true value of 0.9. The 95% confidence interval for p goes from 0.7647 to 0.9135, which includes the true value. In this made-up example you know the "true value" of p. In experimentation you do not.
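
For intuition about where such an interval can come from, the sketch below computes an exact (Clopper-Pearson) interval from beta distribution quantiles for r = 85 successes in 100 trials. Treating this as the method behind the interval shown above is an assumption to verify against the binofit documentation, and the variable names are illustrative.

```matlab
x = 85;                                   % observed successes, as in the output above
n = 100;                                  % number of trials
alpha = 0.05;                             % 1 - confidence level, for a 95% interval
pHat = x/n;                               % point estimate (MLE)
lower = betainv(alpha/2, x, n-x+1);       % Clopper-Pearson lower bound
upper = betainv(1-alpha/2, x+1, n-x);     % Clopper-Pearson upper bound
ci = [lower upper]
```

This sketch does not handle the edge cases x = 0 and x = n, where the lower and upper bounds are conventionally set to 0 and 1, respectively.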

The following commands generate a plot of the binomial pdf for n = 10 and p = 1/2.

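x = 0:10;                  % every possible number of successes
y = binopdf(x,10,0.5);     % probability of each value of x
plot(x,y,'+')              % plot the probabilities as '+' markers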
