The binomial distribution models the total number of successes in repeated trials from an infinite population under the following conditions:

Only two outcomes are possible on each of *N* trials.

The probability of success for each trial is constant.

All trials are independent of each other.

The binomial distribution uses the following parameters.

Parameter | Description | Support |
---|---|---|
`N` | Number of trials | positive integer |
`p` | Probability of success | $$0\le p\le 1$$ |

The probability density function (pdf) is

$$f\left(x\mid N,p\right)=\binom{N}{x}p^{x}\left(1-p\right)^{N-x},\quad x=0,1,2,\ldots,N,$$

where *x* is the number of successes in *N* trials of a Bernoulli process with probability of success *p*.
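As a minimal sketch of how the formula can be checked, the following code evaluates the pdf term by term and compares it with `binopdf`. The values `N = 10` and `p = 0.5` and the variable names are chosen only for illustration.

```matlab
% Evaluate the binomial pdf directly from its formula (illustrative values).
N = 10;          % number of trials
p = 0.5;         % probability of success
x = 0:N;         % every possible number of successes

% nchoosek(N,k) returns the binomial coefficient for scalar inputs,
% so apply it to each element of x.
coeff = arrayfun(@(k) nchoosek(N,k), x);
yFormula = coeff .* p.^x .* (1-p).^(N-x);

% Same quantity from the toolbox function.
yToolbox = binopdf(x,N,p);

maxDiff = max(abs(yFormula - yToolbox));   % should be close to machine precision
totalProb = sum(yFormula);                 % probabilities over the support sum to 1
```

The two results should agree up to floating-point rounding, and the probabilities over the full support should sum to 1.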

The mean is

$$\text{mean}=Np.$$

The variance is

$$\mathrm{var}=Np\left(1-p\right).$$
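These two formulas can be compared against the toolbox function `binostat` and against simulated data. The sketch below assumes illustrative values `N = 100` and `p = 0.9`; the sample size of 100,000 draws is arbitrary.

```matlab
N = 100;                 % number of trials (illustrative)
p = 0.9;                 % probability of success (illustrative)

% Mean and variance from the closed-form expressions.
mFormula = N*p;
vFormula = N*p*(1-p);

% Mean and variance reported by the toolbox.
[m,v] = binostat(N,p);

% Empirical check with simulated binomial counts.
rng default;             % for reproducibility
r = binornd(N,p,1e5,1);  % column vector of 100,000 draws
mSample = mean(r);
vSample = var(r);
```

Here `m` and `v` should match the formulas (about 90 and 9), while `mSample` and `vSample` should be close to them but not identical, because they are estimates from a finite sample.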

The binomial distribution is a generalization of the Bernoulli distribution, allowing for a number of trials *N* greater than 1. The binomial distribution generalizes to the multinomial distribution when there are more than two possible outcomes for each trial.
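The relationship to the Bernoulli distribution can be illustrated by simulation: summing *N* independent Bernoulli trials should reproduce the binomial pdf. This is a rough sketch with assumed values (`N = 10`, `p = 0.5`, 100,000 replicates), not a formal demonstration.

```matlab
rng default;                       % for reproducibility
N = 10;                            % trials per replicate (illustrative)
p = 0.5;                           % probability of success (illustrative)
numReps = 1e5;                     % number of simulated replicates

% Each column holds N Bernoulli trials: true marks a success.
trials = rand(N,numReps) < p;
successes = sum(trials,1);         % number of successes in each replicate

% Relative frequency of each count 0..N, using one bin per integer.
edges = -0.5:1:N+0.5;
empirical = histcounts(successes,edges,'Normalization','probability');

theoretical = binopdf(0:N,N,p);
maxAbsDiff = max(abs(empirical - theoretical));   % small for large numReps
```

The discrepancy shrinks as `numReps` grows, since the relative frequencies are sample estimates of the binomial probabilities.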

Suppose you are collecting data from a widget manufacturing
process, and you record the number of widgets within specification
in each batch of 100. You might be interested in
the probability that an individual widget is within specification.
Parameter estimation is the process of determining the parameter, *p*,
of the binomial distribution that fits this data best in some sense.

One popular criterion of goodness is to maximize the likelihood function. The likelihood has the same form as the binomial pdf above. For the pdf, however, the parameters (*N* and *p*) are known constants and the variable is *x*. The likelihood function reverses these roles: the sample values (the *x*'s) are already observed, so they are the fixed constants, and the unknown parameters are the variables. Maximum likelihood estimation (MLE) involves calculating the value of *p* that gives the highest likelihood for the particular set of data.
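As a worked sketch of this idea, suppose that *k* independent batches of *N* trials each yield success counts $x_1,\ldots,x_k$. The symbol *k* and the subscripts are introduced here only for illustration. Up to a constant that does not depend on *p*, the log-likelihood is

$$\ell(p)=\sum_{i=1}^{k}x_i\log p+\sum_{i=1}^{k}\left(N-x_i\right)\log\left(1-p\right)+\text{const}.$$

Setting the derivative with respect to *p* to zero,

$$\frac{d\ell}{dp}=\frac{\sum_{i=1}^{k}x_i}{p}-\frac{kN-\sum_{i=1}^{k}x_i}{1-p}=0,$$

gives

$$\hat{p}=\frac{\sum_{i=1}^{k}x_i}{kN},$$

which is the observed proportion of successes. For a single batch (*k* = 1) this reduces to $\hat{p}=x/N$.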

The function `binofit` returns the MLE and a confidence interval for the parameter *p* of the binomial distribution, given the number of trials. Here is an example using random numbers from the binomial distribution with *N* = 100 and *p* = 0.9.

    rng default; % for reproducibility
    r = binornd(100,0.9)
    [phat, pci] = binofit(r,100)

    r = 85
    phat = 0.8500
    pci = 0.7647 0.9135

The MLE for parameter *p* is 0.8500, compared to the true value of 0.9. The 95% confidence interval for *p* goes from 0.7647 to 0.9135, which includes the true value.
In this made-up example you know the "true value" of *p*. In experimentation you do not.
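The sketch below takes the count of 85 successes in 100 trials from the output shown above and illustrates two points: the MLE is simply the observed proportion, and the optional third argument of `binofit` sets the significance level of the interval. The choice of a 99% interval is only an example.

```matlab
r = 85;                                % observed successes (from the output above)
N = 100;                               % number of trials

phatManual = r/N;                      % observed proportion of successes

[phat95,pci95] = binofit(r,N);         % default: 95% confidence interval
[phat99,pci99] = binofit(r,N,0.01);    % alpha = 0.01 gives a 99% interval

width95 = diff(pci95);                 % interval widths for comparison
width99 = diff(pci99);
```

The point estimate does not depend on the confidence level, while the 99% interval should be wider than the 95% interval.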

The following commands generate a plot of the binomial pdf for *N* = 10 and *p* = 1/2.

```
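x = 0:10;                  % possible numbers of successes
y = binopdf(x,10,0.5);     % pdf for 10 trials with success probability 1/2
plot(x,y,'+')              % show each probability as a marker
xlabel('Number of successes')
ylabel('Probability')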
```
