Geometric Distribution

Overview

The geometric distribution models the number of failures before the first success in a series of independent trials, where each trial results in either success or failure, and the probability of success in any individual trial is constant. For example, if you toss a coin, the geometric distribution models the number of tails observed before the first heads. The geometric distribution is discrete, existing only on the nonnegative integers.

Parameters

The geometric distribution uses the following parameter.

Parameter            Description
$0 \le p \le 1$      Probability of success

Probability Distribution Function

Definition

The probability distribution function (pdf) of the geometric distribution is

$y = f(x \mid p) = p(1-p)^{x}, \quad x = 0, 1, 2, \ldots,$

where p is the probability of success, and x is the number of failures before the first success. The result y is the probability of observing exactly x failures before a success, when the probability of success in any given trial is p. For discrete distributions, the probability distribution function is also known as the probability mass function (pmf).
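
As a quick sanity check on this definition, the following sketch evaluates the formula directly and compares it with geopdf. The value p = 0.25, the range of x, and the variable names are illustrative choices, not part of the original topic.

```matlab
% Sketch: compare geopdf with the closed-form pmf p*(1-p)^x.
p = 0.25;                        % illustrative probability of success
x = 0:10;                        % number of failures before the first success
yFormula = p .* (1 - p).^x;      % pmf evaluated directly from the definition
yToolbox = geopdf(x, p);         % pmf computed by the toolbox function
maxDiff = max(abs(yFormula - yToolbox))   % expected to be at or near zero
```

Any difference should be on the order of floating-point round-off.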

Plot

This plot shows how changing the value of the probability parameter p alters the shape of the pdf. Use geopdf to compute the pdf at x = 1 through 10 for three different values of p. Then plot all three pdfs on the same figure for a visual comparison.

x = [1:10];
y1 = geopdf(x,0.1);   % For p = 0.1
y2 = geopdf(x,0.25);  % For p = 0.25
y3 = geopdf(x,0.75);  % For p = 0.75

figure;
plot(x,y1,'kd')
hold on
plot(x,y2,'ro')
plot(x,y3,'b+')
legend({'p = 0.1','p = 0.25','p = 0.75'})
hold off

In this plot, the value of y is the probability of observing exactly x failures before a success. When the probability of success p is large, y decreases rapidly as x increases, and the probability of observing a large number of failures before a success quickly becomes small. But when the probability of success p is small, y decreases slowly as x increases. The probability of observing a large number of failures before a success still decreases as x increases, but at a much slower rate.

Random Number Generation

A random number generated from a geometric distribution represents the number of failures observed before a success in a single experiment, given the probability of success p for each independent trial. Use geornd to generate random numbers from the geometric distribution. For example, the following generates a random number from a geometric distribution with probability of success p equal to 0.1.

p = 0.1;
r = geornd(p)
r =

     1

The returned random number represents the number of failures observed before a success in a series of independent trials.
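
To make this interpretation concrete, the following sketch simulates the underlying trials by hand and counts the failures before the first success. It then draws many values with geornd and compares the sample mean with the theoretical mean (1-p)/p. The seed, the sample size, and the variable names are arbitrary choices for illustration.

```matlab
% Sketch: count failures before the first success by simulating trials.
rng(0);                          % fix the seed so the run is repeatable (arbitrary)
p = 0.1;                         % probability of success on each trial
nFailures = 0;                   % failures observed so far
while rand() >= p                % rand() < p is treated as a success
    nFailures = nFailures + 1;
end
nFailures                        % one simulated draw, comparable to geornd(p)

% Many draws at once: the sample mean should be close to (1-p)/p = 9.
r = geornd(p, 10000, 1);
sampleMean = mean(r)
```

Because the values are random, the exact output depends on the seed and the generator settings.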

Relationship to Other Distributions

The geometric distribution is a special case of the negative binomial distribution, with the specified number of successes parameter r equal to 1.
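
The following sketch illustrates this relationship numerically by comparing geopdf with nbinpdf evaluated at r = 1. The value of p and the range of x are illustrative.

```matlab
% Sketch: geometric pdf versus negative binomial pdf with one success.
p = 0.25;                        % illustrative probability of success
x = 0:10;                        % number of failures
yGeo  = geopdf(x, p);            % geometric pdf
yNbin = nbinpdf(x, 1, p);        % negative binomial pdf with r = 1
maxDiff = max(abs(yGeo - yNbin)) % expected to be at or near zero
```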

Cumulative Distribution Function

Definition

The cumulative distribution function (cdf) of the geometric distribution is

$y = F(x \mid p) = 1 - (1-p)^{x+1}, \quad x = 0, 1, 2, \ldots,$

where p is the probability of success, and x is the number of failures before the first success. The result y is the probability of observing at most x failures before a success, when the probability of success in any given trial is p.
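
For readers who want to see where this expression comes from, the cdf can be obtained by summing the pdf over 0 through x and applying the finite geometric series formula. This is a standard derivation, sketched here under the assumption that p > 0:

$$F(x \mid p) = \sum_{k=0}^{x} p(1-p)^{k} = p \cdot \frac{1 - (1-p)^{x+1}}{1 - (1-p)} = 1 - (1-p)^{x+1}.$$

Equivalently, $1 - F(x \mid p) = (1-p)^{x+1}$ is the probability that the first x + 1 trials are all failures.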

Plot

This plot shows how changing the value of the parameter p alters the shape of the cdf. Use geocdf to compute the cdf at x = 1 through 10 for three different values of p. Then plot all three cdfs on the same figure for a visual comparison.

x = [1:10];
y1 = geocdf(x,0.1);   % For p = 0.1
y2 = geocdf(x,0.25);  % For p = 0.25
y3 = geocdf(x,0.75);  % For p = 0.75

figure;
plot(x,y1,'kd')
hold on
plot(x,y2,'ro')
plot(x,y3,'b+')
legend({'p = 0.1','p = 0.25','p = 0.75'})
hold off

In this plot, the value of y is the probability of observing at most x failures before a success. When the probability of success p is large, y increases rapidly as x increases. The probability of observing a success quickly becomes very high, even for a small number of trials. But when the probability of success p is small, y increases slowly as x increases. The probability of observing a success still increases as the number of trials increases, but at a much slower rate.

Inverse cdf

The inverse cdf of a geometric distribution returns, for a given probability y, the smallest integer x such that the cdf evaluated at x is at least y. In other words, it is the smallest number of failures x for which the probability of observing the first success after at most x failures is y or greater. Use geoinv to compute the inverse cdf of the geometric distribution. For example, the following returns the smallest integer x such that the geometric cdf evaluated at x is greater than or equal to 0.1, when the probability of success for each independent trial p is 0.03.

y = 0.1;
p = 0.03;
x = geoinv(y,p)
x =

     3
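
The same value can be recovered by solving 1 - (1-p)^(x+1) >= y for x. The following sketch does this for the inputs above and compares the result with geoinv. It assumes 0 < y < 1 and 0 < p < 1, and it is not a description of how geoinv is implemented. The variable names are illustrative.

```matlab
% Sketch: closed-form inverse of the geometric cdf for 0 < y < 1, 0 < p < 1.
y = 0.1;
p = 0.03;
% (1-p)^(x+1) <= 1-y  implies  x+1 >= log(1-y)/log(1-p)
xClosed  = ceil(log(1 - y) / log(1 - p)) - 1;
xToolbox = geoinv(y, p);
isSame = (xClosed == xToolbox)

% Confirm that xClosed is the smallest x with cdf >= y.
cdfAt    = geocdf(xClosed, p);       % should be >= y
cdfBelow = geocdf(xClosed - 1, p);   % should be < y
```

When log(1-y)/log(1-p) lands very close to an integer, floating-point rounding can shift the closed-form result by one, so geoinv remains the safer choice in practice.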

Mean and Variance

The mean of the geometric distribution is

$\text{mean} = \dfrac{1-p}{p},$

and the variance of the geometric distribution is

$\text{var} = \dfrac{1-p}{p^{2}},$

where p is the probability of success.

Use geostat to compute the mean and variance of a geometric distribution. For example, the following computes the mean m and variance v of a geometric distribution with probability parameter p equal to 0.25.

p = 0.25;
[m,v] = geostat(p)
m =

     3


v =

    12
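
These values can be checked by hand by substituting p = 0.25 into the expressions for the mean and variance:

$$\text{mean} = \frac{1 - 0.25}{0.25} = \frac{0.75}{0.25} = 3, \qquad \text{var} = \frac{1 - 0.25}{0.25^{2}} = \frac{0.75}{0.0625} = 12.$$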

Example

Compute Geometric Distribution Probabilities

Suppose the probability of a five-year-old car battery not starting in cold weather is 0.03. What is the probability of the car starting for 25 consecutive days during a long cold snap?

Model the scenario using a geometric distribution. In this case, the "failure" event is the car starting, and the "success" event is the car not starting. We want to determine the probability of observing 25 failures (the car starting) without observing a single success (the car not starting). The probability of success for each trial (the car not starting in any single attempt) is p = 0.03.

To solve, note that the car starts on all 25 days exactly when the number of failures before the first success is at least 25. First compute the cumulative distribution function (cdf) at x = 24. This returns the probability that the first success (the car not starting) occurs within the first 25 attempts, that is, after at most 24 failures. Then subtract this result from 1 to determine the probability of not observing a success in the first 25 attempts; in other words, the probability that the car starts at every one of the 25 attempts. This equals (1 - 0.03)^25.

pstart = 1 - geocdf(24,0.03)
pstart =

    0.4670

The returned result pstart = 0.4670 is the probability that the car will start every day for 25 days in a row during a cold snap.

This plot of the cdf for this scenario shows that, as the number of failures x increases, the cumulative probability y of having observed a success also increases. In the context of this example, it means that the more times you attempt to start the car, the greater the probability that it does not start on at least one of those occasions.

figure;
x = 0:25;
y = geocdf(x,0.03);
stairs(x,y)
