Probability plots


h = probplot(...)


probplot(Y) produces a normal probability plot comparing the distribution of the data Y to the normal distribution. Y can be a single vector, or a matrix with a separate sample in each column. The plot includes a reference line useful for judging whether the data follow a normal distribution.

probplot uses midpoint probability plotting positions. The ith sorted value from a sample of size N is plotted on the x axis against the midpoint of the corresponding jump in the empirical CDF on the y axis. With uncensored data, that midpoint is (i - 0.5)/N. With censored data (see below), the y value is more complicated to compute.
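As a rough illustration of the uncensored case, the plotting positions can be computed by hand. This is a minimal sketch with a made-up five-point sample; the variable names are illustrative and are not part of probplot.

```matlab
% Midpoint plotting positions for an uncensored sample (hypothetical data).
y = [2.3; 0.7; 1.9; 3.1; 1.2];
N = numel(y);
ySorted = sort(y);           % ith sorted value, plotted on the x axis
p = ((1:N)' - 0.5) / N;      % midpoint of the ith jump in the empirical CDF
disp([ySorted p])            % each row pairs a sorted value with its probability
```

For N = 5 the positions are 0.1, 0.3, 0.5, 0.7, and 0.9, so no point is assigned a probability of exactly 0 or 1.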

probplot(distribution,Y) creates a probability plot for the distribution specified by distribution. Acceptable strings for distribution are:

  • 'exponential' — Exponential probability plot (nonnegative values)

  • 'extreme value' — Extreme value probability plot (all values)

  • 'lognormal' — Lognormal probability plot (positive values)

  • 'normal' — Normal probability plot (all values)

  • 'rayleigh' — Rayleigh probability plot (positive values)

  • 'weibull' — Weibull probability plot (positive values)

The y axis scale is based on the selected distribution. The x axis has a log scale for the Weibull and lognormal distributions, and a linear scale for the others.

Not all distributions are appropriate for all data sets, and probplot returns an error when asked to create a plot from data that fall outside the range supported by the specified distribution. The supported data range for each distribution is given in parentheses in the list above.
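One way to avoid that error is to check the data range before choosing a distribution. The following is only a sketch: the sample values are made up, and falling back to a normal plot is an arbitrary choice for illustration.

```matlab
% Pick a distribution whose supported range matches the data (hypothetical sample).
y = [0.8; 1.5; -0.2; 2.4; 3.0];
if all(y > 0)
    distname = 'weibull';    % positive values only
else
    distname = 'normal';     % accepts all real values
end
probplot(distname, y)
```

Because the sample contains a negative value, this sketch produces a normal probability plot rather than a Weibull one.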

probplot(Y,cens,freq) or probplot(distribution,Y,cens,freq) requires Y to be a vector. cens is a vector of the same size as Y that contains 1 for observations that are right-censored and 0 for observations that are observed exactly. freq is a vector of the same size as Y that contains integer frequencies for the corresponding elements in Y.
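A short sketch of the censored form, using invented failure times. The values of y, cens, and freq below are hypothetical and serve only to show how the three vectors line up.

```matlab
% Weibull probability plot with right-censoring and frequencies (hypothetical data).
y    = [1.2; 2.5; 3.1; 4.0; 5.0];   % observed or censoring times
cens = [0; 0; 0; 1; 1];             % 1 = right-censored, 0 = observed exactly
freq = [2; 1; 3; 1; 2];             % integer count for each element of y
probplot('weibull', y, cens, freq)
```

Here the last two entries record units that were still running when observation stopped, so they enter the plot only as censored values.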

probplot(ax,Y) adds lines for the samples in Y to an existing probability plot. ax is a handle to the axes that contain that plot.

probplot(...,'noref') omits the reference line.
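The axes-handle form and the 'noref' flag can be combined, as in the sketch below. The sample sizes and distribution parameters are arbitrary, and passing 'noref' together with an axes handle is assumed to work the same way as in the other syntaxes.

```matlab
% Add a second sample to an existing normal probability plot (hypothetical data).
rng('default');                    % for reproducibility
x1 = normrnd(0,1,50,1);
x2 = normrnd(1,2,50,1);
probplot('normal', x1);            % first sample, with its reference line
h2 = probplot(gca, x2, 'noref');   % second sample on the same axes, no reference line
legend('Sample 1','Reference line','Sample 2','Location','NW')
```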

probplot(ax,PD) takes a probability distribution object, PD, and adds a fitted line to the axes specified by ax to represent the probability distribution specified by PD. PD is a ProbDist object of the ProbDistUnivParam class or ProbDistUnivKernel class.
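For instance, a distribution object returned by fitdist can be overlaid on a plot of the same data. This is a sketch under the assumption that the fitdist output is accepted as PD; the lognormal sample is made up for illustration.

```matlab
% Overlay a fitted lognormal distribution on a normal probability plot.
rng('default');                    % for reproducibility
x = lognrnd(0,0.5,100,1);          % hypothetical positive-valued sample
probplot('normal', x);             % normal probability plot of the data
pd = fitdist(x, 'Lognormal');      % fitted probability distribution object
h = probplot(gca, pd);             % add the fitted curve to the existing plot
```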

probplot(ax,fun,params) takes a function fun and a set of parameters, params, and adds fitted lines to the axes of an existing probability plot specified by ax. fun is a function handle to a cdf function, specified with @ (for example, @wblcdf). params is the set of parameters required to evaluate fun, and is specified as a cell array or vector. The function must accept a vector of X values as its first argument, then the optional parameters, and must return a vector of cdf values evaluated at X.
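As a sketch of this form, the parameter estimates from wblfit can be passed straight to @wblcdf, because wblcdf takes the X values first and then the scale and shape parameters. The sample below is generated only for illustration.

```matlab
% Overlay a fitted Weibull cdf on a normal probability plot.
rng('default');                          % for reproducibility
x = wblrnd(3,3,100,1);                   % hypothetical Weibull sample
probplot('normal', x);                   % normal probability plot of the data
params = wblfit(x);                      % [scale shape] estimates
h = probplot(gca, @wblcdf, params);      % evaluates wblcdf(X, scale, shape)
```

The fitted curve is generally not a straight line here, since the y axis is scaled for the normal distribution rather than the Weibull.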

h = probplot(...) returns handles to the plotted lines.



Test Data for Weibull Distribution Using probplot

Generate sample data. The sample x1 contains 100 random numbers from a Weibull distribution with scale parameter A = 3 and shape parameter B = 3. The sample x2 contains 100 random numbers from a Rayleigh distribution with scale parameter B = 3.

rng('default');  % For reproducibility
x1 = wblrnd(3,3,100,1);
x2 = raylrnd(3,100,1);

Create a probability plot to assess whether the data in x1 and x2 come from a Weibull distribution.

probplot('weibull',[x1 x2])
legend('Weibull Sample','Rayleigh Sample','Location','NW')

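The probability plot shows that the data in x1 follow the Weibull reference line closely, which is consistent with a Weibull distribution, while the data in x2 depart from their reference line and therefore do not appear to come from a Weibull distribution.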

Test Data for Normal Distribution Using probplot

Generate sample data containing about 20% outliers in the tails. The left tail of the sample data contains 10 values randomly generated from an exponential distribution with parameter mu = 1. The right tail contains 10 values randomly generated from an exponential distribution with parameter mu = 5. The center of the sample data contains 80 values randomly generated from a standard normal distribution.

rng default  % For reproducibility
left_tail = -exprnd(1,10,1);
right_tail = exprnd(5,10,1);
center = randn(80,1);
data = [left_tail;center;right_tail];

Create a probability plot to assess whether the data in data comes from a normal distribution. Plot a t location-scale curve on the same figure to compare with data.

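probplot(data);  % Normal probability plot of the sample; creates the axes used below
p = mle(data,'dist','tlo');  % Fit t location-scale parameters to the sample
t = @(data,mu,sig,df)cdf('tlocationscale',data,mu,sig,df);
h = probplot(gca,t,p);  % Add the fitted t location-scale curve to the plot
h.Color = 'r';
h.LineStyle = '-';
title('{\bf Probability Plot}')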

The plot shows that neither the normal reference line nor the t location-scale curve fits the tails very well because of the outliers.
