
probplot

Probability plots

Syntax

  • probplot(y)
  • probplot(dist,y)
  • probplot(y,cens)
  • probplot(y,cens,freq)
  • probplot(dist,___)
  • probplot(ax,y)
  • probplot(ax,pd)
  • probplot(ax,fun,params)
  • probplot(___,'noref')
  • h = probplot(___)

Description

probplot(y) creates a normal probability plot comparing the distribution of the data in y to the normal distribution. The plot includes a reference line useful for judging whether the data follows a normal distribution.

probplot(dist,y) creates a probability plot for the distribution specified by dist, using the sample data in y.

probplot(y,cens) creates a probability plot using the censoring data in cens, which indicates the observations in y that are right-censored.

probplot(y,cens,freq) creates a probability plot using the censoring data in cens and the frequency data in freq.

probplot(dist,___) creates a probability plot for the distribution specified by dist, using any of the previous syntaxes.

probplot(ax,y) adds lines for the sample data in y to the existing probability plot in the axes specified by ax.

probplot(ax,pd) plots a fitted line on the axes specified by ax to represent the probability distribution specified by pd.

probplot(ax,fun,params) plots a fitted line on the axes specified by ax to represent the function specified by fun with the parameters specified by params.

probplot(___,'noref') omits the reference line from the plot.
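
A minimal sketch of the difference, assuming standard normal sample data generated with randn: the left panel keeps the default reference line and the right panel omits it.

```matlab
rng('default')                 % For reproducibility
x = randn(50,1);               % Sample data (assumed standard normal)

figure
subplot(1,2,1)
probplot(x)                    % Default: includes the reference line
title('With reference line')

subplot(1,2,2)
hNoRef = probplot(x,'noref');  % Same data, reference line omitted
title('Without reference line')
```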

h = probplot(___) returns graphics handles corresponding to the plotted lines.

Examples

Test Data for Weibull Distribution Using probplot

Generate sample data. The sample x1 contains 100 random numbers from a Weibull distribution with scale parameter A = 3 and shape parameter B = 3. The sample x2 contains 100 random numbers from a Rayleigh distribution with scale parameter B = 3.

rng('default');  % For reproducibility
x1 = wblrnd(3,3,100,1);
x2 = raylrnd(3,100,1);

Create a probability plot to assess whether the data in x1 and x2 comes from a Weibull distribution.

figure;
probplot('weibull',[x1 x2])
legend('Weibull Sample','Rayleigh Sample','Location','NW')

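In the probability plot, the x1 points lie close to their straight reference line, which is consistent with data drawn from a Weibull distribution. Note that a Rayleigh distribution with scale parameter B is a special case of the Weibull distribution (scale parameter B√2, shape parameter 2), so the x2 points are also expected to follow a roughly straight line, with a shallower slope that reflects the smaller shape parameter.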
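
To quantify what the plot shows, the following sketch fits a Weibull distribution to each sample with fitdist and compares the estimated shape parameters B. It regenerates the same draws as the example. Because a Rayleigh distribution with scale B is a Weibull distribution with scale B√2 and shape 2, the estimate for x2 is expected to be near 2, and the estimate for x1 near 3. The exact values depend on the random sample.

```matlab
rng('default');                  % Same seed and draw order as the example
x1 = wblrnd(3,3,100,1);          % Weibull, scale A = 3, shape B = 3
x2 = raylrnd(3,100,1);           % Rayleigh, scale B = 3

pd1 = fitdist(x1,'Weibull');     % Fit a Weibull distribution to each sample
pd2 = fitdist(x2,'Weibull');

% Estimated shape parameters: x1 should be near 3 and x2 near 2, because
% Rayleigh(B) is Weibull with scale B*sqrt(2) and shape 2.
shapes = [pd1.B pd2.B]
```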

Test Data for Normal Distribution Using probplot

Generate sample data containing about 20% outliers in the tails. The left tail of the sample data contains 10 values randomly generated from an exponential distribution with parameter mu = 1. The right tail contains 10 values randomly generated from an exponential distribution with parameter mu = 5. The center of the sample data contains 80 values randomly generated from a standard normal distribution.

rng default  % For reproducibility
left_tail = -exprnd(1,10,1);
right_tail = exprnd(5,10,1);
center = randn(80,1);
data = [left_tail;center;right_tail];

Create a probability plot to assess whether the sample data comes from a normal distribution. Plot a t location-scale curve on the same figure to compare with the data.

figure;
probplot(data);
p = mle(data,'Distribution','tLocationScale');
t = @(data,mu,sig,df)cdf('tlocationscale',data,mu,sig,df);
h = probplot(gca,t,p);
h.Color = 'r';
h.LineStyle = '-';
title('{\bf Probability Plot}')
legend('Data','Normal','t','Location','NW')

The plot shows that neither the normal line nor the t location-scale curve fits the tails very well because of the outliers.

Identify Significant Effects with Half-Normal Probability Plot

Create a half-normal probability plot to identify significant effects in an experiment that studies factors that might influence the flow rate in a chemical manufacturing process. The four factors are reactants A, B, C, and D. Each factor is present at two levels (high and low concentration). The experiment contains only one replicate of each combination of factor levels.

Load the sample data.

load flowrate

The first four columns of the table flowrate contain the design matrix for the four factors. The design matrix is coded to use 1 for the high factor level and -1 for the low factor level. The fifth column of flowrate contains the measured flow rate.

Fit a linear regression model using rate as the response variable. Use predictor variables A, B, C, D, and all of their interaction terms.

mdl = fitlm(flowrate,'rate ~ A*B*C*D');

Calculate and store the absolute values of the factor effect estimates. To obtain the effect estimates, multiply the coefficient estimates from the fitted model by two. This step is necessary because a regression coefficient measures the change in the mean of y for a one-unit change in x, whereas an effect estimate measures the change for a two-unit change in x, from the low level (-1) to the high level (1) of the coded design matrix. Exclude the intercept, which is the first coefficient. Note that the order of the terms in mdl might differ from the order of the columns in the original design matrix.
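
The factor of two follows from the -1/1 coding. A short derivation, assuming a balanced, orthogonal two-level design (so the other terms average out within each level of factor j), where the effect E_j is the mean response at the high level minus the mean response at the low level:

```latex
% Fitted model with coded factor x_j \in \{-1, +1\}
\hat{y} = \hat\beta_0 + \hat\beta_j x_j + \cdots

\begin{aligned}
E_j &= \bar{y}_{x_j = +1} - \bar{y}_{x_j = -1} \\
    &= (\hat\beta_0 + \hat\beta_j) - (\hat\beta_0 - \hat\beta_j) \\
    &= 2\hat\beta_j
\end{aligned}
```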

effects = abs(mdl.Coefficients{2:end,1}*2);

Create a half-normal probability plot using the absolute values of the effect estimates, excluding the intercept.

figure
h = probplot('halfnormal',effects);

Label the points and format the plot. First, get the indices that sort the effect estimates in ascending order. Then use these indices to map the y-values of the plotted points, which probplot stores in sorted order in h(1).YData, back to the original order of the effect estimates.

[~,idx] = sort(effects);
prob(idx) = h(1).YData;
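
The indexing step can look backwards at first. This toy sketch uses hypothetical values to show how assigning through the sort indices maps values stored in sorted order back to the original order of the elements.

```matlab
% Toy values standing in for the effect estimates (hypothetical)
v = [0.3 0.1 0.2];
[~,idx] = sort(v);      % idx is [2 3 1]: positions of v in ascending order
sortedPos = 1:3;        % Stand-in for y-values stored in sorted order
pos(idx) = sortedPos;   % Send each sorted value back to its original element
% pos is [3 1 2]: element 1 of v (0.3) is the largest, so it gets the last value
disp(pos)
```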

Add a text label at each point. For each point, the x-value is the effect estimate and the y-value is the corresponding plotted probability.

text(effects,prob,mdl.CoefficientNames(2:end),'FontSize',8,...
    'VerticalAlignment','top')
h(1).Color = 'r';

The points located far from the reference line represent the significant effects.

Create a Normal Probability Plot Using Frequency Data

Generate simulated frequency data.

y = 1:10;
freq = [2 4 6 7 9 8 7 7 6 5];

Create a normal probability plot using the frequency data.

probplot(y,[],freq)

The normal probability plot shows that the data do not have a normal distribution.

Input Arguments

y — Sample data
numeric vector | numeric matrix

Sample data, specified as a numeric vector or numeric matrix. probplot displays each value in y using marker symbols including 'x' and 'o'. If y is a matrix, then probplot displays a separate line for each column of y.

Not all distributions are appropriate for all data sets. probplot errors if the data set is inappropriate for a specified distribution. See dist for appropriate data ranges for each distribution.
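
A minimal sketch of this behavior, assuming a lognormal plot (which requires positive values) is requested for data that contains a negative value. The error message text is not reproduced here because it can vary between releases.

```matlab
y = [-1; 2; 3];          % Contains a negative value, outside the lognormal range
errored = false;
try
    figure
    probplot('lognormal',y)
catch err
    errored = true;
    disp(err.message)    % Message text depends on the release
end
```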

dist — Distribution for probability plot
'normal' (default) | 'exponential' | 'extreme value' | 'half normal' | 'lognormal' | 'rayleigh' | 'weibull'

Distribution for probability plot, specified as one of the following distribution name strings:

Name String      Plot Type                         Data Range
'normal'         Normal probability plot           All values
'exponential'    Exponential probability plot      Nonnegative values
'extreme value'  Extreme value probability plot    All values
'half normal'    Half-normal probability plot      Nonnegative values
'lognormal'      Lognormal probability plot        Positive values
'rayleigh'       Rayleigh probability plot         Positive values
'weibull'        Weibull probability plot          Positive values

The y-axis scale is based on the selected distribution. The x-axis has a log scale for the Weibull and lognormal distributions, and a linear scale for the others.
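
A quick way to see the axis scaling, sketched with simulated lognormal data (the parameters 0 and 1 are arbitrary choices for illustration). The check on XScale assumes that probplot implements the log x-axis by setting the axes XScale property.

```matlab
rng('default')                 % For reproducibility
x = lognrnd(0,1,100,1);        % Positive sample data (simulated lognormal)
figure
probplot('lognormal',x)
xScale = get(gca,'XScale')     % Expected to be 'log' for a lognormal plot
```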

Example: 'weibull'

cens — Censoring data
numeric vector

Censoring data, specified as a numeric vector. cens must be the same length as y and must contain 1 for each observation that is right-censored and 0 for each observation that is measured exactly.

Data Types: single | double
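
As a sketch of how cens is typically built, assume a hypothetical life test that stops at time 3.5: units still running at that point are right-censored and recorded at the cutoff. The lifetimes are simulated from a Weibull distribution purely for illustration.

```matlab
rng('default')                       % For reproducibility
lifetimes = wblrnd(3,3,100,1);       % Simulated failure times
cutoff = 3.5;                        % Hypothetical end of the test
cens = double(lifetimes > cutoff);   % 1 = right-censored, 0 = measured exactly
y = min(lifetimes,cutoff);           % Censored units are recorded at the cutoff
figure
probplot('weibull',y,cens)
```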

freq — Frequency data
vector of integer values

Frequency data, specified as a vector of integer values. freq must be the same length as y. freq contains the integer frequencies for the corresponding elements in y.

To create a probability plot using frequency data but not censoring data, specify empty brackets ([]) for cens.

Data Types: single | double

ax — Axis handle
axis handle object

Axis handle, specified as an axis handle object. probplot adds lines for the data in y to the plot in the axes specified by ax. Use gca to get the handle of the current axes.
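
A minimal sketch that overlays a second sample on an existing normal probability plot. The two samples (standard normal, and normal with mean 2 and standard deviation 0.5) are arbitrary choices for illustration.

```matlab
rng('default')                 % For reproducibility
a = randn(100,1);              % First sample: standard normal
b = 2 + 0.5*randn(100,1);      % Second sample: mean 2, standard deviation 0.5
figure
probplot(a)                    % Creates the normal probability plot
ax = gca;                      % Handle to the current axes
probplot(ax,b)                 % Adds lines for b to the same axes
```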

pd — Probability distribution for reference line
probability distribution object

Probability distribution for reference line, specified as a probability distribution object. probplot adds a fitted line to the axes specified by ax to represent the probability distribution specified by pd.

Create a probability distribution object with specified parameter values using makedist. Alternatively, fit a probability distribution object to sample data using fitdist. For more information on probability distribution objects, see Working with Probability Distributions.
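
A sketch of both approaches on a Weibull probability plot, assuming simulated Weibull data: one line comes from a distribution fitted with fitdist, and the other from a distribution with assumed parameter values (A = 3, B = 3) created with makedist. The default reference line is omitted so that only these lines are compared with the data.

```matlab
rng('default')                          % For reproducibility
x = wblrnd(3,3,100,1);                  % Sample data
figure
probplot('weibull',x,'noref')           % Data only, no default reference line

pd = fitdist(x,'Weibull');              % Distribution fitted to the data
probplot(gca,pd)                        % Add a line for the fitted cdf

pdAssumed = makedist('Weibull','A',3,'B',3);  % Distribution with assumed parameters
probplot(gca,pdAssumed)                 % Add a line for the assumed cdf
```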

fun — Function for reference line
function handle

Function for reference line, specified as a function handle. probplot adds a fitted line to the axes specified by ax to represent the function specified by fun, evaluated at the parameters specified by params.

fun is a function handle to a cdf function, specified using the function handle operator @. The function must accept a vector of input values as its first argument, and return a vector containing the cdf evaluated at each input value. Specify the parameter values required to evaluate fun using the params argument. For more information on function handles, see Create Function Handle.

Example: @wblcdf

Data Types: function_handle

params — Reference line function parameters
vector of numeric values | cell array

Reference line function parameters, specified as a vector of numeric values or a cell array. probplot adds a fitted line to the axes specified by ax to represent the function specified by fun, evaluated at the parameters specified by params.

probplot evaluates fun with the vector of input values as the first argument, followed by the values in params as separate arguments, so fun must have the form fun(x,p1,p2,...). For example, in the t location-scale example, params is the vector [mu sigma nu] returned by mle, and fun accepts the input values followed by these three parameters. For more information on function handles, see Create Function Handle.
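
A minimal sketch with a built-in cdf, assuming simulated Weibull data and assumed reference parameters A = 3 and B = 3. With params specified as [3 3], probplot evaluates wblcdf(x,3,3).

```matlab
rng('default')                      % For reproducibility
x = wblrnd(3,3,100,1);              % Sample data
figure
probplot('weibull',x,'noref')       % Data only
probplot(gca,@wblcdf,[3 3])         % Reference line from wblcdf(x,3,3)

% Sanity check of the cdf being drawn: wblcdf(A,A,B) equals 1 - exp(-1)
c = wblcdf(3,3,3)
```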

Output Arguments

h — Graphics handles for line objects
vector of Line graphics handles

Graphics handles for line objects, returned as a vector of Line graphics handles. Graphics handles are unique identifiers that you can use to query and modify the properties of a specific line on the plot. For each column of y, probplot returns two handles:

  • The line representing the data points. probplot represents each data point in y using marker symbols such as '+' and 'o'.

  • The line showing the theoretical distribution for the probability plot, represented as a dashed line.

To view and set properties of line objects, use dot notation. For information on using dot notation, see Access Property Values. For information on the Line properties that you can set, see Primitive Line Properties.
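
A short sketch of customizing the returned lines with dot notation, assuming a single column of standard normal sample data, so h contains one handle for the data points followed by one for the reference line.

```matlab
rng('default')                 % For reproducibility
x = randn(50,1);               % One column of sample data
figure
h = probplot(x);               % Two handles: data points, then reference line
h(1).Marker = 'o';             % Change the data marker
h(1).MarkerEdgeColor = 'b';
h(2).LineWidth = 1.5;          % Thicken the reference line
h(2).Color = 'k';
```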

Introduced before R2006a