How are the fitted line and the y-axis (probabilities) levels within probplot computed?

Question

0 votes

Exactly how are the fitted line and the reference levels on the y-axis computed for "probplot"?

Sign in to answer this question.

Follow Question

Answer 1

MathWorks Support Team on 16 May 2023

Edited: MathWorks Support Team on 16 May 2023

0 votes

This information has been added to the documentation as of MATLAB R2017b, as seen in the "algorithms" section of the "probplot" page.

https://in.mathworks.com/help/stats/probplot.html

For releases prior to MATLAB R2017b, here is a detailed explanation of how this is implemented within MATLAB:

1. Computation of the reference line:

Suppose random variable X has a lognormal distribution. Then by definition, log(X) has a normal distribution. As a result, there exist constants A and B > 0 such that log(Y) = A + B*log(X) has a normal distribution with mean mu = 0 and standard deviation sigma = 1. Equivalently, Y has a lognormal distribution with parameters mu = 0 and sigma = 1. Since B > 0,

Prob( X <= exp(t) ) = Prob( log(X) <= t ) = Prob( (log(Y) - A)/B <= t ) = Prob( log(Y) <= A + B*t ) = Prob( Y <= exp(A + B*t) ) = FY( exp(A + B*t) )

Where FY( . ) is the CDF of a lognormal distribution with parameters mu = 0 and sigma = 1.

Suppose we are given N observations of X, say x_1, x_2, ..., x_N. To estimate A and B, probplot uses the 25 and 75 percentiles of values x_i. Let q25 = 0.25 and q75 = 0.75. Let exp(t25) and exp(t75) be such that:

Prob( X <= exp(t25) ) = q25

Prob( X <= exp(t75) ) = q75

exp(t25) can be approximated by quantile([x_1,...,x_N],q25). Similar logic holds for approximating exp(t75). Therefore, we can approximate t25 and t75 like this:

t25 ~= log(quantile([x_1,...,x_N],q25)).

t75 ~= log(quantile([x_1,...,x_N],q75)).

Now,

q25 = Prob( X <= exp(t25) ) = FY( exp(A + B*t25) )

q75 = Prob( X <= exp(t75) ) = FY( exp(A + B*t75) )

Equivalently,

invFY(q25) = exp(A + B*t25)

invFY(q75) = exp(A + B*t75)

where invFY( . ) is the inverse CDF of a lognormal distribution with parameters mu = 0 and sigma = 1. Taking natural logs:

log(invFY(q25)) = A + B*t25

log(invFY(q75)) = A + B*t75

These two equations can now be used to solve for A and B. Notice that B > 0 as required.

2. Lines created in probplot and the y-axis labeling:

Suppose x_(1), x_(2),...,x_(N) are the sorted observations in increasing order. I assume no ties in the x_(i) to keep things simple. For very small delta, Prob( X <= x_(i) - delta ) ~= (i-1)/N and Prob( X <= x_(i) + delta ) ~= i/N. Hence, a sensible estimate of

Prob( X <= x_(i) ) ~= average of (i-1)/N and i/N = (i - 0.5)/N (the midpoint rule).

If x_i's do really come from a lognormal distribution then [using t = log(x_(i)) in the first equation]:

(i - 0.5)/N ~= Prob( X <= x_(i) ) = Prob( X <= exp(log(x_(i))) ) = FY( exp(A + B*log(x_(i))) )

or

log( invFY( (i - 0.5)/N ) ) ~= A + B*log(x_(i))

We would expect a plot of log(x_(i)) vs. log( invFY( (i - 0.5)/N ) ) to be a straight line. We plot two lines:

(a) A plot of log(x_(i)) vs. A + B*log(x_(i)) [the expected behavior if x_i's do really come from a lognormal distribution]

(b) A plot of log(x_(i)) vs. log( invFY( (i - 0.5)/N ) ) [the observed behavior]

Then we visually compare (a) and (b). We could plot (a) for a range of x values not necessarily restricted to x_(i). If you put a breakpoint in the probplot subfunction setprobticks, you can see that probplot changes the

labels

on the y-axis to be on the probability scale. The datatips are converted into probabilities in the probplotDatatipCallback listener.

If y is a value on the y-axis, then for some p, y = log( invFY( p ) ). Inverting FY(exp(y)) = p. The tick label for y is the

string

p.

0 Comments
Show -2 older comments Hide -2 older comments

Sign in to comment.

How are the fitted line and the y-axis (probabilities) levels within probplot computed?

Accepted Answer

0 Comments
Show -2 older comments Hide -2 older comments

More Answers (0)

Categories

Products

Tags

Community Treasure Hunt

How are the fitted line and the y-axis (probabilities) levels within probplot computed?

Accepted Answer

0 Comments Show -2 older comments Hide -2 older comments

More Answers (0)

Categories

Products

Tags

See Also

Community Treasure Hunt

0 Comments
Show -2 older comments Hide -2 older comments