Description |
FOURPLOT(X) creates for the values in X a "four-plot" that allows for a
powerful and efficient visual inspection of the four underlying
assumptions of univariate statistical analyses. Descriptive statistics
are printed out in the command window.
X is a vector of observations. It must be numeric and cannot contain
NaNs or Infs.
In four subplots, a run sequence plot (X[k] vs k), a lag plot (X[k]
vs X[k-1]), a histogram, and a normal probability plot are shown. Within
these axes, the mean value of X is drawn as a straight line. In
addition, a fifth panel shows a box-and-whisker plot of X.
If the four underlying assumptions hold, the four plots will have a
characteristic appearance.
1. If the fixed location assumption holds, then the run sequence plot
will be flat and non-drifting.
2. If the fixed variation assumption holds, then the vertical spread in
the run sequence plot will be approximately the same over the
entire horizontal axis.
3. If the randomness assumption holds, then the lag plot will be
structureless and random.
4. If the fixed distribution assumption holds, in particular if the
fixed normal distribution holds, then the histogram will be
bell-shaped, and the normal probability plot will be linear.
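The four panels described above can be sketched by hand as follows. This
is a hypothetical re-creation for illustration only, not the code of
FOURPLOT itself; the plotting positions used for the normal probability
plot are one common choice and are an assumption here.

```matlab
% Hypothetical sketch of the four panels; FOURPLOT may draw them differently.
X = 20 + randn(100,1) * 10 ;                    % sample data, as in case 1 of the examples
n = numel(X) ;
k = (1:n).' ;                                   % observation index

subplot(2,2,1) ; plot(k, X, '.-') ;             % run sequence plot: X[k] vs k
subplot(2,2,2) ; plot(X(1:n-1), X(2:n), '.') ;  % lag plot: X[k] (vertical) vs X[k-1] (horizontal)
subplot(2,2,3) ; hist(X) ;                      % histogram

p = ((1:n).' - 0.5) / n ;                       % plotting positions (assumed choice)
z = -sqrt(2) * erfcinv(2*p) ;                   % standard normal quantiles, base MATLAB only
subplot(2,2,4) ; plot(sort(X), z, '.') ;        % normal probability plot: near-linear if X is normal
```

Flat, structureless, bell-shaped and linear panels, respectively, suggest
that the four assumptions hold.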
The box-and-whisker plot will show the median (red line), the mean and SD
(in blue), the 25th and 75th percentiles (the box), and outliers (plus
symbols), if any. The whiskers extend to the lowest value still within 1.5
times the inter-quartile range (IQR) of the lower quartile, and to the
highest value still within 1.5 IQR of the upper quartile. Raw data are
plotted in gray.
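The whisker rule can be written out as a minimal sketch. It assumes that
PRCTILE from the Statistics Toolbox is available; the quartile method that
FOURPLOT itself uses may differ, and the variable names are illustrative.

```matlab
% Hypothetical sketch of the whisker rule; not taken from FOURPLOT.
X = 20 + randn(100,1) * 10 ;             % sample data
q = prctile(X, [25 75]) ;                % lower and upper quartile (Statistics Toolbox)
iqrX = q(2) - q(1) ;                     % inter-quartile range
lowW = min(X(X >= q(1) - 1.5*iqrX)) ;    % lowest value within 1.5 IQR of the lower quartile
uppW = max(X(X <= q(2) + 1.5*iqrX)) ;    % highest value within 1.5 IQR of the upper quartile
outl = X(X < lowW | X > uppW) ;          % values beyond the whiskers are outliers
```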
STATS = FOURPLOT(X) also returns some statistical values in the
structure STATS. In this case, the descriptive statistics are not printed.
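A minimal usage sketch of this calling form, assuming FOURPLOT is on the
path. The fields of STATS are not listed in this description, so they are
inspected here rather than assumed; the variable name S is illustrative.

```matlab
X = 20 + randn(100,1) * 10 ;   % sample data
S = fourplot(X) ;              % with an output argument, nothing is printed
disp(fieldnames(S))            % list whatever fields the structure contains
```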
Examples:
% case 1: the four assumptions hold
X = 20 + randn(100,1) * 10 ;
fourplot(X) % nice, we can use classical statistics!
% case 2: data is oscillating, which is not immediately clear
unknown = cumsum(rand(1000,1)) ;
unknown = unknown(randperm(numel(unknown))) ;
X = sin(unknown) ; % X looks random (see, e.g., run sequence) ..
fourplot(X) % .. but it is not!
The usefulness of a four-plot extends beyond the inspection of univariate
and time series data. For instance, it can be used to inspect the
residuals of a model fit to determine whether the underlying error term
of the model fulfills the assumptions, no matter how complicated the
model may be.
Example:
x = 2*rand(100,1) ; y = exp(x) ; % the data
par = polyfit(x,y,1) ; % a simple model
res = y - polyval(par,x) ; % residuals
fourplot(res) % -> our model is poor!
More information can be found on the internet, e.g.,
http://www.itl.nist.gov/div898/handbook/eda/section2/eda23.htm
See also normplot (Statistics Toolbox)
mean, median |