Code covered by the BSD License  

File Size: 3.94 KB | File ID: #42480
FOURPLOT

by

 

03 Jul 2013 (Updated )

Four-plot for efficient visual exploratory data analysis, with box plot (V2.0, July 2013)

File Information
Description

    FOURPLOT(X) creates for the values in X a "four-plot" that allows for a
    powerful and efficient visual inspection of the four underlying
    assumptions of univariate statistical analyses. X is a vector of
    observational values and should not contain NaNs or Infs.
  
    In four subplots, the run sequence plot (X[k] vs k), a lag plot (X[k]
    vs X[k-1]), a histogram, and a normal probability plot are shown. Within
    these axes, the mean value of X is drawn as a straight line. In
    addition, a 5th panel shows a box-and-whisker plot of X.
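
    The first two panels described above can be sketched by hand as follows.
    This is only an illustration of what a run sequence plot and a lag plot
    contain, not FOURPLOT's own code; the variable names (k, lagX, curX) and
    the plot styling are assumptions made for this example.

```matlab
% Illustrative sketch of a run sequence plot and a lag plot.
% Names and styling are assumptions, not taken from FOURPLOT itself.
X = 20 + randn(100,1) * 10 ;          % example data
k = (1:numel(X))' ;                   % observation index
lagX = X(1:end-1) ;                   % X[k-1]
curX = X(2:end) ;                     % X[k]

figure ;
subplot(1,2,1) ;
plot(k, X, 'k.-') ; hold on ;
plot(k([1 end]), mean(X) * [1 1], 'r-') ;   % mean of X as a horizontal line
xlabel('k') ; ylabel('X[k]') ; title('Run sequence plot') ;

subplot(1,2,2) ;
plot(lagX, curX, 'k.') ;              % each point pairs a value with its predecessor
xlabel('X[k-1]') ; ylabel('X[k]') ; title('Lag plot') ;
```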
 
    If the four underlying assumptions hold, the four plots will have a
    characteristic appearance.
    1. If the fixed location assumption holds, then the run sequence plot
       will be flat and non-drifting.
    2. If the fixed variation assumption holds, then the vertical spread in
       the run sequence plot will be approximately the same over the
       entire horizontal axis.
    3. If the randomness assumption holds, then the lag plot will be
       structureless and random.
    4. If the fixed distribution assumption holds, in particular if the
       fixed normal distribution holds, then the histogram will be
       bell-shaped, and the normal probability plot will be linear.
 
    The box-and-whisker plot shows the median (red line), the mean and SD (in
    blue), the 25th and 75th percentiles (the box), and outliers (plus
    symbols), if any. The lower whisker extends to the lowest value still
    within 1.5 times the inter-quartile range (IQR) of the lower quartile,
    and the upper whisker to the highest value still within 1.5 IQR of the
    upper quartile. Raw data are plotted in gray.
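
    The 1.5 IQR whisker rule can be sketched without the Statistics Toolbox
    as shown below. The quartile interpolation used here is an assumption
    chosen for illustration and may differ from what FOURPLOT does
    internally; the data and variable names are likewise hypothetical.

```matlab
% Sketch of the 1.5*IQR whisker rule.
% The quartile method (linear interpolation at positions (i-0.5)/n) is an
% assumption for this example, not necessarily FOURPLOT's method.
X  = [2 4 4 5 6 7 8 9 10 30] ;             % example data with one extreme value
Xs = sort(X(:)) ;
n  = numel(Xs) ;
pos = ((1:n)' - 0.5) / n ;                 % cumulative positions of the sorted values
Q  = interp1(pos, Xs, [0.25 0.75]) ;       % lower and upper quartile
IQRx = Q(2) - Q(1) ;                       % inter-quartile range

% values lying within 1.5 IQR of the box
inner = Xs(Xs >= Q(1) - 1.5 * IQRx & Xs <= Q(2) + 1.5 * IQRx) ;
whiskerLow  = min(inner) ;                 % end of the lower whisker
whiskerHigh = max(inner) ;                 % end of the upper whisker
outliers = Xs(Xs < whiskerLow | Xs > whiskerHigh) ;   % would be drawn as plus symbols
```

    For this example the quartiles are 4 and 9, so the whiskers end at 2 and
    10, and the value 30 is flagged as an outlier.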
 
    STATS = FOURPLOT(X) also returns some statistical values in the
    structure STATS.
 
    Examples:
      % case 1: the four assumptions hold
        X = 20 + randn(100,1) * 10 ;
        fourplot(X) % nice, we can use classical statistics!
 
      % case 2: data is oscillating, which is not immediately clear
        unknown = cumsum(rand(1000,1)) ;
        unknown = unknown(randperm(numel(unknown))) ;
        X = sin(unknown) ; % X looks random (see, e.g., run sequence) ..
        fourplot(X) % .. but it is not!
 
    The usefulness of a four-plot extends beyond inspection of univariate
    and time series data. For instance, it can be used to inspect the
    residuals of a model fit to determine whether the underlying error term
    of the model fulfills the assumptions, no matter how complicated the
    model may be.
 
    Example:
      x = 2*rand(100,1) ; y = exp(x) ; % the data
      par = polyfit(x,y,1) ; % a simple model
      res = y - polyval(par,x) ; % residuals
      fourplot(res) % -> our model is poor!
 
    More information can be found on the internet, e.g.,
    http://www.itl.nist.gov/div898/handbook/eda/section2/eda23.htm
 
    See also normplot (Statistics Toolbox)
             mean, median

MATLAB release MATLAB 7.14 (R2012a)
Other requirements Should work on most MATLAB releases. The Statistics Toolbox is *NOT* required.
Updates
08 Jul 2013

version 2.0 - added box plot