boxplot - Box plot

Syntax

boxplot(X)
boxplot(X,G)
boxplot(axes,X,...)
boxplot(...,param1,val1,param2,val2,...)

Description

boxplot(X) produces a box plot of the data in X. If X is a matrix, there is one box per column; if X is a vector, there is just one box. On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually.

boxplot(X,G) specifies one or more grouping variables G, producing a separate box for each set of X values sharing the same G value or values (see Grouped Data). Grouping variables must have one row per element of X, or one row per column of X. Specify a single grouping variable in G using a vector, a character array, a cell array of strings, or a vector categorical array; specify multiple grouping variables in G using a cell array of these variable types, such as {G1 G2 G3}, or by using a matrix. If multiple grouping variables are used, they must all be the same length. Groups that contain a NaN value or an empty string in a grouping variable are omitted, and are not counted in the number of groups considered by other parameters.
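As a minimal sketch of grouping by more than one variable, the following passes two grouping variables in a cell array. The data values and the variable names x, g1, and g2 are hypothetical and chosen only for illustration.

```matlab
% Hypothetical data: 12 measurements, each described by two grouping variables.
x  = [5.1 4.9 6.2 6.0 7.3 7.1 5.5 5.3 6.6 6.4 7.7 7.5]';
g1 = {'A';'A';'B';'B';'C';'C';'A';'A';'B';'B';'C';'C'};   % first factor (cell array of strings)
g2 = [1 1 1 1 1 1 2 2 2 2 2 2]';                          % second factor (numeric vector)

% Both grouping variables have one row per element of x.
% boxplot draws one box for each combination of g1 and g2 that occurs in the data.
boxplot(x,{g1,g2})
```

Here every combination of the three g1 values and the two g2 values occurs, so the plot contains six boxes.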

By default, character and string grouping variables are sorted in the order they initially appear in the data, categorical grouping variables are sorted by the order of their levels, and numeric grouping variables are sorted in numeric order. To control the order of groups, either use categorical grouping variables in G and set the order of their levels, or specify the 'grouporder' parameter.
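One way to control the group order is the 'grouporder' parameter described in the parameter table. The following is a minimal sketch with hypothetical data and group names. Without 'grouporder', the string groups would appear in order of first appearance (low, mid, high).

```matlab
% Hypothetical data: three groups of three values each.
x = [1 2 3 4 5 6 7 8 9]';
g = {'low';'low';'low';'mid';'mid';'mid';'high';'high';'high'};

% Request the boxes in the order high, mid, low instead of the
% default order of first appearance.
boxplot(x,g,'grouporder',{'high','mid','low'})
```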

boxplot(axes,X,...) creates the plot in the axes with handle axes.

boxplot(...,param1,val1,param2,val2,...) specifies optional parameter name/value pairs, as described in the following table.

Parameter    Values
'plotstyle'
  • 'traditional' — Traditional box style. This is the default.

  • 'compact' — Box style designed for plots with many groups. This style changes the defaults for some other parameters, as described in the following table.

'boxstyle'
  • 'outline' — Draws an unfilled box with dashed whiskers. This is the default.

  • 'filled' — Draws a narrow filled box with lines for whiskers.

'colorgroup'

One or more grouping variables, of the same type as permitted for G, specifying that the box color should change when the specified variables change. The default is [] for no box color change.

'colors'

Colors for boxes, specified as a single color (such as 'r' or [1 0 0]) or multiple colors (such as 'rgbm' or a three-column matrix of RGB values). The sequence is replicated or truncated as required, so for example 'rb' gives boxes that alternate in color. The default when no 'colorgroup' is specified is to use the same color scheme for all boxes. The default when 'colorgroup' is specified is a modified hsv colormap.

'datalim'

A two-element vector containing lower and upper limits, used by 'extrememode' to determine which points are extreme. The default is [-Inf Inf].

'extrememode'
  • 'clip' — Moves data outside the 'datalim' limits to the limit. This is the default.

  • 'compress' — Evenly distributes data outside the 'datalim' limits in a region just outside the limit, retaining the relative order of the points.

A dotted line marks the limit if any points are outside it, and two gray lines mark the compression region if any points are compressed. Values at +/–Inf can be clipped or compressed, but NaN values still do not appear on the plot. Box notches are drawn to scale and may extend beyond the bounds if the median is inside the limit; they are not drawn if the median is outside the limits.

'factordirection'
  • 'data' — Arranges factors with the first value next to the origin. This is the default.

  • 'list' — Arranges factors left-to-right if on the x axis or top-to-bottom if on the y axis.

  • 'auto' — Uses 'data' for numeric grouping variables and 'list' for strings.

'fullfactors'
  • 'off' — One group for each unique row of G. This is the default.

  • 'on' — Create a group for each possible combination of group variable values, including combinations that do not appear in the data.

'factorseparator'

Specifies which factors should have their values separated by a grid line. The value may be 'auto' or a vector of grouping variable numbers. For example, [1 2] adds a separator line when the first or second grouping variable changes value. 'auto' is [] for one grouping variable and [1] for two or more grouping variables. The default is [].

'factorgap'

Specifies an extra gap to leave between boxes when the corresponding grouping factor changes value, expressed as a percentage of the width of the plot. For example, with [3 1], the gap is 3% of the width of the plot between groups with different values of the first grouping variable, and 1% between groups with the same value of the first grouping variable but different values for the second. 'auto' specifies that boxplot should choose a gap automatically. The default is [].

'grouporder'

Order of groups for plotting, specified as a cell array of strings. With multiple grouping variables, separate values within each string with a comma. Using categorical arrays as grouping variables is an easier way to control the order of the boxes. The default is [], which does not reorder the boxes.

'jitter'

Maximum distance d to displace outliers along the factor axis by a uniform random amount, in order to make duplicate points visible. A d of 1 makes the jitter regions just touch between the closest adjacent groups. The default is 0.

'labels'

A character array, cell array of strings, or numeric vector of box labels. There may be one label per group or one label per X value. Multiple label variables may be specified via a numeric matrix or a cell array containing any of these types.

'labelorientation'
  • 'inline' — Rotates the labels to be vertical. This is the default when 'plotstyle' is 'compact'.

  • 'horizontal' — Leaves the labels horizontal. This is the default when 'plotstyle' has the default value of 'traditional'.

When the labels are on the y axis, both settings leave the labels horizontal.

'labelverbosity'
  • 'all' — Displays every label. This is the default.

  • 'minor' — Displays a label for a factor only when that factor has a different value from the previous group.

  • 'majorminor' — Displays a label for a factor when that factor or any factor major to it has a different value from the previous group.

'medianstyle'
  • 'line' — Draws a line for the median. This is the default.

  • 'target' — Draws a black dot inside a white circle for the median.

'notch'
  • 'on' — Draws comparison intervals using notches when 'plotstyle' is 'traditional', or triangular markers when 'plotstyle' is 'compact'.

  • 'marker' — Draws comparison intervals using triangular markers.

  • 'off' — Omits notches. This is the default.

Two medians are significantly different at the 5% significance level if their intervals do not overlap. Interval endpoints are the extremes of the notches or the centers of the triangular markers. When the sample size is small, notches may extend beyond the end of the box.

'orientation'
  • 'vertical' — Plots X on the y axis. This is the default.

  • 'horizontal' — Plots X on the x axis.

'outliersize'

Size of the marker used for outliers, in points. The default is 6 (6/72 inch).

'positions'

Box positions specified as a numeric vector with one entry per group or X value. The default is 1:numGroups, where numGroups is the number of groups.

'symbol'

Symbol and color to use for outliers, using the same values as the LineSpec parameter in plot. The default is 'r+'. If the symbol is omitted then the outliers are invisible; if the color is omitted then the outliers have the same color as their corresponding box.

'whisker'

Maximum whisker length w. The default is a w of 1.5. Points are drawn as outliers if they are larger than q3 + w(q3 - q1) or smaller than q1 - w(q3 - q1), where q1 and q3 are the 25th and 75th percentiles, respectively. The default of 1.5 corresponds to approximately +/–2.7σ and 99.3% coverage if the data are normally distributed. The plotted whisker extends to the adjacent value, which is the most extreme data value that is not an outlier. Set 'whisker' to 0 to give no whiskers and to make every point outside of q1 and q3 an outlier.

'widths'

A scalar or vector of box widths for when 'boxstyle' is 'outline'. The default is half of the minimum separation between boxes, which is 0.5 when the 'positions' argument takes its default value. The list of values is replicated or truncated as necessary.
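The outlier rule given for the 'whisker' parameter can be worked through by hand. The following sketch uses hypothetical data and assumes the quartiles are computed the way prctile computes them. boxplot may compute them internally in a slightly different way. The last lines check the stated correspondence between w = 1.5 and roughly +/–2.7σ with 99.3% coverage for normal data.

```matlab
% Hypothetical sample with one unusually large value.
x = [2 3 4 5 6 7 8 9 10 30]';
w = 1.5;                               % default maximum whisker length

q1 = prctile(x,25);                    % 25th percentile
q3 = prctile(x,75);                    % 75th percentile
upperFence = q3 + w*(q3 - q1);         % points above this are outliers
lowerFence = q1 - w*(q3 - q1);         % points below this are outliers

outlierMask   = x > upperFence | x < lowerFence;
upperAdjacent = max(x(~outlierMask));  % where the upper whisker ends
lowerAdjacent = min(x(~outlierMask));  % where the lower whisker ends

% Standard normal check: q3 = z, q1 = -z, so the fence is z + 1.5*(2*z).
z        = norminv(0.75);
fence    = z + w*(2*z);                        % about 2.7 standard deviations
coverage = normcdf(fence) - normcdf(-fence);   % about 0.993
```

In this sample only the value 30 lies beyond the fences, so the whiskers end at the adjacent values 2 and 10.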

When the 'plotstyle' parameter takes the value 'compact', the default values for other parameters are those listed in the following table.

Parameter             Default when 'plotstyle' is 'compact'
'boxstyle'            'filled'
'factorseparator'     'auto'
'factorgap'           'auto'
'jitter'              0.5
'labelorientation'    'inline'
'labelverbosity'      'majorminor'
'medianstyle'         'target'
'outliersize'         4
'symbol'              'o'
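Because 'compact' only changes defaults, it should be possible to override individual entries of this table with explicit parameter values. The following is a sketch under that assumption, using hypothetical random data. It keeps the compact style but restores the traditional outlier symbol and turns jitter off.

```matlab
% Hypothetical data: 25 groups of 100 standard normal values.
X = randn(100,25);

% Compact style, with two of its defaults overridden explicitly:
% 'symbol' goes back to 'r+' and 'jitter' goes back to 0.
boxplot(X,'plotstyle','compact','symbol','r+','jitter',0)
```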

You can see data values and group names using the data cursor in the figure window. The cursor shows the original values of any points affected by the 'datalim' parameter. You can label the group to which an outlier belongs using the gname function.

To modify graphics properties of a box plot component, use findobj with the 'Tag' property to find the component's handle. 'Tag' values for box plot components depend on parameter settings, and are listed in the table below.

Parameter Settings    'Tag' Values

All settings

  • 'Box'

  • 'Outliers'

When 'plotstyle' is 'traditional'

  • 'Median'

  • 'Upper Whisker'

  • 'Lower Whisker'

  • 'Upper Adjacent Value'

  • 'Lower Adjacent Value'

When 'plotstyle' is 'compact'

  • 'Whisker'

  • 'MedianOuter'

  • 'MedianInner'

When 'notch' is 'marker'

  • 'NotchLo'

  • 'NotchHi'
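As a minimal sketch of the findobj approach, the following thickens the median lines and enlarges the outlier markers of a traditional box plot. The data and the property values are illustrative assumptions only.

```matlab
% Hypothetical data: three groups of 100 standard normal values.
X = randn(100,3);
boxplot(X)                               % 'plotstyle' is 'traditional' by default

% Find the median lines by tag and restyle them.
hMedian = findobj(gca,'Tag','Median');   % one line object per box
set(hMedian,'LineWidth',2,'Color','k')

% Find the outlier markers by tag and enlarge them.
hOutliers = findobj(gca,'Tag','Outliers');
set(hOutliers,'MarkerSize',8)
```

The same pattern applies to the other tags in the table. Only the tag string and the properties being set change.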

Examples

Example 1

Create a box plot of car mileage, grouped by country:

load carsmall
boxplot(MPG,Origin)

Example 2

Create notched box plots for two groups of sample data:

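x1 = normrnd(5,1,100,1);   % 100 draws from a normal distribution with mean 5, sigma 1
x2 = normrnd(6,1,100,1);   % 100 draws from a normal distribution with mean 6, sigma 1
boxplot([x1,x2],'notch','on')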

The difference between the medians of the two groups is approximately 1. Since the notches in the box plot do not overlap, you can conclude, with 95% confidence, that the true medians do differ.

The following command plots the same data with the maximum whisker length set to 1.0 times the interquartile range. Points beyond the whiskers are displayed using +.

boxplot([x1,x2],'notch','on','whisker',1)

Example 3

A 'plotstyle' of 'compact' is useful for large numbers of groups:

X = randn(100,25);

subplot(2,1,1)
boxplot(X)

subplot(2,1,2)
boxplot(X,'plotstyle','compact')

References

[1] McGill, R., J. W. Tukey, and W. A. Larsen. "Variations of Boxplots." The American Statistician. Vol. 32, No. 1, 1978, pp. 12–16.

[2] Velleman, P.F., and D.C. Hoaglin. Applications, Basics, and Computing of Exploratory Data Analysis. Pacific Grove, CA: Duxbury Press, 1981.

[3] Nelson, L. S. "Evaluating Overlapping Confidence Intervals." Journal of Quality Technology. Vol. 21, 1989, pp. 140–141.

See Also

anova1, kruskalwallis, multcompare

  


© 1984-2008 The MathWorks, Inc.