Code covered by the BSD License  

notBoxPlot - alternative to box plots.


4.7 | 16 ratings | 144 Downloads (last 30 days) | File Size: 5.21 KB | File ID: #26508


by

 

28 Jan 2010 (Updated )

This function visualizes raw (grouped) data along with the mean, 95% confidence interval, and 1 SD.

Editor's Notes:

This file was selected as MATLAB Central Pick of the Week


File Information
Description

Whilst box plots have their place, it's sometimes nicer to see all the data rather than hiding them behind summary statistics such as the inter-quartile range. This function (with a tongue-in-cheek name) addresses this problem. The use of the mean instead of the median, and of the SEM and SD instead of quartiles and whiskers, is deliberate.

Jittered raw data are plotted for each group. Also shown are the mean and the 95% confidence interval for the mean. This plotting style is designed to be used alongside parametric tests such as ANOVA and the t-test. Comparing the jittered data to the error bars provides a visual indication of whether the normality assumptions of these tests are being violated. Furthermore, it allows one to eyeball the data for significant differences between means: non-overlapping 95% confidence intervals indicate a significant difference at the 5% level (also see: http://jcb.rupress.org/cgi/content/abstract/177/1/7). Finally, 1 SD is also shown. Note that if the data are not normally distributed, these statistics will be less meaningful.
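A minimal sketch, in plain MATLAB, of the per-group statistics described above. It assumes the 95% interval is taken as mean +/- 1.96*SEM (the normal approximation) and ignores NaN handling, so the actual notBoxPlot code may differ in detail.

```matlab
% Per-group summary statistics of the kind notBoxPlot overlays on the raw data.
% Assumption: 95% CI half-width = 1.96*SEM (normal approximation); no NaN handling.
y = [4.1; 5.3; 3.8; 6.0; 5.1; 4.7; 5.5; 4.9];   % example data for one group

n    = numel(y);
mu   = mean(y);            % group mean
sd   = std(y);             % sample standard deviation (n-1 normalisation)
sem  = sd / sqrt(n);       % standard error of the mean
ci95 = 1.96 * sem;         % half-width of the approximate 95% CI

fprintf('mean = %.3f, 95%% CI = [%.3f, %.3f], 1 SD = [%.3f, %.3f]\n', ...
        mu, mu - ci95, mu + ci95, mu - sd, mu + sd);
```

Because the CI half-width shrinks with sqrt(n) while the SD does not, the CI band lies inside the SD band whenever n >= 4 (1.96/sqrt(n) < 1).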

The function has several examples and there are various visualization possibilities in addition to those shown in the above screenshot. For instance, the coloured areas can be replaced by lines.

Although it's worked well for situations I've needed it, I will be happy to modify the function if users come up against problems.

%%%%%%
Included functions
notBoxPlot.m - generates plots as shown in screenshot
SEM_calc.m - calculate standard error of the mean. Provided as a separate function file so that it can be used for other purposes.
tInterval_Calc.m - calculate a t-interval. Right now notBoxPlot doesn't make use of this (unless the user edits the code, of course), but it still might be useful. For small sample sizes, the t-interval is wider than the normal-approximation interval derived from the SEM (1.96*SEM).
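To make the last point concrete, here is a hedged sketch comparing a t-based 95% interval with 1.96*SEM for a sample of five. It is not the code in tInterval_Calc.m. To avoid needing the Statistics Toolbox (tinv), it finds the t critical value by inverting Student's t distribution, written in terms of the core functions betainc and fzero.

```matlab
% Sketch: t-based 95% interval vs. 1.96*SEM for a small sample.
% Assumption: an illustration only, not the code in tInterval_Calc.m.
y   = [4.1; 5.3; 3.8; 6.0; 5.1];   % small sample, n = 5
n   = numel(y);
nu  = n - 1;                       % degrees of freedom
sem = std(y) / sqrt(n);            % standard error of the mean

% Upper-tail probability of Student's t for t >= 0: P(T > t)
tUpper = @(t) 0.5 * betainc(nu ./ (nu + t.^2), nu/2, 0.5);

% 97.5th percentile: the t at which P(T > t) = 0.025 (bracketed in [0, 100])
tCrit = fzero(@(t) tUpper(t) - 0.025, [0, 100]);

halfT = tCrit * sem;               % t-interval half-width
halfZ = 1.96  * sem;               % normal-approximation half-width
fprintf('t crit = %.3f; t half-width = %.3f vs 1.96*SEM = %.3f\n', ...
        tCrit, halfT, halfZ);
```

With 4 degrees of freedom the critical value is about 2.78 rather than 1.96, so the interval is roughly 40% wider; as n grows the two converge.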

* NOTE *
The statistics toolbox is not required if you install the NaN toolbox (http://pub.ist.ac.at/~schloegl/matlab/NaN/). Otherwise, you will need the statistics toolbox for NaN handling.

MATLAB release MATLAB 8.1 (R2013a)
Comments and Ratings (41)
11 Jul 2014 Rob Campbell

F4b,

Yes this is possible, but there are two caveats:

1. You first need a criterion of your own for determining what defines an outlier.

2. You have to do a *little* extra coding to highlight the points, because I have deliberately avoided adding too many extra features such as this to the function. The idea is to make it easy for others to modify the plots as needed.

Here is a toy example, which highlights only the most positive point in each group:

clf, H=notBoxPlot(randn(40,5));
hold on

for ii=1:5
    % coordinates of the jittered points in group ii
    y=get(H(ii).data,'YData');
    x=get(H(ii).data,'XData');
    f=find(y==max(y)); % index of the largest value
    plot(x(f),y(f),'or','markersize',10) % circle it in red
end

hold off

11 Jul 2014 F4b  
11 Jul 2014 F4b

Hello,
very nice function!

I was wondering if it would be possible to show a label for each point in order to identify which points are outliers.

Thank you very much,
F4b

07 Jul 2014 Rob Campbell

Matthew, what you're asking for is already possible with just one more line of code:

clf
H=notBoxPlot(rand([30,3]));
set(H(2).data,'markerfacecolor','g') %recolour group 2
set(H(3).data,'markerfacecolor','c') %recolour group 3
legend([H.data],'A','B','C') %add legend
set(gca,'XTickLabel',{'A','B','C'})

06 Jul 2014 Matthew

This is a great function, but it would be even better if it displayed a legend, so that if you plot two groups the legend automatically matches the markerfacecolor set in the function.

21 May 2014 Rob Campbell

See Example 3 in the function's help text.

21 May 2014 Chris

Hi, I like this. We have data with an N of 300 and would like to reduce the size of the dots, as they obscure the mean/SE/SD 'patch'.

thanks
chris

15 Mar 2014 Erik  
29 Oct 2013 J.R.! Menzinger  
19 Oct 2013 Tom  
16 Sep 2013 Rob Campbell

Yes, a box plot shows the median and quartiles, etc., so it can be asymmetric and what not. If that's what you want, then use the MATLAB boxplot function. This version is, as the name says, /not/ a box plot. It uses the mean and statistics relating to the mean, which produce symmetrical error bars. There is a rationale to this and, TBH, this function is aimed more at replacing bar charts than at replacing box plots.

The rationale is that t-tests and ANOVA are often performed on data which are typically plotted as bar charts and sometimes box plots. However, these tests are based upon the mean, yet box plots show the median. Bar charts are often supplemented with error bars displaying 1 standard error of the mean (1 SEM), which does not reflect the p=0.05 significance criterion often used in biology and the social sciences. The 95% confidence interval used here provides a visual indicator of significance. In most bar charts the raw data are not overlaid, which greatly reduces the utility of the plot as it hides the underlying data. Yet with carefully chosen plot options, which this function facilitates, it's often possible to plot all the raw data even for large numbers of groups. I believe that overlaid raw data are usually more informative than the quartiles and whiskers of a box plot. Of course, that's a personal preference.

16 Sep 2013 arnold

I might be mistaken, but isn't the line in a boxplot supposed to be the median?

I tried this with my data and the box is always symmetrical, whereas with the MATLAB boxplot function one can see how unevenly distributed the data are (well, one can see that from the individual data points plotted by 'notboxplot' too).

Is there a way to make the box behave "normally"?

25 Jul 2013 Surojit Biswas

I like it

12 Jul 2013 Rob Campbell

I see: a list of strings instead of x being a vector of numbers. That would seem like an intuitive extension. I'll do it when I get a moment.

12 Jul 2013 Adam

Very nice. Thank you! I'd also like to request that the second argument could be a list of strings that define the groups.

e.g.

notBoxPlot(data, grouplabels)

03 Jul 2013 Rob Campbell

Gavin,
Thanks for your bug report. I have submitted a fix that correctly parses:
notBoxPlot(randn(1,100),repmat(1:10,1,10),0.1,'line')

03 Jul 2013 Gavin

If one uses vectors for the y and x inputs, the jitter and style options don't work. This is because the recursive call to notBoxPlot on line 115 doesn't pass those options through.

13 Apr 2013 Jessica

I am very excited to use this plotting tool but I'm having an issue. When I try to run the notboxplot code I get the following error.
"??? Maximum recursion limit of 500 reached. Use set(0,'RecursionLimit',N)
to change the limit. Be aware that exceeding your available stack space can crash MATLAB and/or your computer.

Error in ==> findobjhelper"

What should I set my recursion limit to so that the code works but I do not crash my computer? Or is there something else wrong?

Thanks,
Jessica

15 Mar 2013 Rob Campbell

Julia,

The notBoxPlot function returns the handles of the plotted data. It's probably best to use these to do what you want, e.g.:

H=notBoxPlot(randn(20,2));
x1=get(H(1).data,'XData');

x1 are the x values of the points in the first box. You can use this approach to get all the x and y data and then plot the lines. You can alter the order of the plot elements on the screen like this: http://www.matlab-cookbook.com/recipes/0050_Plotting/0010_Plot_Manipulation/changingPlotOrder.html

My only note of caution is that the plot may look messy because of the jitter along the x axis. You can modify the jitter with the 3rd input argument. If you have many data points then what you're doing may work better as a scatter plot. Perhaps my rug plot command would be of interest? http://www.mathworks.com/matlabcentral/fileexchange/27582-rug-plots
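One way to do the pairing Julia describes is sketched below. The group positions (1 and 2), the uniform jitter, and all variable names are assumptions for illustration. With real notBoxPlot output you would take the x values from get(H(1).data,'XData') and get(H(2).data,'XData') instead, but only after checking that the plotted points keep the row order of your input, which this page does not confirm.

```matlab
% Sketch: connect each subject's BEFORE point to its AFTER point.
% Assumptions (not taken from notBoxPlot): groups at x = 1 and x = 2,
% one row per subject, uniform jitter of total width jitterWidth.
before = [3.1; 4.0; 2.7; 5.2; 3.9];   % subject i, before the event
after  = [3.8; 4.1; 3.5; 5.9; 4.4];   % same subjects, after the event
nSubj  = numel(before);
jitterWidth = 0.3;

% Jittered x positions, uniform within +/- jitterWidth/2 of each group centre
xBefore = 1 + (rand(nSubj,1) - 0.5) * jitterWidth;
xAfter  = 2 + (rand(nSubj,1) - 0.5) * jitterWidth;

% plot(X, Y) draws one line per column, so subject i goes in column i
X = [xBefore'; xAfter'];              % 2-by-nSubj
Y = [before';  after'];               % 2-by-nSubj

% With "hold on", plot(X, Y, '-', 'Color', [0.6 0.6 0.6]) draws one grey
% line per subject on top of the existing plot.
```

Keeping the jitter small, as Rob suggests, stops the connecting lines from crossing so much that the pairing becomes hard to read.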

07 Mar 2013 Julia Sandell

Great function, was looking for a way to plot my data points on my box-and-whisker plot and this seems to do the trick.

Was wondering if there were any suggestions on drawing correlating lines between data points and data sets. For example, I have a bunch of data points BEFORE an event for a collection of subjects and then a bunch of data points taken AFTER the event for the same subjects. I would like to plot the two sets next to each other using this function and then have lines going from subject 1 before to subject 1 after, subject 2 before to subject 2 after, etc.

Any suggestions?

29 May 2012 Rob Campbell

JG:
Q1. The function will return the coordinates of the means so you can use these with polyval. e.g.
H=notBoxPlot(randn(10),[],[],'line');
% get on several handles returns a cell array; cell2mat turns it into a vector
x=cell2mat(get([H.mu],'XData')); y=cell2mat(get([H.mu],'YData'));
Without "line" the above will return two data points for each mean (since the means are lines), but it's easy enough to work with that too. Does that work for you?

Q2. You can do this as follows:
notBoxPlot(randn(10,5),[1,2,5,9,10])
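For Q1, a minimal sketch of the polyfit/polyval step. Here the means are computed directly from a small made-up data matrix placed at the uneven positions from Q2; with notBoxPlot you would use the x and y obtained from the mean handles instead.

```matlab
% Sketch: straight trend line through group means with polyfit/polyval.
% Assumption: made-up data, three observations per group, one group per column.
xPos = [1 2 5 9 10];                       % uneven group positions (as in Q2)
data = [0.3*xPos - 0.5; ...
        0.3*xPos + 0.1; ...
        0.3*xPos + 0.4];                   % offsets sum to zero per column
mu   = mean(data, 1);                      % one mean per group

p    = polyfit(xPos, mu, 1);               % p(1) = slope, p(2) = intercept
xFit = linspace(min(xPos), max(xPos), 50);
yFit = polyval(p, xFit);                   % trend line on a fine grid

% With "hold on", plot(xFit, yFit, 'k--') overlays the trend line.
```

Since the offsets in each column sum to zero, the means lie exactly on 0.3*x, so the fit recovers slope 0.3 and intercept 0.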

29 May 2012 J G

How can I use this function with continuous spacing on x-axis?
For example,
p = [0.1 0.25 0.5 0.75 0.9];
boxplot(A,'position',p)
will place the boxplots unevenly spaced along x-axis. Is there a way to do this with this function?

28 May 2012 J G

Great function! Is there a way to have a trend line through the means, i.e. using polyval/polyfit? Thanks!

20 Apr 2012 Rob Campbell

Ok... For some reason adding a patch object causes gname to fail. If you run notBoxPlot using the "line" plotting style then gname works.

29 Mar 2012 Rob Campbell

Hmmm... Don't know. I will look into it.

29 Mar 2012 Ian Shapiro

Great tool. It's an excellent way to visualize the distribution in a set of data. However, I've found that it does not appear to work with 'gname' for labeling individual data points, whereas boxplot is able to do this. Any idea why that's the case?

19 Mar 2012 Rob Campbell

I will soon be modifying this function to require no additional toolboxes. Otherwise, which function is best probably depends on the size of the data set. For large sample sizes the violin plots work best. For small sample sizes I prefer the plot style on this page, since it doesn't bin the data.

19 Mar 2012 Alexander

In my opinion, a better replacement for the built-in boxplot is "Violin Plots for plotting multiple distributions (distributionPlot.m)", which does not require any additional toolboxes. Check:
http://www.mathworks.com/matlabcentral/fileexchange/23661-violin-plots-for-plotting-multiple-distributions-distributionplot-m

06 Mar 2012 ted p teng

At the moment, I am admiring what I just made with your function. Love it, thank you.
You guys may also want to use this function in conjunction with XTICKLABEL_ROTATE.

16 Feb 2012 Rob Campbell

Almost the same way: just don't code your groups as a cell array of strings. To modify your example:

group = [repmat(1, 5, 1); repmat(2, 10, 1); repmat(3, 15, 1)];
notBoxPlot([x;y;z], group)

You can then change the XTickLabels to strings if needed. I've not found I do this often enough to add cell arrays as an input possibility. Perhaps I should, though (when time allows!).
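If you already have string labels, as in Kelvin's boxplot call, they can be converted into the numeric codes Rob suggests. This is a sketch assuming the interface shown in the reply (a numeric group vector as the second input). Note that unique sorts the labels alphabetically, so the group order may differ from the order in which the labels first appear.

```matlab
% Sketch: convert string group labels (boxplot style) into numeric group codes.
x = rand(5,1);  y = rand(10,1);  z = rand(15,1);
group = [repmat({'First'}, 5, 1); repmat({'Second'}, 10, 1); repmat({'Third'}, 15, 1)];

% names: sorted distinct labels; code: index into names for each element,
% so that names(code) reproduces group
[names, ~, code] = unique(group);

% Then, for example:
%   notBoxPlot([x;y;z], code)
%   set(gca, 'XTick', 1:numel(names), 'XTickLabel', names)
```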

16 Feb 2012 Kelvin

I’m wondering how notBoxPlot can plot vectors of different lengths.

e.g.
x = rand(5,1);
y = rand(10,1);
z = rand(15,1);
group = [repmat({'First'}, 5, 1); repmat({'Second'}, 10, 1); repmat({'Third'}, 15, 1)];
boxplot([x;y;z], group)

Thanks in advance!

28 Oct 2011 Harry MacDowel

Thanks Rob. Love it.

17 Oct 2011 Andrea  
20 Sep 2011 Rob Campbell

Normally I'd say you should modify the plotted objects with the handles returned by the function. However, it would be awkward to do what you requested in this way, so I've just submitted an update which should do what you want: the 4th argument can now also have the value "sdline". If you want to alter the line properties, I recommend doing so by modifying the object properties via the handle returned by the function.

08 Sep 2011 J G

This is very useful thanks! Is it possible to plot the SD as error bars instead of the box?

28 Jul 2011 Dylan

Very useful, code is very well written.

29 Mar 2011 Rob Campbell

Mahmoud,
You can achieve these things in exactly the same way as you would for most other plotting commands. I try to avoid having functions behave too idiosyncratically. So, to answer your question:
clf
h=notBoxPlot(randn(10,2));
set(gca,'XTickLabel',{'GrpA','GrpB'})
ylim([-5,5])

The last two lines are obviously standard ways of setting labels and changing the axis limits. These work with any plot. Note that the notBoxPlot function returns the handles to the plot objects so that you can change their properties or even delete them. For example, you could remove all the data points by doing: delete([h.data])

29 Mar 2011 Mahmoud

Very Useful!
Two questions,
1) How do you add labels to the x-axis like you would with the 'label' option in the boxplot function?
2) How can you specify what range should be plotted on the y-axis of notboxPlot?

30 Nov 2010 Rossella Blatt

Very nice and useful. Thanks!

29 Jan 2010 Rob Campbell

Really? I thought I zipped it in there. Thanks for letting me know. I shall re-upload.

29 Jan 2010 Michael Ashby

I like the idea but it seems to be missing the required "SEM_calc" function.

Updates
29 Jan 2010

re-upload because support file (SEM_calc.m) seemed to be missing

29 Jan 2010

Clarify a point in the description.

30 Jan 2010

Add tInterval_Calc and update the comments in SEM_calc

12 Feb 2010

Add link to JCB article on error bars.

24 Feb 2010

Handle to mean line when in patch mode (the default mode) is now returned.

20 Sep 2011

The 4th argument can now also have the value "sdline". This creates plots where the SD is a line instead of a patch.

21 Sep 2011

If "y" is a vector then the function ensures it is a column vector in order to yield one box-plot.

12 Oct 2011

Both x and y can now be vectors, in which case the function behaves like MathWorks' boxplot. An example of this behaviour is provided.

An example of the "sdline" plot style is now provided.

14 Nov 2011

Fix bug that was causing handles not to be returned for one of the plot formats.

10 Jan 2012

Better handling of x-ticks and x-axis limits. Add missing semicolon.

19 Mar 2012

Update summary to explain that the function works without the stats toolbox if the nan-toolbox is installed.

08 Jul 2013

Fixed bug that didn't pass input arguments correctly when two vectors are supplied.

17 Sep 2013

Update summary.
