This function visualizes raw (grouped) data along with the mean, 95% confidence interval, and 1 SD.
Updated 28 Mar 2017
Whilst box plots have their place, it's sometimes nicer to see all the data, rather than hiding them behind summary statistics such as the interquartile range. This function (with a tongue-in-cheek name) addresses this problem. The use of the mean instead of the median, and of the SEM and SD instead of quartiles and whiskers, is deliberate. Jittered raw data are plotted for each group. Also shown are the mean and the 95% confidence interval for the mean. This plotting style is designed to be used alongside parametric tests such as ANOVA and the t-test. Comparing the jittered data to the error bars provides a visual indication of whether the normality assumptions of the statistical tests are being violated. Furthermore, it allows one to eyeball the data to look for significant differences between means (non-overlapping confidence intervals indicate a significant difference at the chosen p-value, which here is 5%). Also see: http://jcb.rupress.org/cgi/content/abstract/177/1/7 Finally, 1 SD is also shown. Note that if the data are not normally distributed then these statistics will be less meaningful.
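As a rough illustration of the quantities described above, the following sketch computes the mean, SD, SEM and an approximate 95% confidence interval for one group using base MATLAB only. The variable names and the normal-approximation multiplier 1.96 are assumptions made for this example; the function itself may compute the interval differently (for instance from the t-distribution, which gives a wider interval for small samples).

```matlab
% Toy data for a single group (hypothetical values)
y = [2 4 4 4 5 5 7 9];

n   = numel(y);
mu  = mean(y);          % group mean
sd  = std(y);           % sample standard deviation (normalised by n-1)
sem = sd / sqrt(n);     % standard error of the mean

% Approximate 95% confidence interval for the mean.
% 1.96 is the normal-approximation multiplier (an assumption of this sketch).
ci95 = mu + [-1 1] * 1.96 * sem;

fprintf('mean=%.2f  SD=%.2f  SEM=%.2f  95%% CI=[%.2f, %.2f]\n', ...
        mu, sd, sem, ci95(1), ci95(2));
```

The interval is always narrower than mean plus or minus 1 SD once n exceeds about four, which is why the plot can show both without them coinciding.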
The function has several examples and there are various visualization possibilities in addition to those shown in the above screenshot. For instance, the coloured areas can be replaced by lines.
Although it's worked well for situations I've needed it, I will be happy to modify the function if users come up against problems.
New features from March 2017:
Accepts LinearModel objects as a y variable
Accepts Table objects as a y variable
Examples are now in standalone files
Legacy call format removed
x variable does not need to be defined as empty when it isn't being used
See GitHub page for more details.
Gyan Raj Koirala (view profile)
Hi Rob,
My data include five different states, each with three different conditions.
Say States = Sunday, Monday, Tuesday, Wednesday and Thursday
Conditions = Morning, Afternoon and Night
I'm going to use notBoxPlot to plot each condition box in a different color. Is that possible?
I also have a problem labeling the different conditions and states.
Thanks in advance.
Coline Prevost (view profile)
Hi Rob, I have used this function without issues in the past, but I'm now facing the same problem as Isa, even when running "simpleExamples". I'm using MATLAB R2014a. Any idea? Thanks a lot.
Rob Campbell (view profile)
Isa, I can't tell what is wrong based on what you wrote. Get the new version that I sent today and look at the examples. Likely you have fed in the wrong input arguments. Maybe I should be catching that to provide a better error message. If you continue to have problems, file an issue on GitHub: https://github.com/raacampbell/notBoxPlot/issues
Isa (view profile)
Hi Rob, really looking forward to using the function. However, even when I try to run the examples I get the following error:
Error using var (line 97)
The length of W must be compatible with X.
Error in std (line 31)
y = sqrt(var(varargin{:}));
Error in notBoxPlot/myPlotter (line 328)
SD=std(Y,'omitnan'); %Requires the stats toolbox
Error in notBoxPlot (line 293)
[hTemp,statsTemp]=myPlotter(x(f),y(:,f));
Error in example (line 20)
notBoxPlot(r,[],'jitter',0.5)
Anybody got any advice?
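One possible cause, offered as a guess rather than a confirmed diagnosis: the 'omitnan' flag of std appears to be available only from R2015a onwards, and older releases treat the second argument as a weight, which would match the message from var. On an older release, a NaN-tolerant SD for a single column can be sketched with base MATLAB alone (the variable names are made up for this example):

```matlab
Y = [1.2; 3.4; NaN; 2.2; 5.1];   % toy column with one missing value

% Drop NaNs by hand instead of relying on the 'omitnan' flag
valid = ~isnan(Y);
SD    = std(Y(valid));           % default normalisation (n-1)
MU    = mean(Y(valid));
```

This only covers a column vector; a matrix would need the same step applied column by column.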
Rob Campbell (view profile)
Francisco, I'm migrating this to GitHub. Not had a chance to try it on R2015a yet.
Please see here for updates:
https://github.com/raacampbell/notBoxPlot/issues/5
Francisco (view profile)
Hi Rob,
No other errors. Only that. Doesn't happen with the legacy syntax.
Thanks
FdC
Rob Campbell (view profile)
I'm not sure what's going on. I can't reproduce your problem in 2015b. I don't think I have a 2015a install to hand, but I'll look. The error is very strange: not only is it claiming that 'sdline' is invalid but that it's somehow too large a variable. Do you see any other errors relating to input arguments?
Francisco (view profile)
Got an error using the new syntax (not legacy).
notBoxPlot(rand(50,1),1,'style','sdline')
Error using notBoxPlot (line 211)
The value of 'style' is invalid. Maximum variable size allowed by the function is exceeded.
I'm using R2015a. Any solutions?
Thanks
FdC
Rob Campbell (view profile)
In your scenario I would just use the regular boxplot function in MATLAB and overlay the raw data. There's no point using notBoxPlot if the data are substantially skewed.
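The overlay suggested above could be sketched as follows. This is an illustration only: the data, the jitter width and the marker settings are arbitrary choices, and boxplot requires the Statistics Toolbox.

```matlab
% Skewed toy data: 3 groups of 40 points (hypothetical)
rng(1);                                  % reproducible data and jitter
Y = exp(randn(40,3));                    % log-normal, so clearly skewed

% Jittered x positions: column k is centred on x = k
jitter = 0.15;                           % arbitrary half-width for this sketch
X = repmat(1:3, size(Y,1), 1) + jitter*(2*rand(size(Y)) - 1);

boxplot(Y)                               % requires the Statistics Toolbox
hold on
plot(X, Y, 'ok', 'MarkerSize', 4)        % raw data on top of the boxes
hold off
```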
Esther (view profile)
Hi,
Great way to visualize normally distributed data.
Is it possible to easily adjust the function in order to use it with skewed data and visualize the median instead of mean etc. (i.e. use it for regular box plots but with overlay of the data)?
Thanks
Shital shirsat rohekar (view profile)
Hi Rob,
Thanks for the great function.
But is there a way I could color-code the circles within each box-whisker plot?
For example, the circles in my case are the different stations with NO2 data for 12 months.
Please help. Many thanks
Shital
gooey (view profile)
AnneLaure GUINET (view profile)
Hello,
The option style doesn't work.
notBoxPlot(essaivariablesparcote2(:,1:2), style, 'sdline')
Undefined function or variable 'style'.
essaivariablesparcote2 is a matrix.
Thanks
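The message says MATLAB looked for a variable called style, which suggests the parameter name was passed without quotes. A sketch of the same call with the name quoted, following the parameter/value syntax used elsewhere in this thread (the data here are a made-up stand-in for the poster's matrix):

```matlab
y = randn(20,2);                         % stand-in for the two matrix columns

if exist('notBoxPlot','file')            % notBoxPlot must be on the path
    % Parameter names are strings, so 'style' needs quotes.
    % The empty second argument leaves x at its default.
    notBoxPlot(y, [], 'style', 'sdline')
end
```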
Rob Campbell (view profile)
@Remi Chaussenot
Your question is beyond the scope of this comment thread. Can you email me via my profile page or start an issue on GitHub (see the Issues link on the top right of this page)?
Rob Campbell (view profile)
@Manuel, why would you want to plot one point only over the box? I don't understand.
Manuel (view profile)
Hello Rob, I'm looking to plot only one point over the box. Is this possible with your function?
Many thanks
Manuel
David L (view profile)
Andreas Trier Poulsen (view profile)
Remi Chaussenot (view profile)
Hello Rob,
I love your functions, but have a couple of questions.
I'm in laboratory Neurosciences, working on mice (like wildtype VS knockout), so usually, my dataset looks like :
'WT' [ 453] [ 5] [ 70] [ 45] [ 20] [ 20] [ 70] [ 65]
'WT' [ 468] [ 0] [ 70] [ 35] [ 10] [ 20] [ 50] [ 65]
'WT' [ 466] [ 5] [ 50] [ 35] [ 15] [ 20] [ 40] [ 60]
'WT' [ 452] [ 5] [ 65] [ 40] [ 25] [ 35] [ 75] [ 70]
'WT' [ 470] [ 0] [ 60] [ 25] [ 10] [ 20] [ 35] [ 55]
'WT' [ 467] [ 0] [ 55] [ 40] [ 10] [ 15] [ 35] [ 60]
'WT' [ 456] [ 0] [ 65] [ 40] [ 10] [ 25] [ 70] [ 60]
'MDX' [ 455] [ 0] [ 40] [ 30] [ 0] [ 5] [ 70] [ 55]
'MDX' [ 473] [ 0] [ 50] [ 35] [ 5] [ 20] [ 45] [ 55]
'MDX' [ 472] [ 0] [ 65] [ 35] [ 5] [ 25] [ 50] [ 60]
'MDX' [ 465] [ 0] [ 50] [ 35] [ 10] [ 30] [ 70] [ 65]
'MDX' [ 469] [ 0] [ 65] [ 55] [ 15] [ 20] [ 45] [ 65]
'MDX' [ 471] [ 30] [ 75] [ 50] [ 50] [ 45] [ 80] [ 80]
'MDX' [ 464] [ 0] [ 50] [ 30] [ 10] [ 10] [ 45] [ 60]
In a perfect world, I would like to plot all measures (first row) on the x-axis and have two separate lines of dots, one per genotype, for each measure. I think that is impossible, so I first plot WT and then MDX.
Then I try to add my labels with:
notBoxPlot(ndata_wt);
% Adding x-axis labels
entete = alldata(1:1,3:end)
ax = gca;
ax.XTickLabel = entete;
ax.XTickLabelRotation = 45;
But it is not working. Any ideas?
Thanks !
alldata :
'Genotype' 'Number' 'Clic' '2kHz' '4kHz' '8kHz' '16kHz' '24kHz' '32kHz'
'WT' [ 453] [ 5] [ 70] [ 45] [ 20] [ 20] [ 70] [ 65]
'WT' [ 468] [ 0] [ 70] [ 35] [ 10] [ 20] [ 50] [ 65]
'WT' [ 466] [ 5] [ 50] [ 35] [ 15] [ 20] [ 40] [ 60]
'WT' [ 452] [ 5] [ 65] [ 40] [ 25] [ 35] [ 75] [ 70]
'WT' [ 470] [ 0] [ 60] [ 25] [ 10] [ 20] [ 35] [ 55]
'WT' [ 467] [ 0] [ 55] [ 40] [ 10] [ 15] [ 35] [ 60]
'WT' [ 456] [ 0] [ 65] [ 40] [ 10] [ 25] [ 70] [ 60]
'MDX' [ 455] [ 0] [ 40] [ 30] [ 0] [ 5] [ 70] [ 55]
'MDX' [ 473] [ 0] [ 50] [ 35] [ 5] [ 20] [ 45] [ 55]
'MDX' [ 472] [ 0] [ 65] [ 35] [ 5] [ 25] [ 50] [ 60]
'MDX' [ 465] [ 0] [ 50] [ 35] [ 10] [ 30] [ 70] [ 65]
'MDX' [ 469] [ 0] [ 65] [ 55] [ 15] [ 20] [ 45] [ 65]
'MDX' [ 471] [ 30] [ 75] [ 50] [ 50] [ 45] [ 80] [ 80]
'MDX' [ 464] [ 0] [ 50] [ 30] [ 10] [ 10] [ 45] [ 60]
ndata_wt :
5 70 45 20 20 70 65
0 70 35 10 20 50 65
5 50 35 15 20 40 60
5 65 40 25 35 75 70
0 60 25 10 20 35 55
0 55 40 10 15 35 60
0 65 40 10 25 70 60
Rob Campbell (view profile)
You could try the rotate tick label function here on the FEX. It's ID #8722
Roy Granit (view profile)
Hi Rob,
Great function!
I just have a problem with the x-axis labels: I cannot get them to be vertical. Any suggestions?
Thanks,
Roy
Rob Campbell (view profile)
F4b,
Yes this is possible, but there are two caveats:
1. You first need a criterion of your own for determining what defines an outlier.
2. You have to do a *little* extra coding to highlight the points. This is because I have deliberately avoided adding too many extra features such as this to the function. The idea is to make it easy for others to modify the plots as needed. The example below highlights only the most positive point in each group.
Here is a toy example:
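clf, H=notBoxPlot(randn(40,5)); % H(ii).data is the handle to the points of group ii
hold on
for ii=1:5
    y=get(H(ii).data,'YData');
    x=get(H(ii).data,'XData');
    f=find(y==max(y)); % index of the most positive point in this group
    plot(x(f),y(f),'or','markersize',10) % circle it in red
end
hold off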
F4b (view profile)
Hello,
very nice function!
I was wondering if it would be possible to show a label for each point in order to identify which points are outliers.
Thank you very much,
F4b
Rob Campbell (view profile)
Matthew, what you're asking for is already possible with just one more line of code:
clf, H=notBoxPlot(rand([30,3]));
set(H(2).data,'markerfacecolor','g')
set(H(3).data,'markerfacecolor','c')
legend([H.data],'A','B','C') %add legend
set(gca,'XTickLabel',{'A','B','C'})
Matthew (view profile)
This is a great function, but it would be even better if it displayed the legend so that if you plot two groups, the legend corresponds automatically to the markerfacecolor set in the function.
Rob Campbell (view profile)
Example 3.
Chris (view profile)
Hi, I like this. We have data with an N of 300 and would like to reduce the size of the dots, as they obscure the mean/SE/SD 'patch'.
thanks
chris
Erik (view profile)
J.R.! Menzinger (view profile)
Tom (view profile)
Rob Campbell (view profile)
Yes, a boxplot shows the median and quartiles, etc, so can be asymmetric and what not. If that's what you want, then use the MATLAB boxplot function. This version is, as the name says, /not/, a box plot. It uses the mean and statistics relating to the mean. These produce symmetrical error bars. There is rationale to this and, TBH, this function is aimed more at replacing bar charts than at replacing box plots.
The rationale is that t-tests and ANOVA are often performed on data which are typically plotted as bar charts and sometimes boxplots. However, these tests are based upon the mean, yet boxplots show the median. Bar charts are often supplemented with error bars displaying 1 standard error of the mean (1 SEM), which does not reflect the p=0.05 significance criterion often used in biology and the social sciences. The 95% confidence interval used here provides a visual indicator of significance. In most bar charts the raw data are not overlaid, which greatly reduces the utility of the plot as it hides the underlying data. Yet with carefully chosen plot options, which this function facilitates, it's often possible to plot all the raw data even for large numbers of groups. I believe that overlaid raw data are usually more informative than the quartiles and whiskers of a box plot. Of course that's a personal preference.
arnold (view profile)
I might be mistaken, but isn't the line in a boxplot supposed to be the median?
I tried this with my data and the box is always symmetrical, whereas using the MATLAB boxplot function one can see how unevenly distributed the data are (well, one can see that from the single data points plotted by 'notBoxPlot' too).
Is there a way to make the box behave "normally"?
Surojit Biswas (view profile)
I like it
Rob Campbell (view profile)
I see: you'd like x to be a list of strings instead of a vector (of numbers). That would seem like an intuitive extension. I'll do it when I get a moment.
Adam (view profile)
Very nice. Thank you! I'd also like to request that the second argument could be a list of strings that define the groups.
e.g.
notBoxPlot(data, grouplabels)
Rob Campbell (view profile)
Gavin,
Thanks for your bug report. I have submitted a fix that correctly parses:
notBoxPlot(randn(1,100),repmat(1:10,1,10),0.1,'line')
Gavin (view profile)
If one uses vectors for the y and x inputs, the jitter and style options don't work. This is because the recursive call to notBoxPlot on line 115 doesn't pass those options through.
Jessica (view profile)
I am very excited to use this plotting tool but I'm having an issue. When I try to run the notboxplot code I get the following error.
"??? Maximum recursion limit of 500 reached. Use set(0,'RecursionLimit',N)
to change the limit. Be aware that exceeding your available stack space can crash MATLAB and/or your computer.
Error in ==> findobjhelper"
What should I set my recursion limit to so that the code works but I do not crash my computer? Or is there something else wrong?
Thanks,
Jessica
Rob Campbell (view profile)
Julia,
The notBoxPlot function returns the handles of the plotted data. It's probably best to use these to do what you want. e.g:
H=notBoxPlot(randn(20,2));
x1=get(H(1).data,'XData');
x1 are the x values of the points in the first box. You can use this approach to get all the x and y data and then plot the lines. You can alter the order of the plot elements on the screen like this: http://www.matlabcookbook.com/recipes/0050_Plotting/0010_Plot_Manipulation/changingPlotOrder.html
My only note of caution is that the plot may look messy because of the jitter along the x axis. You can modify the jitter with the 3rd input argument. If you have many data points then what you're doing may work better as a scatter plot. Perhaps my rug plot command would be of interest? http://www.mathworks.com/matlabcentral/fileexchange/27582rugplots
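The approach described above could be sketched like this. It assumes that the k-th point in each group's XData/YData corresponds to the k-th row of the input, which is worth checking against your own data before trusting the lines; the toy data and line colour are arbitrary.

```matlab
% Paired toy data: one row per subject (hypothetical values)
before = randn(20,1);
after  = before + 0.5 + 0.3*randn(20,1);

if exist('notBoxPlot','file')            % notBoxPlot must be on the path
    H = notBoxPlot([before, after]);

    % Jittered coordinates of the plotted points in each group
    x1 = get(H(1).data,'XData');  y1 = get(H(1).data,'YData');
    x2 = get(H(2).data,'XData');  y2 = get(H(2).data,'YData');

    % One grey line per subject; ASSUMES point k in both groups is subject k
    hold on
    plot([x1(:)'; x2(:)'], [y1(:)'; y2(:)'], '-', 'Color', [0.6 0.6 0.6])
    hold off
end
```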
Julia Sandell (view profile)
Great function, was looking for a way to plot my data points on my box-and-whisker plot and this seems to do the trick.
Was wondering if there were any suggestions on drawing connecting lines between data points in different data sets. For example, I have a bunch of data points taken BEFORE an event for a collection of subjects and then a bunch of data points taken AFTER the event for the same subjects. I would like to plot the two sets next to each other using this function and then have lines going from subject 1, before to subject 1, after and subject 2, before to subject 2, after, etc.
Any suggestions?
Rob Campbell (view profile)
JG:
Q1. The function will return the coordinates of the means so you can use these with polyval. e.g.
H=notBoxPlot(randn(10),[],[],'line');
x=get([H.mu],'XData'), y=get([H.mu],'YData');
Without "line" the above will return two data points for each mean (since the means are lines), but it's easy enough to work with that too. Does that work for you?
Q2. You can do this as follows:
notBoxPlot(randn(10,5),[1,2,5,9,10])
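For Q1, a trend line through the means might look like the sketch below. To keep it self-contained it computes the group means directly with mean rather than reading them back from the returned handles, which is a swapped-in shortcut and not the handle-based route described above; the toy data and the first-order fit are arbitrary choices.

```matlab
% Five groups with an upward trend, placed at uneven x positions
x  = [1 2 5 9 10];
Y  = bsxfun(@plus, randn(10,5), 0.3*x);  % column k is the group at x(k)

mu = mean(Y, 1);                         % group means, computed directly
p  = polyfit(x, mu, 1);                  % straight-line fit through the means
xx = linspace(min(x), max(x), 50);

if exist('notBoxPlot','file')            % notBoxPlot must be on the path
    notBoxPlot(Y, x)
else
    plot(x, mu, 'ok')                    % fallback: just show the means
end
hold on
plot(xx, polyval(p, xx), 'r-')           % fitted trend line
hold off
```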
J G (view profile)
How can I use this function with continuous spacing on the x-axis?
For example,
p = [0.1 0.25 0.5 0.75 0.9];
boxplot(A,'position',p)
will place the boxplots unevenly spaced along the x-axis. Is there a way to do this with this function?
J G (view profile)
Great function! Is there a way to have a trend line through the means, i.e. using polyval/polyfit? Thanks!
Rob Campbell (view profile)
Ok... For some reason adding a patch object causes gname to fail. If you run notBoxPlot using the "line" plotting style then gname works.
Rob Campbell (view profile)
Hmmm... Don't know. I will look into it.
Ian Shapiro (view profile)
Great tool. It's an excellent way to visualize the distribution in a set of data. However, I've found that it does not appear to work with 'gname' for labeling individual data points, whereas boxplot is able to do this. Any idea why that's the case?
Rob Campbell (view profile)
I will soon be modifying this function to require no additional toolboxes. Otherwise, which function is best probably depends on the size of the data set. For large sample sizes the violin plots work best. For small sample sizes I prefer the plot style on this page, since it doesn't bin the data.
Alexander (view profile)
In my opinion, a better replacement for the built-in boxplot is "Violin Plots for plotting multiple distributions (distributionPlot.m)", which does not require any additional toolboxes. Check:
http://www.mathworks.com/matlabcentral/fileexchange/23661violinplotsforplottingmultipledistributionsdistributionplotm
ted p teng (view profile)
At the moment, I am admiring what I just made with your function. Love it, thank you.
You guys may also want to use this function in conjunction with XTICKLABEL_ROTATE.
Rob Campbell (view profile)
Almost the same way: just don't code your groups as a cell array of strings. To modify your example:
group = [repmat(1, 5, 1); repmat(2, 10, 1); repmat(3, 15, 1)];
notBoxPlot([x;y;z], group)
You can then change the XTickLabels to strings if needed. I've not found I do this often enough to add cell arrays as an input possibility. Perhaps I should, though (when time allows!).
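If the groups already exist as a cell array of strings, one way to get numeric codes plus matching tick labels is sketched below. The use of unique with the 'stable' option and the variable names are choices made for this example, not something the function requires.

```matlab
% Data of unequal length with string group names, as in the question
x = rand(5,1);  y = rand(10,1);  z = rand(15,1);
names = [repmat({'First'},5,1); repmat({'Second'},10,1); repmat({'Third'},15,1)];

% Convert names to numeric group codes, keeping order of first appearance
[labels, ~, group] = unique(names, 'stable');

if exist('notBoxPlot','file')            % notBoxPlot must be on the path
    notBoxPlot([x;y;z], group)
    % Put the original strings back on the x-axis
    set(gca, 'XTick', 1:numel(labels), 'XTickLabel', labels)
end
```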
Kelvin (view profile)
I’m wondering how notBoxPlot can plot vectors of different lengths.
i.e.
x = rand(5,1);
y = rand(10,1);
z = rand(15,1);
group = [repmat({'First'}, 5, 1); repmat({'Second'}, 10, 1); repmat({'Third'}, 15, 1)];
boxplot([x;y;z], group)
Thanks in advance!
Harry MacDowel (view profile)
Thanks Rob. Love it.
Andrea (view profile)
Rob Campbell (view profile)
Normally I'd say you should modify the plotted objects with the handles returned by the function. However, it would be awkward to do what you requested in this way. Consequently I've just submitted an update which should do what you want. The 4th argument can now take the value "sdline". If you want to alter the line properties, I recommend doing so by modifying the object properties via the handle returned by the function.
J G (view profile)
This is very useful, thanks! Is it possible to plot the SD as error bars instead of the box?
Dylan (view profile)
Very useful, code is very well written.
Rob Campbell (view profile)
Mahmoud,
You can achieve these things in exactly the same way as you would for most other plotting commands. I try to avoid having functions behave too idiosyncratically. So, to answer your question:
clf
h=notBoxPlot(randn(10,2));
set(gca,'XTickLabel',{'GrpA','GrpB'})
ylim([-5,5])
The last two lines are obviously standard ways of setting labels and changing the axis limits. These work with any plot. Note that the notBoxPlot function returns the handles to the plot objects so that you can change their properties or even delete them. For example, you could remove all the data points by doing: delete([h.data])
Mahmoud (view profile)
Very Useful!
Two questions,
1) How do you add labels to the x-axis like you would with the 'label' option in the boxplot function?
2) How can you specify what range should be plotted on the y-axis of notBoxPlot?
Rossella Blatt (view profile)
Very nice and useful. Thanks!
Rob Campbell (view profile)
Really? I thought I zipped it in there. Thanks for letting me know. I shall reupload.
Michael Ashby (view profile)
I like the idea but it seems to be missing the required "SEM_calc" function.