notBoxPlot

version 1.30 (167 KB) by

This function visualizes raw (grouped) data along with the mean, 95% confidence interval, and 1 SD.

Rating: 4.82 (31 ratings), 128 downloads

Whilst box plots have their place, it's sometimes nicer to see all the data rather than hiding them behind summary statistics such as the inter-quartile range. This function (with a tongue-in-cheek name) addresses this problem. The use of the mean instead of the median, and of the SEM and SD instead of quartiles and whiskers, is deliberate.
Jittered raw data are plotted for each group, together with the mean and the 95% confidence interval for the mean. This plotting style is designed to be used alongside parametric tests such as ANOVA and the t-test. Comparing the jittered data to the error bars provides a visual indication of whether the normality assumptions of those tests are being violated. It also allows one to eyeball the data for significant differences between means (non-overlapping confidence intervals indicate a significant difference at the chosen significance level, which here is 5%). Also see: http://jcb.rupress.org/cgi/content/abstract/177/1/7
Finally, 1 SD is also shown. Note that if the data are not normally distributed then these statistics will be less meaningful.
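
As a rough illustration of the statistics described above, the following sketch computes the mean, SD, SEM and a t-based 95% confidence interval for a single group. It is not taken from the function's source: the variable names are made up, tinv requires the Statistics Toolbox, and the function's own calculation may differ in detail.

```matlab
% Sketch only: summary statistics of the kind the plot displays for one group.
y = [7 8 6 1 5 7 2 1 3 4 5 2 4]';   % example data (any numeric vector will do)

n   = numel(y);
mu  = mean(y);            % plotted as the mean line
sd  = std(y);             % 1 SD
sem = sd / sqrt(n);       % standard error of the mean

% Two-sided 95% interval for the mean from the t distribution.
% tinv is part of the Statistics Toolbox (assumed to be installed).
tCrit = tinv(0.975, n - 1);
ci95  = mu + [-1 1] * tCrit * sem;

fprintf('mean = %.2f, SD = %.2f, 95%% CI = [%.2f, %.2f]\n', mu, sd, ci95);
```

The interval is always symmetric about the mean, which is why the boxes drawn by this kind of plot look symmetric even when the raw data are not.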
The function has several examples and there are various visualization possibilities in addition to those shown in the above screenshot. For instance, the coloured areas can be replaced by lines.

Although it has worked well in the situations where I've needed it, I will be happy to modify the function if users run into problems.
New features from March 2017:
Accepts LinearModel objects as a y variable; Accepts Table objects as a y variable; Examples are now in standalone files; Legacy call format removed; x variable does not need to be defined as empty when it isn't being used

See GitHub page for more details.

Comments and Ratings (78)

Omri Raccah

Thank you Rob!

Rob Campbell

There is no direct way of doing that. If you have the indexes then you should be able to directly index the XData and YData properties of the plot object containing the points.

Hi Rob,

Is there a way to plot the data so that I can use two different markers for two sets of points in the plot? To clarify, I need two different markers within a single column. I have the original indexes, but it is pretty hard to map those points back to the points returned by the function.

Rob Campbell

Omri: no. You'll need to get the coordinates of the point via the returned plot object and manually replace them.
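
One way to follow this suggestion is sketched below: read the point coordinates from the returned handle and draw the chosen points again on top in a different colour. The selection rule (y > 0) is only a placeholder, and the sketch assumes notBoxPlot is on the MATLAB path.

```matlab
% Sketch: recolour a subset of points in one group by overplotting them.
H = notBoxPlot(randn(30,3));

x = get(H(2).data, 'XData');   % jittered x positions of group 2
y = get(H(2).data, 'YData');   % corresponding y values

sel = y > 0;                   % placeholder criterion for the points to recolour

hold on
plot(x(sel), y(sel), 'o', 'MarkerFaceColor', 'g', 'MarkerEdgeColor', 'k')
hold off
```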

Omri Raccah

Hi Rob - great function. Is there a way to change the colors of separate points on a single notBoxPlot bar? Thanks!

SangPil Yoon

I figured it out Thanks...

SangPil Yoon

Hi Rob,
This is a really nice function. Thank you so much. One question: how do I change the color of the SD, 95% CI, and mean line?
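
No answer was posted to this question, but the usual route is to set properties on the handles the function returns. The sketch below is hedged: H.mu appears elsewhere in this thread, whereas sdPtch and semPtch are assumed names for the SD and confidence-interval patches and may differ between versions, so the code lists the available fields first and only touches those that exist.

```matlab
H = notBoxPlot(randn(20,3));
disp(fieldnames(H))                      % shows which handles this version returns

set([H.mu], 'Color', 'k')                % mean lines

% Assumed field names for the patches; skipped if this version lacks them.
if isfield(H, 'sdPtch')
    set([H.sdPtch], 'FaceColor', [0.8 0.8 1.0])   % SD area
end
if isfield(H, 'semPtch')
    set([H.semPtch], 'FaceColor', [1.0 0.8 0.8])  % confidence-interval area
end
```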

Rob Campbell

@EM - add the "code" directory to your MATLAB path and it should work. In fact, I don't understand how you got that error unless you copied "notBoxPlot.m" to a directory in your path and did not move the "+NBP" directory.
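
A minimal sketch of that installation step follows. The folder location is hypothetical; point it at wherever the download was unpacked, keeping notBoxPlot.m and the +NBP directory together inside the "code" directory.

```matlab
% Hypothetical location of the unpacked download; adjust to your system.
nbpDir = fullfile(pwd, 'notBoxPlot', 'code');

addpath(nbpDir)          % makes notBoxPlot.m and the +NBP package visible
which notBoxPlot         % should now print the full path to notBoxPlot.m
```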

ShaneS

Hello, thanks for uploading this. Silly question here: how do I install the script?
When I ran it, I got this error:

>> notBoxPlot([7,8,6,1,5,7,2,1,3,4,5,2,4])
Undefined function or variable 'NBP.SEM_calc'.

Error in notBoxPlot/myPlotter (line 346)
SEM=intervalFun(Y); %A function handle to a supplied external function

Error in notBoxPlot (line 312)
[hTemp,statsTemp]=myPlotter(x(f),y(:,f));

KQ Z

Ander Biguri

Rob Campbell

zeinab esmaeilpour - look at the examples in the help

It is awesome, but I don't know how to change the color of the boxes.

lobster soup

Hi Rob,
This is fantastic! Super well documented, super clean to use + love that you have table object support!
This should be inbuilt!

Hi Rob,
My data include five different states, each with three different conditions.
Say States = Sunday, Monday, Tuesday, Wednesday and Thursday
Conditions = Morning, Afternoon and Night
I want to use notBoxPlot to plot each condition box in a different color. Is that possible?
I also have a problem labeling the different conditions and states.
Thanks in advance.
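
One possible layout for this kind of design is sketched below, using random placeholder data. It relies on two things shown elsewhere in this thread: x positions can be passed as the second argument, and the returned handles expose the data points as H(k).data. It assumes the handles come back in the same order as the columns of y; the spacing and colours are arbitrary choices.

```matlab
states     = {'Sun','Mon','Tue','Wed','Thu'};
conditions = {'Morning','Afternoon','Night'};
nS = numel(states);  nC = numel(conditions);

y = randn(20, nS*nC);                 % placeholder data: one column per state/condition pair

% x positions: three adjacent boxes per state, then a gap before the next state.
[c, s] = ndgrid(1:nC, 1:nS);
x = (s(:)' - 1) * (nC + 1) + c(:)';

H = notBoxPlot(y, x);

% Colour the data points by condition (assumes H is ordered like the columns of y).
cols = {'r','g','b'};
for k = 1:numel(H)
    set(H(k).data, 'MarkerFaceColor', cols{c(k)})
end

% One tick per state, centred on its three boxes.
set(gca, 'XTick', ((1:nS) - 1) * (nC + 1) + (nC + 1)/2, 'XTickLabel', states)
legend([H(1:nC).data], conditions{:})
```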

Hi Rob, I have used this function without issues in the past, but I'm now facing the same problem as Isa, even when running "simpleExamples". I'm using matlab R2014a. Any idea? Thanks a lot.

Rob Campbell

Isa, I can't tell what is wrong based on what you wrote. Get the new version that I sent today and look at the examples. You have likely fed in the wrong input arguments. Maybe I should be catching that to provide a better error message. If you continue to have problems, file an issue on GitHub: https://github.com/raacampbell/notBoxPlot/issues

Isa

Hi Rob, really looking forward to using the function; however, even when I try to run the examples I get the following error:
Error using var (line 97)
The length of W must be compatible with X.

Error in std (line 31)
y = sqrt(var(varargin{:}));

Error in notBoxPlot/myPlotter (line 328)
SD=std(Y,'omitnan'); %Requires the stats toolbox

Error in notBoxPlot (line 293)
[hTemp,statsTemp]=myPlotter(x(f),y(:,f));

Error in example (line 20)
notBoxPlot(r,[],'jitter',0.5)

Anybody got any advice?

Rob Campbell

Francisco, I'm migrating this to GitHub. Not had a chance to try it on R2015a yet.
Please see here for updates:
https://github.com/raacampbell/notBoxPlot/issues/5

Francisco

Hi Rob,
No other errors. Only that. Doesn't happen with the legacy syntax.
Thanks
FdC

Rob Campbell

I'm not sure what's going on. I can't reproduce your problem in 2015b. I don't think I have a 2015a install to hand, but I'll look. The error is very strange: not only is it claiming that 'sdline' is invalid, but also that it's somehow too large a variable. Do you see any other errors relating to input arguments?

Francisco

Got an error using the new syntax (not legacy).

notBoxPlot(rand(50,1),1,'style','sdline')

Error using notBoxPlot (line 211)
The value of 'style' is invalid. Maximum variable size allowed by the function is exceeded.

I'm using R2015a. Any solutions?
Thanks
FdC

Rob Campbell

In your scenario I would just use the regular boxplot function in MATLAB and overlay the raw data. There's no point using notBoxPlot if the data are substantially skewed.
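
A minimal sketch of that alternative, a regular box plot with the raw data overlaid, is given below. The data are random placeholders, boxplot requires the Statistics Toolbox, and the jitter width is an arbitrary choice.

```matlab
% Sketch: standard box plot with the raw data overlaid.
y = -log(rand(40,3));                    % placeholder skewed (exponential-like) data
[n, g] = size(y);

boxplot(y)                               % median, quartiles and whiskers
hold on

% Overlay the raw points with a little horizontal jitter around each box position.
xj = repmat(1:g, n, 1) + 0.2*(rand(n, g) - 0.5);
plot(xj(:), y(:), 'ok', 'MarkerSize', 4)

hold off
```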

Esther

Hi,

Great way to visualize normally distributed data.

Is it possible to easily adjust the function in order to use it with skewed data and visualize the median instead of mean etc. (i.e. use it for regular box plots but with overlay of the data)?

Thanks

Hi Rob,
Thanks for the great function.
Is there a way I could color-code the circles within each box?
For example, the circles in my case are different stations with NO2 data for 12 months.
Please help. Many thanks
Shital

Bob Spunt

Hello,
The option style doesn't work.

notBoxPlot(essaivariablesparcote2(:,1:2), style, 'sdline')
Undefined function or variable 'style'.

essaivariablesparcote2 is a matrix.

Thanks
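
The error message suggests the cause: style was written without quotes, so MATLAB looks for a variable of that name. Parameter names are passed as strings, as in the other calls quoted in this thread. A minimal sketch, with a random matrix standing in for the poster's data:

```matlab
M = rand(30, 4);                          % stand-in for essaivariablesparcote2

notBoxPlot(M(:,1:2), 'style', 'sdline')   % parameter name in quotes
```

In versions before 1.30 the x argument may need to be supplied (for example as []) before the parameter/value pairs.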

Rob Campbell

@Remi Chaussenot
Your question is beyond the scope of this comment thread. Can you e-mail me via my profile page or start an issue on GitHub (see Issues link on the top right of this page)

Rob Campbell

@Manuel, why would you want to plot one point only over the box? I don't understand.

Manuel

Hello Rob, I'm looking to plot only one point over the box. Is this possible with your function?

Many thanks

Manuel

David L

Hello Rob,

I love your functions, but have a couple of questions.

I work in a neuroscience laboratory on mice (e.g. wild-type vs. knock-out), so my dataset usually looks like this:
'WT' [ 453] [ 5] [ 70] [ 45] [ 20] [ 20] [ 70] [ 65]
'WT' [ 468] [ 0] [ 70] [ 35] [ 10] [ 20] [ 50] [ 65]
'WT' [ 466] [ 5] [ 50] [ 35] [ 15] [ 20] [ 40] [ 60]
'WT' [ 452] [ 5] [ 65] [ 40] [ 25] [ 35] [ 75] [ 70]
'WT' [ 470] [ 0] [ 60] [ 25] [ 10] [ 20] [ 35] [ 55]
'WT' [ 467] [ 0] [ 55] [ 40] [ 10] [ 15] [ 35] [ 60]
'WT' [ 456] [ 0] [ 65] [ 40] [ 10] [ 25] [ 70] [ 60]
'MDX' [ 455] [ 0] [ 40] [ 30] [ 0] [ 5] [ 70] [ 55]
'MDX' [ 473] [ 0] [ 50] [ 35] [ 5] [ 20] [ 45] [ 55]
'MDX' [ 472] [ 0] [ 65] [ 35] [ 5] [ 25] [ 50] [ 60]
'MDX' [ 465] [ 0] [ 50] [ 35] [ 10] [ 30] [ 70] [ 65]
'MDX' [ 469] [ 0] [ 65] [ 55] [ 15] [ 20] [ 45] [ 65]
'MDX' [ 471] [ 30] [ 75] [ 50] [ 50] [ 45] [ 80] [ 80]
'MDX' [ 464] [ 0] [ 50] [ 30] [ 10] [ 10] [ 45] [ 60]

In a perfect world, I would like to plot all measures (first row) on the x-axis and have two separate columns of dots, one per genotype, for each measure. I think that is impossible, so I first plot WT and afterwards MDX.

Then I try to add my labels with:
notBoxPlot(ndata_wt);
% Adding x-axis
entete = alldata(1:1,3:end)
ax = gca;
ax.XTickLabel = entete;
ax.XTickLabelRotation = -45;

But it is not working. Any ideas?
Thanks !

alldata :
'Genotype' 'Number' 'Clic' '2kHz' '4kHz' '8kHz' '16kHz' '24kHz' '32kHz'
'WT' [ 453] [ 5] [ 70] [ 45] [ 20] [ 20] [ 70] [ 65]
'WT' [ 468] [ 0] [ 70] [ 35] [ 10] [ 20] [ 50] [ 65]
'WT' [ 466] [ 5] [ 50] [ 35] [ 15] [ 20] [ 40] [ 60]
'WT' [ 452] [ 5] [ 65] [ 40] [ 25] [ 35] [ 75] [ 70]
'WT' [ 470] [ 0] [ 60] [ 25] [ 10] [ 20] [ 35] [ 55]
'WT' [ 467] [ 0] [ 55] [ 40] [ 10] [ 15] [ 35] [ 60]
'WT' [ 456] [ 0] [ 65] [ 40] [ 10] [ 25] [ 70] [ 60]
'MDX' [ 455] [ 0] [ 40] [ 30] [ 0] [ 5] [ 70] [ 55]
'MDX' [ 473] [ 0] [ 50] [ 35] [ 5] [ 20] [ 45] [ 55]
'MDX' [ 472] [ 0] [ 65] [ 35] [ 5] [ 25] [ 50] [ 60]
'MDX' [ 465] [ 0] [ 50] [ 35] [ 10] [ 30] [ 70] [ 65]
'MDX' [ 469] [ 0] [ 65] [ 55] [ 15] [ 20] [ 45] [ 65]
'MDX' [ 471] [ 30] [ 75] [ 50] [ 50] [ 45] [ 80] [ 80]
'MDX' [ 464] [ 0] [ 50] [ 30] [ 10] [ 10] [ 45] [ 60]

ndata_wt :
5 70 45 20 20 70 65
0 70 35 10 20 50 65
5 50 35 15 20 40 60
5 65 40 25 35 75 70
0 60 25 10 20 35 55
0 55 40 10 15 35 60
0 65 40 10 25 70 60
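
No reply to this question appears in the thread. The sketch below shows one way it might be approached, using the numbers posted above: the two genotypes are drawn with two calls at slightly offset x positions, and the ticks are set explicitly before the labels are applied (a mismatch between the number of ticks and the number of labels is one possible reason the original labelling did not behave as expected). The offset, jitter value and colour are arbitrary, and it is an assumption that two successive calls share the axes cleanly when hold is on.

```matlab
labels = {'Clic','2kHz','4kHz','8kHz','16kHz','24kHz','32kHz'};

wt  = [5 70 45 20 20 70 65;  0 70 35 10 20 50 65;  5 50 35 15 20 40 60; ...
       5 65 40 25 35 75 70;  0 60 25 10 20 35 55;  0 55 40 10 15 35 60; ...
       0 65 40 10 25 70 60];
mdx = [0 40 30  0  5 70 55;  0 50 35  5 20 45 55;  0 65 35  5 25 50 60; ...
       0 50 35 10 30 70 65;  0 65 55 15 20 45 65; 30 75 50 50 45 80 80; ...
       0 50 30 10 10 45 60];

nM  = numel(labels);
off = 0.2;                                   % horizontal offset of each genotype (arbitrary)

clf
Hwt  = notBoxPlot(wt,  (1:nM) - off, 'jitter', 0.25);
hold on
Hmdx = notBoxPlot(mdx, (1:nM) + off, 'jitter', 0.25);
hold off

set([Hmdx.data], 'MarkerFaceColor', 'c')     % distinguish MDX from WT

% Set the ticks explicitly so that there is exactly one tick per label.
set(gca, 'XTick', 1:nM, 'XTickLabel', labels, 'XLim', [0.5, nM + 0.5])
legend([Hwt(1).data, Hmdx(1).data], 'WT', 'MDX')
```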

Rob Campbell

You could try the rotate tick label function here on the FEX. It's ID #8722

Roy Granit

Hi Rob,

Great function!

I just have a problem with the X-Axis labels, I cannot get them to be vertical - any suggestions?

Thanks,
Roy

Rob Campbell

F4b,

Yes this is possible, but there are two caveats:

1. You first need a criterion of your own for determining what defines an outlier.

2. You have to do a *little* extra coding to highlight the points. This is the case because I have deliberately avoided adding too many extra features such as this to the function. The idea is to make it easy for others to modify the plots as needed.

Here is a toy example, which highlights only the most positive point in each group:

clf, H=notBoxPlot(randn(40,5));
hold on

for ii=1:5
    y=get(H(ii).data,'YData');
    x=get(H(ii).data,'XData');
    f=find(y==max(y)); %index of the most positive point in this group
    plot(x(f),y(f),'or','markersize',10) %circle it in red
end

hold off

F4b

Hello,
very nice function!

I was wondering if it would be possible to show a label for each point in order to identify which points are outliers.

Thank you very much,
F4b

Rob Campbell

Matthew, what you're asking for is already possible with a few extra lines of code:

clf
H=notBoxPlot(rand([30,3]));
set(H(2).data,'markerfacecolor','g')
set(H(3).data,'markerfacecolor','c')
legend([H.data],'A','B','C') %add legend
set(gca,'XTickLabel',{'A','B','C'})

Matthew

This is a great function, but it would be even better if it displayed a legend, so that if you plot two groups the legend corresponds automatically to the markerfacecolor set in the function.

Rob Campbell

Example 3.

Chris

Hi, I like this. We have data with an N of 300 and would like to reduce the size of the dots, as they obscure the mean/SE/SD 'patch'.

thanks
chris
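
Besides the bundled example mentioned in the reply, the marker size can be changed through the returned handles, in the same way as other properties are changed elsewhere in this thread. A short sketch with placeholder data (the size value is arbitrary):

```matlab
H = notBoxPlot(randn(300,3));
set([H.data], 'MarkerSize', 3)     % smaller dots so the mean/CI/SD areas stay visible
```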

Erik

Tom

Rob Campbell

Yes, a box-plot shows the median and quartiles, etc., so it can be asymmetric and what not. If that's what you want, then use the MATLAB boxplot function. This version is, as the name says, /not/ a box plot. It uses the mean and statistics relating to the mean. These produce symmetrical error bars. There is a rationale to this and, TBH, this function is aimed more at replacing bar charts than at replacing box plots.

The rationale is that t-tests and ANOVA are often performed on data which are typically plotted as bar charts and sometimes box-plots. However, these tests are based upon the mean, yet box-plots show the median. Bar charts are often supplemented with error bars displaying 1 standard error of the mean (1 SEM), which does not reflect the p=0.05 significance criterion often used in biology and the social sciences. The 95% confidence interval used here provides a visual indicator of significance. In most bar charts the raw data are not overlaid, which greatly reduces the utility of the plot as it hides the underlying data. Yet with carefully chosen plot options, which this function facilitates, it's often possible to plot all the raw data even for large numbers of groups. I believe that overlaid raw data are usually more informative than the quartiles and whiskers of a box plot. Of course that's a personal preference.

arnold

I might be mistaken, but isn't the line in a boxplot supposed to be the median?

I tried this with my data and the box is always symmetrical, whereas using the MATLAB boxplot function one can see how unevenly distributed the data are (well, one can see that from the single data points plotted by notBoxPlot too).

Is there a way to make the box behave "normally"?

I like it

Rob Campbell

I see: instead of x being a vector (of numbers). That would seem like an intuitive extension. I'll do it when I get a moment.

Adam

Very nice. Thank you! I'd also like to request that the second argument could be a list of strings that define the groups.

e.g.

notBoxPlot(data, grouplabels)

Rob Campbell

Gavin,
Thanks for your bug report. I have submitted a fix that correctly parses:
notBoxPlot(randn(1,100),repmat(1:10,1,10),0.1,'line')

Gavin

If one uses vectors for the y and x inputs, the jitter and style options don't work. This is because the recursive call to notBoxPlot on line 115 doesn't pass those options through.

Jessica

I am very excited to use this plotting tool but I'm having an issue. When I try to run the notboxplot code I get the following error.
"??? Maximum recursion limit of 500 reached. Use set(0,'RecursionLimit',N)
to change the limit. Be aware that exceeding your available stack space can crash MATLAB and/or your computer.

Error in ==> findobjhelper"

What should I set my recursion limit to so that the code works but I do not crash my computer? Or is there something else wrong?

Thanks,
Jessica

Rob Campbell

Julia,

The notBoxPlot function returns the handles of the plotted data. It's probably best to use these to do what you want. e.g:

H=notBoxPlot(randn(20,2));
x1=get(H(1).data,'XData');

x1 are the x values of the points in the first box. You can use this approach to get all the x and y data and then plot the lines. You can alter the order of the plot elements on the screen like this: http://www.matlab-cookbook.com/recipes/0050_Plotting/0010_Plot_Manipulation/changingPlotOrder.html

My only note of caution is that the plot may look messy because of the jitter along the x axis. You can modify the jitter with the 3rd input argument. If you have many data points then what you're doing may work better as a scatter plot. Perhaps my rug plot command would be of interest? http://www.mathworks.com/matlabcentral/fileexchange/27582-rug-plots
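
The approach described above can be sketched as follows for paired before/after data. The data are random placeholders, and the sketch assumes the points held by each handle are in the same order as the input rows (which may not hold if, for instance, missing values are dropped).

```matlab
before = randn(15,1);
after  = before + 0.5 + 0.3*randn(15,1);    % placeholder paired data

H = notBoxPlot([before, after]);

% Coordinates of the plotted points (assumed to be in the same order as the input rows).
x1 = get(H(1).data, 'XData');  y1 = get(H(1).data, 'YData');
x2 = get(H(2).data, 'XData');  y2 = get(H(2).data, 'YData');

hold on
plot([x1(:)'; x2(:)'], [y1(:)'; y2(:)'], '-', 'Color', [0.6 0.6 0.6])  % one line per subject
hold off
set(gca, 'XTickLabel', {'Before','After'})
```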

Great function, was looking for a way to plot my data points on my box-and-whisker plot and this seems to do the trick.

Was wondering if there were any suggestions on drawing connecting lines between data points and data sets. For example, I have a bunch of data points BEFORE an event for a collection of subjects and then a bunch of data points taken AFTER an event for the same subjects. I would like to plot the two sets next to each other using this function and then have lines going from subject 1, before to subject 1, after and subject 2, before to subject 2, after, etc.

Any suggestions?

Rob Campbell

JG:
Q1. The function will return the coordinates of the means so you can use these with polyval. e.g.
H=notBoxPlot(randn(10),[],[],'line');
x=get([H.mu],'XData'), y=get([H.mu],'YData');
Without "line" the above will return two data points for each mean (since the means are lines), but it's easy enough to work with that too. Does that work for you?

Q2. You can do this as follows:
notBoxPlot(randn(10,5),[1,2,5,9,10])
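
For Q1, a fuller sketch of the trend-line idea is given below. It uses the parameter/value call style of version 1.30 rather than the positional (legacy) form shown above, and fits a straight line; the polynomial order and line style are arbitrary choices.

```matlab
H = notBoxPlot(randn(10), 'style', 'line');   % v1.30 syntax; older versions used positional arguments

% get() on several handles returns a cell array; averaging each cell gives one
% coordinate per group whether the mean is drawn as a point or as a short line.
x = cellfun(@mean, get([H.mu], 'XData'));
y = cellfun(@mean, get([H.mu], 'YData'));

p = polyfit(x(:), y(:), 1);                   % straight-line fit through the means

hold on
plot(x(:), polyval(p, x(:)), 'k--')
hold off
```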

J G

How can I use this function with continuous spacing on x-axis?
For example,
p = [0.1 0.25 0.5 0.75 0.9];
boxplot(A,'position',p)
will place the boxplots unevenly spaced along x-axis. Is there a way to do this with this function?

J G

Great function! Is there a way to have a trend line through the means, i.e. using polyval/polyfit? Thanks!

Rob Campbell

Ok... For some reason adding a patch object causes gname to fail. If you run notBoxPlot using the "line" plotting style then gname works.

Rob Campbell

Hmmm... Don't know. I will look into it.

Ian Shapiro

Great tool. It's an excellent way to visualize the distribution in a set of data. However, I've found that it does not appear to work with 'gname' for labeling individual data points, whereas boxplot is able to do this. Any idea why that's the case?

Rob Campbell

I will soon be modifying this function to require no additional toolboxes. Otherwise, which function is best probably depends on the size of the data set. For large sample sizes the violin plots work best. For small sample sizes I prefer the plot style on this page, since it doesn't bin the data.

Alexander

In my opinion, a better replacement for the builtin boxplot is "Violin Plots for plotting multiple distributions (distributionPlot.m)", which does not require any additional toolboxes. Check:
http://www.mathworks.com/matlabcentral/fileexchange/23661-violin-plots-for-plotting-multiple-distributions-distributionplot-m

ted p teng

At the moment, I am admiring what I just made with your function. Love it, thank you.
You guys may also want to use this function in conjunction with XTICKLABEL_ROTATE.

Rob Campbell

Almost the same way: just don't code your groups as a cell array of strings. To modify your example:

group = [repmat(1, 5, 1); repmat(2, 10, 1); repmat(3, 15, 1)];
notBoxPlot([x;y;z], group)

You can then change the XTickLabels to strings if needed. I've not found I do this often enough to add cell arrays as an input possibility. Perhaps I should, though (when time allows!).
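
If the groups already exist as a cell array of strings, they can be converted to group numbers before plotting and reused as tick labels afterwards. A sketch using the data from the question, with base MATLAB's unique doing the conversion:

```matlab
x = rand(5,1);  y = rand(10,1);  z = rand(15,1);
group = [repmat({'First'}, 5, 1); repmat({'Second'}, 10, 1); repmat({'Third'}, 15, 1)];

% Map each label to a group number, keeping the order of first appearance.
[names, ~, idx] = unique(group, 'stable');

notBoxPlot([x; y; z], idx)
set(gca, 'XTick', 1:numel(names), 'XTickLabel', names)
```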

Kelvin

I’m wondering how notBoxPlot can plot vectors of different lengths.

i.e.
x = rand(5,1);
y = rand(10,1);
z = rand(15,1);
group = [repmat({'First'}, 5, 1); repmat({'Second'}, 10, 1); repmat({'Third'}, 15, 1)];
boxplot([x;y;z], group)

Thanks in advance!

Thanks Rob. Love it.

Andrea

Rob Campbell

Normally I'd say you should modify the plotted objects with the handles returned by the function. However, it would be awkward to do what you requested in this way. Consequently I've just submitted an update which should do what you want. The 4th argument can now have the value "sdline". If you want to alter the line properties, I recommend doing so by modifying the object properties via the handle returned by the function.

J G

This is very useful thanks! Is it possible to plot the SD as error bars instead of the box?

Dylan

Very useful, code is very well written.

Rob Campbell

Mahmoud,
You can achieve these things in exactly the same way as you would for most other plotting commands. I try to avoid having functions behave too idiosyncratically. So, to answer your question:
clf
h=notBoxPlot(randn(10,2));
set(gca,'XTickLabel',{'GrpA','GrpB'})
ylim([-5,5])

The last two lines are obviously standard ways of setting labels and changing the axis limits. These work with any plot. Note that the notBoxPlot function returns the handles to the plot objects so that you can change their properties or even delete them. For example, you could remove all the data points by doing: delete([h.data])

Mahmoud

Very Useful!
Two questions,
1) How do you add labels to the x-axis like you would with the 'label' option in the boxplot function?
2) How can you specify what range should be plotted on the y-axis of notboxPlot?

Very nice and useful. Thanks!

Rob Campbell

Really? I thought I zipped it in there. Thanks for letting me know. I shall re-upload.

Michael Ashby

I like the idea but it seems to be missing the required "SEM_calc" function.

Updates

1.30

Some operations (such as t-interval calculation) depend on the Stats Toolbox.

1.30

Accepts LinearModel objects as a y variable; Accepts Table objects as a y variable; Examples are now in standalone files; Legacy call format removed; x variable does not need to be defined as empty when it isn't being used

1.22

provide a nicer example image

1.21

update notes

1.2

rename title

1.2

change repo URL

1.2

move to GitHub

1.15

Verified to work with R2015a

1.14

Update summary.

1.13

Fixed bug that didn't pass input arguments correctly when two vectors are supplied.

1.11

Update summary to explain that the function works without the stats toolbox if the nan-toolbox is installed.

1.10

Better handles x-ticks and x axis limits. Add missing semicolon.

1.9

Fix bug that caused handles not to be returned for one of the plot formats.

1.8

Both x and y can now be vectors, in which case the function behaves like Mathworks' boxplot. An example of this behaviour is provided.

An example of the "sdline" plot style is now provided.

1.7

If "y" is a vector then the function ensures it is a column vector in order to yield one box-plot.

1.6

The 4th argument can now also have the value "sdline". This creates plots where the SD is a line instead of a patch.

1.5

Handle to mean line when in patch mode (the default mode) is now returned.

1.4

Add link to JCB article on error bars.

1.3

Add tInterval_Calc and update the comments in SEM_calc

1.2

Clarify a point in the description.

1.1

re-upload because support file (SEM_calc.m) seemed to be missing

MATLAB Release
MATLAB 9.1 (R2016b)
