An alternative implementation of Matlab's boxplot function, with a slightly different interface. The function allows boxes to be arbitrarily spaced, sub-grouped data to be plotted, and various display parameters to be modified. No toolboxes required.
Note: requires quantile2.m from http://www.mathworks.co.uk/matlabcentral/fileexchange/46555-quantile2-m
MATLAB release
MATLAB 8.0 (R2012b)
MATLAB Search Path
/
Other requirements
Requires quantile2.m from http://www.mathworks.co.uk/matlabcentral/fileexchange/46555-quantile2-m
I see you're still busy adding features and bugfixing, well, I can recommend two or three more things. Don't take it the wrong way, I'm just suggesting what I edit in my plots and what my experience with catchy scientific plotting tells me.
- background scatter plot (with jitter in x) behind each box: it's sometimes nice to see the actual data points too, many modern tools like Origin and JMP offer this option. If you just overlay a scatterplot (randomize x-position according to the width of the boxes for nicer look instead of strict x-position and set color according to box group) it achieves just that. I did it manually in some scripts now, and it looks great.
- An option to display the number of data points that make up each box, e.g. at the top right of the box. Like the prior suggestion, this helps readers judge the quality of the statistics behind the boxes, since many people ignorantly use box plots with far too little data.
We've done this in our institute for years and I think it enhances the speed & quality of data interpretation.
- when the x-groups are non-scalar, vertical separator lines between the groups usually improve readability
- the ultimate box-plotting tool would be the combination of your contribution here with "hierarchical box plot" (also found on fileexchange).
... as I said, I don't say you have to include any of this ;)
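For anyone who wants to try the first two suggestions by hand, here is a minimal sketch using only base MATLAB graphics. It does not call box_plot, and the data, the boxWidth value, and the label placement are all assumptions chosen for illustration.

```matlab
% Hypothetical data: three samples (columns) at x-positions 1, 2 and 3.
rng(0);                                % reproducible jitter
x = [1 2 3];
y = randn(50, 3);
boxWidth = 0.5;                        % assumed box width in x-axis units

% Jitter each point uniformly within the width of its box.
xRep = repmat(x, size(y, 1), 1);
xJit = xRep + (rand(size(y)) - 0.5) * boxWidth;

figure; hold on;
% Draw the points first so that boxes plotted afterwards sit on top.
scatter(xJit(:), y(:), 10, [.6 .6 .6], 'filled');
% ... draw the boxes here ...

% Annotate each position with its sample size, ignoring NaNs.
n = sum(~isnan(y), 1);
for k = 1:numel(x)
    text(x(k) + boxWidth/2, max(y(:, k)), sprintf('n=%d', n(k)), ...
        'HorizontalAlignment', 'right', 'VerticalAlignment', 'bottom');
end
hold off;
```

Colouring the points per group would only need a different colour argument per scatter call.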
Thanks again for your feedback and suggestions, Arnold. At your suggestion I've added an xSpacing option, that I hope is satisfying. I've also fixed both functions to remove NaNs in x and corresponding y data. Again, I hope that fixes the problem. As for logarithmic spacing, there is undoubtedly a solution, but I think it will make things unnecessarily complicated. Instead, I suggest using the new xSpacing option. The former boxWidthMode evolved into the boxSpacing option, and would not have been helpful in this case.
Some more suggestions. I miss the option to use an X vector with numbers for labelling but NOT spacing the axis accordingly. When X contains strings it goes and just evenly spaces the categories. It would be nice if evenly spaced x-ticks/groups would be possible for numbered X vectors as well. Maybe you just go and introduce the option 'xSpacing', 'scalar' or 'even'
Regarding this, I also found another bug. When X contains numbers AND NaNs (which happens in some of my datasets), line 390 in box_plot throws an error. Adding the line x(isnan(x)) = [] fixes it. I'm not sure if it's generally applicable.
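Note that the one-liner above trims x only. A more general version would drop the matching y data too, which is what the 1.5 update describes. The following is a sketch of that idea, not the code in box_plot, and it assumes that each column of y belongs to one element of x.

```matlab
% Hypothetical inputs: x holds one position per column of y.
x = [1 2 NaN 4];
y = [ 1  2  3  4;
      5  6  7  8;
      9 10 11 12];

keep = ~isnan(x);      % logical mask of usable positions
x = x(keep);           % drop NaN positions ...
y = y(:, keep);        % ... and the matching columns of y
```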
A minor thing... When using scalar x-axis-spacing with a numbered X-vector, the box widths get messed up when applying a logarithmic scale. You did have a 'boxwidthmode' which is no longer supported, did that have something to do with it?
I have no elegant solution for this in mind.
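One possible workaround, sketched here with base MATLAB only: on a logarithmic axis, boxes look equally wide when their edges are a constant factor (rather than a constant offset) away from the centre. The positions, the width w in decades, and the placeholder rectangles are assumptions for illustration. This is not how box_plot computes its widths.

```matlab
% Hypothetical positions on a logarithmic x-axis.
x = [1 10 100 1000];
w = 0.2;                         % assumed box width in decades (log10 units)

% Multiplicative half-width: edges sit w/2 decades either side of x.
f = 10^(w/2);
xLeft  = x / f;
xRight = x * f;

figure; hold on;
for k = 1:numel(x)
    % Draw a placeholder box spanning y = 0..1 at each position.
    patch([xLeft(k) xRight(k) xRight(k) xLeft(k)], [0 0 1 1], [.8 .8 .8]);
end
set(gca, 'XScale', 'log');
hold off;
```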
Ah. Of course the example works for me! There was a bug in quantile2 that I fixed a while back, but forgot to upload the file to the FX. Please download the most recent version:
Interesting function of yours (tab2box). Looks like that, combined with box_plot.m could be what I've been looking for ... but for me (R2014b) the example given by you in the file itself (line 35-57) doesn't even work. Throws the same error as it does for my own data table:
Attempted to access x2(1); index out of bounds because numel(x2)=0.
Error in quantile2 (line 156)
q(m,n) = x2(1);
Error in box_plot (line 266)
Z.median = quantile2(y,.5,[],options.method); % median
Thanks for your feedback and comment. Just to make sure I understand correctly, you're suggesting that the Y input should facilitate plotting samples of different sizes? So the columns could be of different lengths? What if Y could be a cell array?
The tab2box function implicitly offers this functionality, since it will pad Y with NaN when samples are unequal in size. But this relies on the data being in tabular form initially.
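For data that is not in tabular form, the same NaN-padding idea can be applied by hand. The sketch below is not the tab2box implementation. It assumes the samples arrive as a cell array of numeric vectors (the variable names are made up) and builds a matrix with one column per sample.

```matlab
% Hypothetical samples of unequal size, one per cell.
samples = {[1 2 3 4], [5 6], [7 8 9]};

nMax = max(cellfun(@numel, samples));       % length of the longest sample
Y = NaN(nMax, numel(samples));              % pre-fill with NaN padding
for k = 1:numel(samples)
    s = samples{k}(:);                      % force a column vector
    Y(1:numel(s), k) = s;                   % shorter samples keep trailing NaNs
end
```

The per-column sample sizes can then be recovered with sum(~isnan(Y), 1).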
If you used structures, you could have different x-values and sizes for each set. This would make a great addition, since Matlab and other submissions here haven't supported this forever.
Or you could just introduce a group vector: X an i x 1 vector for the x-positions, Y an m x i matrix for the data, and g an i x 1 vector for the groups....
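To make the group-vector idea concrete, here is a rough sketch of how such inputs could be split into one structure per group. The names x, Y, g and sets are hypothetical, and none of this reflects box_plot's actual interface.

```matlab
% Hypothetical inputs: five x-positions, one column of Y per position,
% and a group label for each position.
x = [1 2 3 4 5];
Y = magic(5);
g = [1 1 2 2 2];

[groups, ~, idx] = unique(g);               % idx maps each position to its group
sets = struct('x', {}, 'Y', {});            % empty struct array to fill
for k = 1:numel(groups)
    cols = (idx(:)' == k);                  % columns belonging to group k
    sets(k).x = x(cols);
    sets(k).Y = Y(:, cols);
end
```

Each element of sets then carries its own x-values and data, so the groups are free to differ in size.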
great function, thx.
It would be great if you added the ability to use different sizes for the groups which would not have to be integrated into one 3d matrix.
@Alberto you mean the box colour? Use the 'boxColor' option and set it, for example, to [1 1 1; .5 .5 .5] (assuming you have two boxes per y-tick). Setting parameters for each group is described towards the bottom of the help text.
Nice function, it works as promised. I miss the option (or I don't know how to do it) to fill the notch with a specified color, like in the example figure.
Updates
09 May 2014
1.1
Changed/corrected quantile estimation algorithm. Details in help text.
12 May 2014
1.2
Moved quantile calculation to new function.
14 May 2014
1.3
Function now natively supports sub-groups, handles NaNs more robustly, and returns sample size(s). A few other minor tweaks and doc changes.
23 Apr 2015
1.4
Improved robustness to small or empty samples. Added tab2box function for arranging tabular data into the format accepted by box_plot.
18 May 2015
1.4.1
Updated documentation.
19 May 2015
1.5
Remove NaN from x data (and corresponding y data).