Code covered by the BSD License


Rating: 4.95833 / 5.0 (26 ratings) | 101 downloads (last 30 days) | File size: 20 KB | File ID: #23661 | Version: 1.14

# Violin Plots for plotting multiple distributions (distributionPlot.m)

by

### Jonas

13 Apr 2009 (Updated )

Function for plotting multiple histograms side-by-side in 2D - better than boxplot.

### Editor's Notes:

This file was selected as MATLAB Central Pick of the Week

File Information
Description

The zip-file contains the following files for visualizing distributions:

- distributionPlot.m: main function that allows creating violin plots

- histogram.m: generate histograms with 'ideal' bin width given the number of data points and the spread (Freedman-Diaconis rule). Note that for integer-valued data, each integer gets its own bin.
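The Freedman-Diaconis rule mentioned above sets the bin width to 2 * IQR * n^(-1/3). The following is a minimal sketch of that formula in base MATLAB. It is not the code from histogram.m: the quartile interpolation convention and all variable names are assumptions chosen for illustration, and the special handling of integer-valued data is not covered.

```matlab
% Sketch of the Freedman-Diaconis rule: binWidth = 2 * IQR * n^(-1/3).
% The sample and the quartile convention are illustrative assumptions.
x = (1:8)' + 0.5;            % small deterministic sample
x = sort(x(isfinite(x)));    % drop NaN/Inf, then sort
n = numel(x);

% Quartiles by linear interpolation at plotting positions (k - 0.5) / n.
% Other conventions exist and give slightly different values.
p = ((1:n)' - 0.5) / n;
q = interp1(p, x, [0.25 0.75], 'linear', 'extrap');
iqrX = q(2) - q(1);

% Freedman-Diaconis bin width and the resulting number of bins.
binWidth = 2 * iqrX * n^(-1/3);
nBins = max(1, ceil((x(end) - x(1)) / binWidth));
```

For this sample the interquartile range is 4 and n is 8, so the bin width comes out as 4 (up to floating-point rounding) and the data fit into two bins.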

In addition, the zip file contains four helper functions: countEntries, colorCode2rgb, isEven, myErrorbar

If you want to overlay individual data points, you need to download the separate submission plotSpread (http://www.mathworks.com/matlabcentral/fileexchange/37105).

DistributionPlot allows visualizing multiple distributions side by side. It is useful for skewed unimodal data and indispensable for multimodal data. DistributionPlot is especially useful for showing the time evolution of a distribution.

Some of the examples from the help:

r = rand(1000,1);
rn = randn(1000,1)*0.38+0.5;
rn2 = [randn(500,1)*0.1+0.27;randn(500,1)*0.1+0.73];
rn2=min(rn2,1);rn2=max(rn2,0);
figure
ah(1)=subplot(2,4,1:2);
boxplot([r,rn,rn2])
ah(2)=subplot(2,4,3:4);
distributionPlot([r,rn,rn2],'histOpt',2); % histOpt=2 works better for uniform distributions than the default
set(ah,'ylim',[-1 2])
data = [randn(100,1);randn(50,1)+4;randn(25,1)+8];
subplot(2,4,5)
distributionPlot(data); % defaults
subplot(2,4,6)
distributionPlot(data,'colormap',copper,'showMM',5,'variableWidth',false) % show density via custom colormap only, show mean/std
subplot(2,4,7:8)
distributionPlot({data(1:5:end),repmat(data,2,1)},'addSpread',true,'showMM',false,'histOpt',2) %auto-binwidth depends on # of datapoints; for small n, plotting the data is useful

Acknowledgements

Plot Spread Points (Beeswarm Plot) inspired this file.

This file inspired Violin Plot.

MATLAB release: MATLAB 7.6 (R2008a)

Other requirements: The 'smooth' option of histogram.m requires the Spline Toolbox. However, for smooth histograms, ksdensity is probably the better choice anyway. Grouped data requires the Statistics Toolbox.

25 May 2016 Anne Urai

### Anne Urai

16 May 2016 Shilo

### Shilo

Great, Thanks, very useful!
Is there an option to use the addSpread function and color the dots using different values, so as to add another dimension to the data?

Comment only
12 May 2016 Isobel

### Isobel

This is great, thanks. However, would you consider adding an option to cut plots off in the y-direction at the min and max of the dataset?

01 Apr 2016 Markus Millinger

### Markus Millinger

This is very nice! However, the function histogram clashes with the "new" MATLAB function of the same name.
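One way to check for the name clash described in this comment is to ask MATLAB which files named histogram are visible on the path. This is a minimal sketch using the built-in which function; how to resolve a clash (for example by renaming the bundled file and its call sites) is left open here and is not something the submission documents.

```matlab
% List every function named "histogram" that is visible on the path.
% With an output argument, which(..., '-all') returns a cell array;
% the first entry is normally the one MATLAB will call.
hits = which('histogram', '-all');

if numel(hits) > 1
    fprintf('histogram is shadowed; MATLAB will call:\n  %s\n', hits{1});
elseif isempty(hits)
    fprintf('No function named histogram was found on the path.\n');
end
```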

Comment only
16 Sep 2015 Amir

### Amir

Neat and nice. Much better than the box plot for scientific work.

29 May 2015 Martin Sundqvist

### Martin Sundqvist

18 May 2015 Tiago

### Tiago

14 May 2014 Johann

### Johann

30 Apr 2014 Edgar Guevara

### Edgar Guevara

Displaying distributional differences provides more information about the samples and is very useful when distance from zero is meaningless.
Furthermore, the option to overlay the mean, SEM, SD and percentiles helps us better interpret the statistical analyses.
Overall, an invaluable alternative to the classic bar plots and box plots.

21 Jan 2014 Holger Hoffmann

### Holger Hoffmann

Excellent, just what I needed. It served me very well.

I added a modified version to the MATLAB File Exchange that uses a smooth kernel density (Violin Plot based on kernel density estimation).

17 Mar 2013 Jonas

### Jonas

@Warwick: this looks like a bug - globalNorm=2 should do the trick, but at the moment, it seems like it would require equally spaced bins. I'll look into it.

Comment only
17 Mar 2013 Warwick

### Warwick

This is a great function. However, I want to discriminate between two quite different distributions. I have a problem getting the total area under the respective curves to be equal (to a nominal 1) for separate datasets (even with the same number of observations). E.g., say I want to plot U and V left and right respectively, where
U = normrnd(3.3,1.0,100,1);
V = normrnd(2.0,0.3,100,1);

then no matter what I do, they don't look anywhere near equal. Any ideas? or have I missed something obvious?
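To make the question concrete, here is a minimal sketch of what "equal total area" means for two separately binned samples. It works outside distributionPlot and makes no claim about how the globalNorm option is implemented; randn stands in for normrnd to avoid the Statistics Toolbox, and the bin count of 15 is an arbitrary assumption.

```matlab
% Two samples of equal size but different spread.
U = 3.3 + 1.0 * randn(100, 1);
V = 2.0 + 0.3 * randn(100, 1);

% Bin each sample separately; hist returns counts and equally spaced centers.
[countsU, centersU] = hist(U, 15);
[countsV, centersV] = hist(V, 15);
widthU = centersU(2) - centersU(1);
widthV = centersV(2) - centersV(1);

% Convert counts to densities so that sum(density) * binWidth equals 1.
densU = countsU / (sum(countsU) * widthU);
densV = countsV / (sum(countsV) * widthV);

% Both areas are 1 up to rounding, although the bin widths differ.
areaU = sum(densU) * widthU;
areaV = sum(densV) * widthV;
```

Note that equal area does not mean equal appearance: the narrower sample V has smaller bins, so its density values are correspondingly larger than those of U.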

04 Oct 2012 Sturla Kvamsdal

### Sturla Kvamsdal

15 Jun 2012 Dan K

### Dan K

This is a great tool... It would be nice if some of the functionality could be achieved without requiring toolboxes (e.g. I've cobbled together the code to do the smoothed histograms without the spline toolbox, using files from FEX).

14 Jun 2012 Jonas

### Jonas

@all: thanks again for the suggestions, most of which are implemented now. Please note that plotSpread is now a submission on its own that needs to be downloaded separately.

Comment only
13 Apr 2012 Andres

### Andres

Very, very useful!

19 Mar 2012 Jonas

### Jonas

@Yuri Kotliarov: I suggest you call addSpread.m directly, rather than via distributionPlot.m

@all: thanks for the good suggestions. I hope I can implement them soon!

Comment only
19 Mar 2012 Yuri K

### Yuri K

@Jonas, I couldn't find a way to change the width of the dot spread (addSpread is 1). It doesn't seem to depend on distWidth. If I don't show the density (color is white), the distance between groups is quite large. Thanks.

Comment only
19 Mar 2012 Kelly Kearney

### Kelly Kearney

Overall, this is a great function, and I use it quite often to analyze model ensemble output. A few enhancements that could be nice:

- Add the option to display in a horizontal orientation.

- Add the option to filter outliers when calculating bin widths and kernel densities. It could also be nice to display these as points, as in boxplot, rather than connecting them via long lines to the main histogram.

- This is an edge case, but the function will error under the addSpread option if a column/group contains only NaNs and/or Infs.
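Until such an edge case is handled inside the function, one possible workaround is to clean the data before plotting. The sketch below drops NaN and Inf values from each group and discards groups that end up empty; the variable names and sample data are illustrative assumptions, and this is not code from the submission.

```matlab
% Example input: one cell per group; the second group has no finite values.
data = {[1 2 NaN 3]', [NaN Inf]', [4 5 6]'};

% Remove non-finite entries from every group.
cleaned = cellfun(@(v) v(isfinite(v)), data, 'UniformOutput', false);

% Drop groups that are empty after cleaning, remembering which were kept
% so that axis labels can be matched up afterwards.
keep = ~cellfun(@isempty, cleaned);
cleaned = cleaned(keep);

% distributionPlot(cleaned, 'addSpread', true)  % needs the submission on the path
```

Keeping the logical index keep around makes it possible to label the remaining groups correctly after the empty ones have been removed.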

01 Mar 2012 Warwick

### Warwick

This is very good. I've just included some plots in a report. Thank you. Possibly you could add an extra feature within the options of 'showMM' = 6, say, which would be to draw a horizontal line of linewidth 2 for the median, and 25 & 75 pctiles at linewidth 1.

14 Dec 2011 Jonas

### Jonas

@Yuri: I have implemented your suggestion (though I start the histograms from the very left or right side, respectively), and fixed the previous bug.

Comment only
16 Nov 2011 Yuri K

### Yuri K

@Jonas: Thanks for the answer. May I suggest a new feature? It would be nice to draw the histogram in a chosen direction. Currently it is only centered, but it could also be left- or right-directed. All you need to change is the xBase variable at line 401: 0.5 to 0 for left direction, -0.5 to 0 for right direction. For some people it is easier to understand when the distributions look like turned histograms.

Comment only
01 Nov 2011 Jonas

### Jonas

@Yuri Kotliarov: Currently, the only workaround is to call ksdensity outside of distributionPlot to ensure that the smoothing uses the same kernel:

x = zeros(10,1);
y = x+randn(10,1)*0.1;
[yy(:,2),yy(:,1)] = ksdensity(y,'width',0.01);
[xx(:,2),xx(:,1)] = ksdensity(x,'width',0.01);
distributionPlot({xx,yy},'showMM',false)

Unfortunately, the showMM option is bugged when you supply your own histograms at the moment, so you have to set that option to false.

Comment only
31 Oct 2011 Yuri K

### Yuri K

@Jonas: I have a problem with smoothing (histOpt=1) when all values for a group are the same. In this case the distribution plot is very wide compared to the same data with a little variance.
For example:
x = zeros(10,1);
y = x+randn(10,1)*0.1;

The same happens with a few outliers in x. I understand it's probably how the ksdensity function works. But can you do anything to make the above cases comparable?

Comment only
07 Jul 2011 Alexander

### Alexander

21 Jun 2011 Jonas

### Jonas

@Yuri: The new version of distributionPlot supports grouped data.

Comment only
24 Jan 2011 Yuri K

### Yuri K

Great! Thanks.

20 Jan 2011 Jonas

### Jonas

@Yuri: No, it doesn't work with grouped data (yet). In the meantime, you can use a function like group2cell (http://www.mathworks.com/matlabcentral/fileexchange/11192-group2cell) to distribute your grouped data among cells to use with distributionPlot.
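For readers without group2cell, the same redistribution of grouped data into cells can be sketched in base MATLAB with unique and accumarray. This is an illustrative alternative, not the group2cell implementation; the sample values and variable names are assumptions.

```matlab
% Grouped data: one value per row plus a group label per row.
values = [1.2; 3.4; 2.2; 5.0; 4.1; 0.7];
groups = [1; 2; 1; 3; 2; 1];

% Map each label to a consecutive index 1..nGroups.
[groupIds, ~, groupIdx] = unique(groups);

% Collect the values of each group into its own cell.
% Sorting inside the function makes the within-group order deterministic.
dataCell = accumarray(groupIdx, values, [], @(v) {sort(v)});

% distributionPlot(dataCell)  % needs the submission on the path
```

The result is a column cell array with one vector per group, in the order given by groupIds.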

Comment only
20 Jan 2011 Jonas

### Jonas

@Brian: Thanks for the suggestions, and for sending me your sample code. I have not had time yet to update my code, though, but I will look into it!

Comment only
15 Dec 2010 Yuri K

### Yuri K

Does it work with grouped data, like boxplot does?

Comment only
23 Sep 2010 Brian Katz

### Brian Katz

This works quite well, giving a very interesting data presentation method. Some improvements could be the use of a colormap, rather than a forced grayscale. An example in the help would also be a good addition.
I have started to try to make a combined plot which allows for both boxplot (using boxplotCsub) and distributionPlot. As both are symmetrical, they can both be collapsed to one-sided and then combined, giving two very interesting looks at the same data sets.

28 Aug 2010 Brian Katz

### Brian Katz

Very very cool.

21 Jul 2010 Andrei Bejan

### Andrei Bejan

12 May 2010 Denzel Li

### Denzel Li

11 May 2010 Rob Campbell

### Rob Campbell

21 Nov 2009 William Irwin

### William Irwin

09 Sep 2009 Chris Lydick

### Chris Lydick

25 Jun 2009 Oleg Komarov

### Oleg Komarov

28 Apr 2009 Chiara

### Chiara

19 Apr 2009 Christopher

### Christopher

16 Apr 2009 1.1

Fixed cryptic error if the data was all NaNs (thanks Christopher for pointing it out!).
distributionPlot now also automatically converts arrays in cells to vectors and throws a warning.

25 Apr 2009 1.2

Documented previously undocumented functionality, chose better screenshot to demonstrate how distributionPlot is better for comparing distributions than boxplot

20 Jan 2011 1.3

Updated title to Violin Plot, because that's what (some of) these plots are called elsewhere.

20 Jun 2011 1.4

Changed input from optional arguments to parameterName/parameterValue pairs (note that the old syntax still works!).
Added several new features, such as support for grouped variables, overlay of data points, and user-defined colormaps.

20 Jun 2011 1.6

Made colorbar more meaningful if there is only one colormap and the bins are normalized globally (i.e. globalNorm is set to 1). Thanks to Brian Katz for the suggestion.

21 Jun 2011 1.7

Fixed a bug in the code, and two mistakes in the example.

02 Oct 2011 1.9

Improved normalization options. Thanks to Jake for the suggestion.

14 Dec 2011 1.12

Added option to align the bars at the left or the right (option "histOri"), as suggested by Yuri. Also, bugfix.

12 Jun 2012 1.13