Code covered by the BSD License  

Violin Plots for plotting multiple distributions (distributionPlot.m)

5.0 | 15 ratings | 56 Downloads (last 30 days) | File Size: 19.53 KB | File ID: #23661
by Jonas

 

13 Apr 2009 (Updated 14 Dec 2011)

Function for plotting multiple histograms side-by-side in 2D - better than boxplot.

Editor's Notes:

This file was selected as MATLAB Central Pick of the Week

File Information
Description

The zip-file contains the following files for visualizing distributions:

- distributionPlot.m: main function that allows creating violin plots

- histogram.m: generate histograms with 'ideal' bin width given the number of data points and the spread (Freedman-Diaconis rule). Note that for integer-valued data, each integer gets its own bin.

- plotSpread.m: plot point clouds with no overlap. Works well for a small number of data points, i.e. when there are fewer than ~20 values per bin.
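
The Freedman-Diaconis rule mentioned for histogram.m sets the bin width to h = 2*IQR/n^(1/3). The following is a minimal sketch of that formula only, not code taken from histogram.m. The sample data and variable names are hypothetical, and the quartiles are computed by simple linear interpolation (an assumption made to avoid toolbox functions), so the result may differ slightly from other quartile definitions.

```matlab
% Sketch of the Freedman-Diaconis bin width: h = 2*IQR/n^(1/3).
% Not the implementation in histogram.m; quartile definition is an assumption.
x  = [2 4 4 5 7 9 10 12 15 21];   % hypothetical sample data
xs = sort(x(:));                  % sorted column vector
n  = numel(xs);

% Quartiles by linear interpolation between the sorted data points
q = interp1((1:n)', xs, 1 + (n-1)*[0.25; 0.75]);

binWidth = 2*(q(2) - q(1))/n^(1/3);

if binWidth > 0
    nBins = ceil((xs(end) - xs(1))/binWidth);
else
    nBins = 1;                    % degenerate case: interquartile range is zero
end
```

Because the width shrinks only with the cube root of n, small samples get few, wide bins, which is why overlaying the raw data points can be helpful for small n.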

In addition, the zip file contains four helper functions: countEntries, colorCode2rgb, isEven, myErrorbar

DistributionPlot allows visualizing multiple distributions side by side. It is useful for skewed unimodal data and indispensable for multimodal data. DistributionPlot is especially useful for showing the time evolution of a distribution.

Some of the examples from the help:

        r = rand(1000,1);
        rn = randn(1000,1)*0.38+0.5;
        rn2 = [randn(500,1)*0.1+0.27;randn(500,1)*0.1+0.73];
        rn2=min(rn2,1);rn2=max(rn2,0);
        figure
        ah(1)=subplot(2,4,1:2);
        boxplot([r,rn,rn2])
        ah(2)=subplot(2,4,3:4);
        distributionPlot([r,rn,rn2],'histOpt',2); % histOpt=2 works better for uniform distributions than the default
        set(ah,'ylim',[-1 2])
        %--additional options
        data = [randn(100,1);randn(50,1)+4;randn(25,1)+8];
        subplot(2,4,5)
        distributionPlot(data); % defaults
        subplot(2,4,6)
        distributionPlot(data,'colormap',copper,'showMM',5,'variableWidth',false) % show density via custom colormap only, show mean/std
        subplot(2,4,7:8)
        distributionPlot({data(1:5:end),repmat(data,2,1)},'addSpread',true,'showMM',false,'histOpt',2) %auto-binwidth depends on # of datapoints; for small n, plotting the data is useful

MATLAB release: MATLAB 7.6 (R2008a)
Other requirements: The 'smooth' option of histogram.m requires the Spline Toolbox. However, for smooth histograms, ksdensity is probably the better choice anyway. Grouped data requires the Statistics Toolbox.
Comments and Ratings (25)
19 Apr 2009 Christopher  
28 Apr 2009 Chiara  
25 Jun 2009 Oleg Komarov  
09 Sep 2009 Chris Lydick  
21 Nov 2009 William Irwin  
11 May 2010 Rob Campbell  
12 May 2010 Denzel Li  
21 Jul 2010 Andrei Bejan  
28 Aug 2010 Brian Katz

Very very cool.

23 Sep 2010 Brian Katz

This works quite well, giving a very interesting data presentation method. Some improvements could be the use of a colormap, rather than a forced grayscale. An example in the help would also be a good addition.
I have started to try to make a combined plot which allows for both boxplot (using boxplotCsub) and distributionPlot. As both are symmetrical, they can both be collapsed to one-sided and then combined, giving two very interesting looks at the same data sets.

15 Dec 2010 Yuri Kotliarov

Does it work with grouped data, like boxplot does?

20 Jan 2011 Jonas

@Brian: Thanks for the suggestions, and for sending me your sample code. I have not yet had time to update my code, but I will look into it!

20 Jan 2011 Jonas

@Yuri: No, it doesn't work with grouped data (yet). In the meantime, you can use a function like group2cell (http://www.mathworks.com/matlabcentral/fileexchange/11192-group2cell) to distribute your grouped data among cells to use with distributionPlot.
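
A minimal sketch of the idea of distributing grouped data among cells, using only built-in functions. This is not the group2cell function linked above (its interface is not shown here); the variable names and sample data are hypothetical.

```matlab
% Split a value vector into one cell per group label (illustrative only).
values = [1.2; 3.4; 2.2; 5.1; 4.8; 0.9];   % hypothetical measurements
groups = [1; 2; 1; 3; 2; 1];               % hypothetical group label per value

% idx maps every value to the position of its label in groupIds
[groupIds, ~, idx] = unique(groups);

% Collect the values of each group into its own cell
dataCell = accumarray(idx, values, [], @(v) {v});

% dataCell{k} now holds the values belonging to groupIds(k).
% With distributionPlot.m on the path, the cell array can be plotted:
% distributionPlot(dataCell)
```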

24 Jan 2011 Yuri Kotliarov

Great! Thanks.

21 Jun 2011 Jonas

@Yuri: The new version of distributionPlot supports grouped data.

07 Jul 2011 Alexander  
31 Oct 2011 Yuri Kotliarov

@Jonas: I have a problem with smoothing (histOpt=1) when all values for a group are the same. In this case the distribution plot is very wide compared to the same data with a little variance.
For example:
x = zeros(10,1);
y = x+randn(10,1)*0.1;
distributionPlot({x,y},'histOpt',1,'addSpread',1)

The same happens with a few outliers in x. I understand it's probably how the ksdensity function works. But can you do anything to make the above cases comparable?

01 Nov 2011 Jonas

@Yuri Kotliarov: Currently, the only workaround is to call ksdensity outside of distributionPlot to ensure that the smoothing uses the same kernel:

x = zeros(10,1);
y = x+randn(10,1)*0.1;
[yy(:,2),yy(:,1)] = ksdensity(y,'width',0.01);
[xx(:,2),xx(:,1)] = ksdensity(x,'width',0.01);
distributionPlot({xx,yy},'showMM',false)

Unfortunately, the showMM option is currently buggy when you supply your own histograms, so you have to set that option to false.

16 Nov 2011 Yuri Kotliarov

@Jonas: Thanks for the answer. May I suggest a new feature? It would be nice to draw the histogram in a chosen direction. Currently it is only centered, but it could also be left- or right-directed. All you need to change is the xBase variable at line 401: 0.5 to 0 for the left direction, -0.5 to 0 for the right direction. For some people, the distributions are easier to understand when they look like rotated histograms.

14 Dec 2011 Jonas

@Yuri: I have implemented your suggestion (though I start the histograms from the very left or right side, respectively), and fixed the previous bug.

01 Mar 2012 Warwick

This is very good. I've just included some plots in a report. Thank you. Possibly you could add an extra feature within the options of 'showMM' = 6, say, which would be to draw a horizontal line of linewidth 2 for the median, and of linewidth 1 for the 25th and 75th percentiles.

19 Mar 2012 Kelly Kearney

Overall, this is a great function, and I use it quite often to analyze model ensemble output. A few enhancements that could be nice:

- Add the option to display in a horizontal orientation.

- Add the option to filter outliers when calculating bin widths and kernel densities. It could also be nice to display these as points, as in boxplot, rather than connecting them via long lines to the main histogram.

- This is an edge case, but the function will error under the addSpread option if a column/group contains only NaNs and/or Infs.

19 Mar 2012 Yuri Kotliarov

@Jonas, I couldn't find a way to change the width of the dot spread (with addSpread set to 1). It doesn't seem to depend on distWidth. If I don't show the density (color is white), the distance between groups is quite large. Thanks.

19 Mar 2012 Jonas

@Yuri Kotliarov: I suggest you call plotSpread.m directly, rather than via distributionPlot.m

@all: thanks for the good suggestions. I hope I can implement them soon!

13 Apr 2012 Andres

Very, very useful!

Updates
16 Apr 2009

Fixed cryptic error if the data was all NaNs (thanks Christopher for pointing it out!).
distributionPlot now also automatically converts arrays in cells to vectors and throws a warning.

25 Apr 2009

Documented previously undocumented functionality, and chose a better screenshot to demonstrate how distributionPlot is better for comparing distributions than boxplot.

20 Jan 2011

Updated title to Violin Plot, because that is what (some of) these plots are called elsewhere.

20 Jun 2011

Changed input from optional arguments to parameterName/parameterValue pairs (note that the old syntax still works!).
Added several new features, such as support for grouped variables, overlay of data points, and user-defined colormaps.

20 Jun 2011

Made colorbar more meaningful if there is only one colormap and the bins are normalized globally (i.e. globalNorm is set to 1). Thanks to Brian Katz for the suggestion.

21 Jun 2011

Fixed a bug in the code, and two mistakes in the example.

02 Oct 2011

Improved normalization options. Thanks to Jake for the suggestion.

14 Dec 2011

Added option to align the bars at the left or the right (option "histOri"), as suggested by Yuri. Also, bugfix.

Tag Activity for this File

Tag               Applied By      Date/Time
plotting          Jonas           14 Apr 2009 10:11:59
distributions     Jonas           14 Apr 2009 10:11:59
histogram         Jonas           14 Apr 2009 10:11:59
distributions     Florian         14 Oct 2009 12:22:28
histogram         Florian         14 Oct 2009 12:24:43
plotting          Florian         14 Oct 2009 12:24:56
potw              Shari Freedman  10 Jun 2010 10:48:03
distributions     Jose Ercolino   11 Jun 2010 09:44:39
histogram         Jose Ercolino   11 Jun 2010 09:44:42
pick of the week  Jose Ercolino   11 Jun 2010 09:49:05
pick of the week  Andrei Bejan    21 Jul 2010 08:33:13
histogram         Dan K           14 Oct 2010 14:39:26
plotting          James           18 Dec 2010 17:28:43
distributions     ning            05 Jan 2012 21:31:24
histogram         Andrew          08 Mar 2012 21:31:29