Code covered by the BSD License  

Highlights from
cloudPlot

4.92857

4.9 | 14 ratings Rate this file 77 Downloads (last 30 days) File Size: 110 KB File ID: #23238

cloudPlot

by

 

09 Mar 2009 (Updated )

A function to plot the distribution of 2-dimensional data.

Editor's Notes:

This file was selected as MATLAB Central Pick of the Week

| Watch this File

File Information
Description

cloudPlot will help visualize the distribution of a 2-dimensional dataset. It is especially helpful when looking at extremely large datasets where a redular plot(x,y,'.') will just fill the plot with a solid color because the measurement points overlap each other.

cloudPlot uses the built-in matlab routines to set the axis limits and grid points appropriate to the data.

MATLAB release MATLAB 8.1 (R2013a)
Tags for This File   Please login to tag files.
Please login to add a comment or rating.
Comments and Ratings (26)
02 Apr 2014 Philip

Very nice, the automatic binning to the pixels is very good, and all the argument parsing stuff makes it flexible. You could replace lines 139 and 152-155 with:
canvas = accumarray( [xBinIndex, yBinIndex], 1, bins);
for a bit of speed up (and it isn't reliant on acceleration).

08 Jan 2014 Robert

Wow, that surprises me. In one case I ran, the profiler reported the time spent populating the canvas dropped from 80 seconds to 10 seconds. That was using MATLAB r2012b 64 bit on Windows 7 for creating a cloudPlot from around 10M data points. If I can figure out what the difference was I will let you know.

16 Dec 2013 Daniel Armyr

Robert: Very nice idea. Unfortunately, I do not see that it gives a speedup in any of the cases I have tried. I will therefore say that this is not an improvement and will not update. However, if you can provide me an example where you see significant speedup using the full/sparse trick, then I would be happy to look at it and see how I can get the best performance for all cases.

13 Dec 2013 Robert

Love this function. I haven't worked with sparse matricies much so I can't speak to the robustness of this, but as food for thought:
The counting on lines 152 - 155 can be done more quickly using
canvas = full(sparse(xBinIndex,yBinIndex,1,bins(1),bins(2)));
Anyway, thanks for this submission and for maintaining and updating it.

25 Oct 2013 Felipe G. Nievinski

See also file ID # 29709, 18386, 31889, 29641, 6037, 29435, 25914, 25325, 14205, 13352, 43672, ...

29 Aug 2013 Daniel Armyr

I disagree, but I have given you my contact information so we can take a closer look at it.

28 Aug 2013 Eric Sampson

Handy function Daniel! However, unless I'm missing something I think you need to do a flipud(canvas) before passing it to imagesc?

01 Aug 2013 Jimmy  
25 Sep 2012 Yakir

fast

16 Feb 2012 William Noderer  
09 Jan 2012 Daniel Armyr

Hi.
That is a pretty good idea. I will take it into consideration, but for now, here is a quick hack that will give you what you are looking for.

Use it the same way that you use the original function, only add a final argument (scaleFactor) that tells the function what scale factor to apply. If you want a density function, you set the last argument to 1/numel(X). In order to avoid extensive recoding, you will have to provide all the other arguments in order to use a scaling factor.

Example of density plot of random data
nPoints = 1e5;
clpoudPlot ( randn(nPoints,1), randn(nPoints,1), [], false, [], 1/nPoints );

--

function [ varargout ] = cloudPlot( X, Y, axisLimits, useLogScale, bins, scaleFactor )

% Check the data size
assert ( numel(X) == numel(Y), ...
'The number of elements in X and Y must be the same.' );

if ( nargin >= 3 && ~isempty(axisLimits) )
pointSelect = X<=axisLimits(2) & X>=axisLimits(1) & ...
Y<=axisLimits(4) & Y>=axisLimits(3);
X = X(pointSelect);
Y = Y(pointSelect);
axisLimitsSet = true;
else
axisLimitsSet = false;
end

if ( nargin < 4 || isempty(useLogScale) )
useLogScale = false;
end

if ( nargin < 5 )
bins = [];
end


%Remove any nans or Infs in the data as they have no meaning in this
%context.
pointSelect = ~(isinf(X) | isnan(X) | isinf(Y) | isnan(Y));
X = X(pointSelect);
Y = Y(pointSelect);

% Plot to get appropriate limits
h = [];
if ( axisLimitsSet )
g = gca;
set ( g, 'Xlim', [axisLimits(1) axisLimits(2)] );
set ( g, 'Ylim', [axisLimits(3) axisLimits(4)] );
set ( g, 'units', 'pixels' );
else
h = plot ( X(:), Y(:), '.' );
g = get( h, 'Parent' );
end
xLim = get(g, 'Xlim' );
yLim = get(g, 'Ylim' );

%Get the bin size.
unitType = get(g,'Unit');
set(g,'Unit','Pixels')
axesPos = get(g,'Position');
nHorizontalBins = axesPos(3);
nVerticalBins = axesPos(4);
set(g,'Unit', unitType );

% Clear the data, as we actually don't want to see it.
if ( ~isempty(h) )
set ( h, 'XData', [] );
set ( h, 'YData', [] );
end

% Allocate an area to draw on
if ( isempty(bins) )
bins = ceil([nHorizontalBins nVerticalBins ]);
else
assert ( isnumeric(bins) && isreal(bins) );
assert ( numel(bins) == 2 );
end
binSize(2) = diff(yLim)./(bins(2));
binSize(1) = diff(xLim)./(bins(1));

canvas = zeros(bins);

% Draw in the canvas
xBinIndex = floor((X - xLim(1))/binSize(1))+1;
yBinIndex = floor((Y - yLim(1))/binSize(2))+1;

% Added security: Make sure indexes are never outside canvas. May not be
% possible.
pointsSelect = xBinIndex > 0 & xBinIndex <= bins(1) & ...
yBinIndex > 0 & yBinIndex <= bins(2);
xBinIndex = xBinIndex(pointsSelect);
yBinIndex = yBinIndex(pointsSelect);

for i = 1:numel(xBinIndex);
canvas(xBinIndex(i),yBinIndex(i)) = ...
canvas(xBinIndex(i),yBinIndex(i)) + 1;
end

% Show the canvas and adjust the grids.
if ( useLogScale )
h = imagesc(xLim, yLim,log10(canvas*scaleFactor)');
else
h = imagesc(xLim, yLim, canvas'*scaleFactor);
end

axis ( 'xy' );
axis ( 'tight' );
set ( g, 'units', 'normalized' ); % Enable resizing

% Optionally return a handle.
if ( nargout == 1)
varargout{1} = h;
end

07 Jan 2012 texas A&M universdity

Great Code Daniel.

Can you please suggest how I can obtain probability density instead of histogram in your Code or How I can divide the frequency in all the bins with a constant number.

Thank you

Sireesh Dadi

04 Aug 2011 Daniel Armyr

JG: I your data is 1D, then this function is not for you. Read up on the details of hist() in the documentation and you will find that Hist can do what you are looking for.

03 Aug 2011 J G

This is a very cool function but I need some help... I only have a vector Y with about 1000 elements(values >0 and <1) and I have shown the frequency at which they occur by hist(Y). However, I wondered if there was a way of using this function, showing which regions of the space between 0 and 1 most frequently occur in Y? Thanks

20 May 2011 Daniel  
13 May 2011 Daniel Armyr

Of course you are right. New version with better scaling, support for hold states, and support for manual setting of transparency. See the published demo file for examples.

09 May 2011 Romesh

Thanks for the update Daniel! However, the key change I made was in calling imagesc:

imagesc([axisLimits(1) axisLimits(2)],[axisLimits(3) axisLimits(4)],canvas_display');

If you provide imagesc with the axis limits, then you are able to completely do away with rescaling. Thus you don't need scale factors and you don't need to plot and record the original axes- much simpler!

07 May 2011 Daniel Armyr

OK. Updated with the suggestions above and done some elementary testing so hopefully there are no major bugs.

Romesh: I did a variant on your idea. If axis limits are provided, these are used exactly. If none are provided, the function finds axis that will look pretty, the same way that the plot command does.

04 May 2011 Romesh

Awesome code, I use it all the time and it is fantastic. However, there were a couple of things that are rough around the edges
- Different scaling means the resulting plot has different axis limits so it is hard to overlay data
- Cannot move the plot around using the drag tool
- Sometimes the limits are not what I expect (e.g. there are empty spaces on the plot because the plot is larger than the canvas- I get that this shouldn't happen in theory!)
- Cannot readily modify the number of bins to give a different plot resolution

I've made some changes that I think fix all of these problems (definitely NOT certified bug free though!), and I've also managed to remove the try-catch block because I think I've fixed the error condition that it was catching.

This code is a drop-in replacement for cloudPlot:

function [h,canvas] = cloudPlot(X,Y,axisLimits,useLogScale,bins)
% Display a cloud plot from X and Y data
%
% Only X and Y are mandatory arguments
% useLogScale = 1 to take the log of the data (to visalise data where
% there are large variations in density)
% axisLimits = [x_min, x_max, y_min, y_max], [] to omit
% bins = [number_x_bins, number_y_bins] (defines resolution of output)
%
% e.g. cloudPlot(x,y,[],1,[500 500]) to produce a cloudPlot with
% 500x500 bins, displaying the log of the density with automatic
% axis limits
%
% By Romesh Abeysuriya, based heavily on cloudPlot.m by Daniel Armyr
%
% (from Daniel Armyr's cloudPlot.m):
% A cloudplot is in essence a 2 dimensional histogram showing the
% density distribution of the data described by X and Y.
% As the plot displays density information, the
% dimensionality of X and Y are ignored. The only requirement is that X and
% Y have the same number of elements. Cloudplot is written to visualize
% large quantities of data and is most appropriate for datasets of 10000
% elements or more.

% Check the data size
assert (numel(X) == numel(Y),'The number of elements in X and Y must be the same');
if nargin < 4; useLogScale = false; end
if nargin < 3 | isempty(axisLimits) % Calculate default axis limits
axisLimits = [min(X(:)) max(X(:)), min(Y(:)) max(Y(:))] ;
end

%Remove any nans or Infs in the data
pointSelect = isfinite(X) & isfinite(Y) & X >= axisLimits(1) & X <= axisLimits(2) & Y >= axisLimits(3) & Y <= axisLimits(4);
X = X(pointSelect);
Y = Y(pointSelect);

g = axes('Xlim',[axisLimits(1) axisLimits(2)],'Ylim',[axisLimits(3) axisLimits(4)],'units','pixels');
if nargin < 5 % Calculate the default number of pixels
plotsize = get(g,'Position');
bins = ceil(plotsize(3:4));
end
binSize(2) = diff(axisLimits(3:4))./(bins(2)-1);
binSize(1) = diff(axisLimits(1:2))./(bins(1)-1);
canvas = zeros(bins);

% Calculate bin indices for each of the points in the data arrays
xBinIndex = floor((X - axisLimits(1))/binSize(1))+1;
yBinIndex = floor((Y - axisLimits(3))/binSize(2))+1;

% Collect the indices
for i = 1:numel(xBinIndex);
canvas(xBinIndex(i),yBinIndex(i)) = canvas(xBinIndex(i),yBinIndex(i)) + 1;
end

% Postprocessing
canvas_display = canvas; % Copy the canvas so we can make changes to the display
% without modifying the underlying data
if useLogScale
canvas_display = log10(canvas_display);
end
%canvas_display = canvas_display > 0; % Show binary rather than graded data
%colormap gray % Use grayscale so background is black
%canvas_display = imcomplement(canvas_display); % Display a negative image

image_handle = imagesc([axisLimits(1) axisLimits(2)],[axisLimits(3) axisLimits(4)],canvas_display'); % Show the image
axis('xy') % Reverse the y-axis
set(g,'units','normalized' ); % Enable resizing

if nargout > 0
h = image_handle;
end

25 Apr 2011 BLu

works fast and well

08 Dec 2010 Daniel Armyr

@Bryna: This code could be trivially expanded to 3 dimensions. However, what you would se would be a multicolored fog. Without the ability (on most computers) to show stereo images, I believe that the resulting data would just be a mess and not actually reaveal much information about the dataset.

@Eric: Will take a look at it and do an update. Thanks for contributing.

08 Dec 2010 Bryna Flaim

Does anyone know of something similar to this but for 3 dimensions? (x,y, and z)? cheers.

29 Oct 2010 Eric

This is a wonderful piece of code. There have been times when I wanted to overlay other plots on the cloudPlot; however, since the axis are scale not according to their labels I modified the code to output scaling values for the overlay plots. I added several output arguments: an xscale and yscale which can be applied to the overlay plots x and y axis to have it correctly sit on the cloudPlot. See examples in comments. I've only tested it on a couple of things. Let me know what you think. Thanks for suppling a sweet piece of code.

function [ varargout ] = cloudPlot( X, Y, axisLimits, useLogScale )

%CLOUDPLOT Does a cloud plot of the data in X and Y.

% CLOUDPLOT(X,Y) draws a cloudplot of Y versus X. A cloudplot is in essence

% a 2 dimensional histogram showing the density distribution of the data

% described by X and Y. As the plot displays density information, the

% dimensionality of X and Y are ignored. The only requirement is that X and

% Y have the same number of elements. Cloudplot is written to visualize

% large quantities of data and is most appropriate for datasets of 10000

% elements or more.

%

% CLOUDPLOT is fully compatible with SUBPLOT, COLORBAR and COLORMAP, but

% does not handle zoom or rescaling of the axis in any way. Hold states are

% also ignored.

%

% CLOUDPLOT(X,Y,axisLimits) plots the values of X and Y that are within the

% limits specified by axisLimits. axisLimits is a 4-element vector with the

% same meaning as "axis( axisLimits )" would have on a regular plot.

% axisLimits can be [] in which case it will be ignored.

%

% CLOUDPLOT(X,Y,axisLimits,useLogScale) If useLogScale == true, plots the

% base10 logarithm of the density at each point. Useful for distributions

% where the density varies greatly.

%
%
% OUTPUT:
%

% [h, xscale, yscale] = CLOUDPLOT(...)

% or
% [xscale, yscale] = CLOUDPLOT(...)
% or
% [] = CLOUDPLOT(...)
%
% Where h is the handle to the cloud plot, xscale is the scaling value of the
% to x-axis be applied to any overlaying graphics, and yscale is the scaling
% factor for y-axis
%
%
%

% Example:

% cloudPlot( randn(10000,1), randn(10000,1) );

% Will plot a Gaussian scattered distribution.

%

% cloudPlot( randn(1000,1), randn(1000,1), [0 3 -1 4]);

% Will plot a Gaussian scattered distribution of X and Y values within

% the limits.
%
% [xscale, yscale] = cloudPlot( randn(1000,1), randn(1000,1), [0 3 -1 4]);
% set(gca,'nextPlot','add');
% x = 0:0.1:3;
% plot(x*xscale,(x.^2)*yscale,'k')
% Will plot a Gaussian scattered distribution of X and Y values within

% the limits and an overlaying plot of x^2 on same axis
%

assert ( numel(X) == numel(Y), ...

'The number of elements in X and Y must be the same.' );

if ( nargin >= 3 && ~isempty(axisLimits) )

pointSelect = X<=axisLimits(2) & X>=axisLimits(1) & ...

Y<=axisLimits(4) & Y>=axisLimits(3);

X = X(pointSelect);

Y = Y(pointSelect);

end

if ( nargin < 4 )

useLogScale = false;

end

%Remove any nans or Infs in the data as they have no meaning in this

%context.

pointSelect = isinf(X) | isnan(X) | isinf(Y) | isnan(Y);

X = X(~pointSelect);

Y = Y(~pointSelect);

% Plot to get appropriate limits

figure('Color','W')

if ( nargin >= 3 && ~isempty(axisLimits) )

h = plot ( [axisLimits(1) axisLimits(2)], [axisLimits(3) axisLimits(4)] );

else

h = plot ( [min(X(:)) max(X(:))], [min(Y(:)) max(Y(:))] );

end

g = get( h, 'Parent' );

xLim = get(g, 'Xlim' );

yLim = get(g, 'Ylim' );

xTick = get(g, 'XTick' );

xTickLabel = get(g, 'XTickLabel' );

yTick = get(g, 'YTick' );

yTickLabel = get(g, 'YTickLabel' );

%Get the bin size.

unitType = get(g,'Unit');

set(g,'Unit','Pixels')

axesPos = get(g,'Position');

nHorizontalBins = axesPos(3);

nVerticalBins = axesPos(4);

set(g,'Unit', unitType );

% Allocate an area to draw on

bins = ceil([nHorizontalBins nVerticalBins ]);

binSize(2) = diff(yLim)./(bins(2));

binSize(1) = diff(xLim)./(bins(1));

canvas = zeros(bins);

% Draw in the canvas

xBinIndex = floor((X - xLim(1))/binSize(1))+1;

yBinIndex = floor((Y - yLim(1))/binSize(2))+1;

try

for i = 1:numel(xBinIndex);

canvas(xBinIndex(i),yBinIndex(i)) = ...

canvas(xBinIndex(i),yBinIndex(i)) + 1;

end

catch ME %#ok

disp ( [xBinIndex(i) yBinIndex(i)] )

disp ( bins );

disp ( '--' );

end

% Show the canvas and adjust the grids.

if ( useLogScale )

h = imagesc ( log10(canvas') );

else

h = imagesc ( canvas' );

end

axis ( 'xy' );

g = get( h, 'Parent' );

% Create scaling factors for overlay plots
xLim_sc = get(g,'xlim');
yLim_sc = get(g,'ylim');
xscale = (xLim_sc(2)-xLim_sc(1))/(xLim(2)-xLim(1));
yscale = (yLim_sc(2)-yLim_sc(1))/(yLim(2)-yLim(1));

% Adjust tickmarks on cloudplot

xTickAdjust = (xTick - min(xTick))*bins(1)/(max(xTick)-min(xTick))+0.5;

yTickAdjust = (yTick - min(yTick))*bins(2)/(max(yTick)-min(yTick))+0.5;

set ( g, 'XTick', xTickAdjust );

set ( g, 'XTickLabel', xTickLabel );

set ( g, 'YTick', yTickAdjust );

set ( g, 'YTickLabel', yTickLabel );

% Optionally return a handle and or scale factors

if ( nargout == 3 )

varargout{1} = h;
varargout{2} = xscale;
varargout{3} = yscale;
elseif ( nargout == 2)
varargout{1} = xscale;
varargout{2} = yscale;
elseif ( nargout == 1)
varargout{1} = h;

end

04 Mar 2010 K S  
10 Mar 2009 John D'Errico

Nice. This could be useful for someone who just wishes to visualize how a big set of data is distributed.

Good help. Error checks. I like that the author even checks for the number of output arguments to decide if the handle is returned. All well done.

10 Mar 2009 Daniel Armyr

I found a first bug in the script. The function does not handle NaNs or Infs in a nice way. For now, as a work-around, make sure you remova any NaNs or Infs before passing data to the function.

Updates
10 Mar 2009

Now NaNs and Infs are handled gracefully.

17 Apr 2009

Improved the bin-size selection to make manual selection redundant.
Added proper handling of NANs and INFs
Added option to select axis limits and logarithmic colormap.

07 Aug 2009

Did a one-line fix that in certain cases speeds up the function by a factor of 5.

06 May 2011

Thanks to Eric for the addition of the xscale/yscale output option.

Thanks to Romesh who provided a bugfix, the bins input option, and the ability to resize the plot.

13 May 2011

Updated with more suggestions from Romesh to remove the need to return scaling factor. Made compatible with hold states and incldued more demos.

06 Sep 2013

Added more flexible arguments and more error handling after suggestions by Eric Sampson

Contact us