Editor's Note: This file was selected as MATLAB Central Pick of the Week
cloudPlot will help visualize the distribution of a 2dimensional dataset. It is especially helpful when looking at extremely large datasets where a redular plot(x,y,'.') will just fill the plot with a solid color because the measurement points overlap each other.
cloudPlot uses the builtin matlab routines to set the axis limits and grid points appropriate to the data.
Daniel Armyr (2020). cloudPlot (https://www.mathworks.com/matlabcentral/fileexchange/23238cloudplot), MATLAB Central File Exchange. Retrieved .
1.10.0.0  Added more flexible arguments and more error handling after suggestions by Eric Sampson 

1.8.0.0  Updated with more suggestions from Romesh to remove the need to return scaling factor. Made compatible with hold states and incldued more demos. 

1.7.0.0  Thanks to Eric for the addition of the xscale/yscale output option. Thanks to Romesh who provided a bugfix, the bins input option, and the ability to resize the plot. 

1.5.0.0  Did a oneline fix that in certain cases speeds up the function by a factor of 5. 

1.2.0.0  Improved the binsize selection to make manual selection redundant.


1.1.0.0  Now NaNs and Infs are handled gracefully. 
Create scripts with code, output, and formatted text in a single executable document.
Pearl (view profile)
Anton Daneyko (view profile)
The speed up pointed out by Philip is a must! For me it made it almost twice faster. Use
canvas = accumarray( [xBinIndex, yBinIndex], 1, bins);
instead of what is now lines 152155
Philip (view profile)
Very nice, the automatic binning to the pixels is very good, and all the argument parsing stuff makes it flexible. You could replace lines 139 and 152155 with:
canvas = accumarray( [xBinIndex, yBinIndex], 1, bins);
for a bit of speed up (and it isn't reliant on acceleration).
Robert (view profile)
Wow, that surprises me. In one case I ran, the profiler reported the time spent populating the canvas dropped from 80 seconds to 10 seconds. That was using MATLAB r2012b 64 bit on Windows 7 for creating a cloudPlot from around 10M data points. If I can figure out what the difference was I will let you know.
Daniel Armyr (view profile)
Robert: Very nice idea. Unfortunately, I do not see that it gives a speedup in any of the cases I have tried. I will therefore say that this is not an improvement and will not update. However, if you can provide me an example where you see significant speedup using the full/sparse trick, then I would be happy to look at it and see how I can get the best performance for all cases.
Robert (view profile)
Love this function. I haven't worked with sparse matricies much so I can't speak to the robustness of this, but as food for thought:
The counting on lines 152  155 can be done more quickly using
canvas = full(sparse(xBinIndex,yBinIndex,1,bins(1),bins(2)));
Anyway, thanks for this submission and for maintaining and updating it.
Felipe G. Nievinski (view profile)
See also file ID # 29709, 18386, 31889, 29641, 6037, 29435, 25914, 25325, 14205, 13352, 43672, ...
Daniel Armyr (view profile)
I disagree, but I have given you my contact information so we can take a closer look at it.
Eric Sampson (view profile)
Handy function Daniel! However, unless I'm missing something I think you need to do a flipud(canvas) before passing it to imagesc?
Jimmy (view profile)
Yakir (view profile)
fast
William Noderer (view profile)
Daniel Armyr (view profile)
Hi.
That is a pretty good idea. I will take it into consideration, but for now, here is a quick hack that will give you what you are looking for.
Use it the same way that you use the original function, only add a final argument (scaleFactor) that tells the function what scale factor to apply. If you want a density function, you set the last argument to 1/numel(X). In order to avoid extensive recoding, you will have to provide all the other arguments in order to use a scaling factor.
Example of density plot of random data
nPoints = 1e5;
clpoudPlot ( randn(nPoints,1), randn(nPoints,1), [], false, [], 1/nPoints );

function [ varargout ] = cloudPlot( X, Y, axisLimits, useLogScale, bins, scaleFactor )
% Check the data size
assert ( numel(X) == numel(Y), ...
'The number of elements in X and Y must be the same.' );
if ( nargin >= 3 && ~isempty(axisLimits) )
pointSelect = X<=axisLimits(2) & X>=axisLimits(1) & ...
Y<=axisLimits(4) & Y>=axisLimits(3);
X = X(pointSelect);
Y = Y(pointSelect);
axisLimitsSet = true;
else
axisLimitsSet = false;
end
if ( nargin < 4  isempty(useLogScale) )
useLogScale = false;
end
if ( nargin < 5 )
bins = [];
end
%Remove any nans or Infs in the data as they have no meaning in this
%context.
pointSelect = ~(isinf(X)  isnan(X)  isinf(Y)  isnan(Y));
X = X(pointSelect);
Y = Y(pointSelect);
% Plot to get appropriate limits
h = [];
if ( axisLimitsSet )
g = gca;
set ( g, 'Xlim', [axisLimits(1) axisLimits(2)] );
set ( g, 'Ylim', [axisLimits(3) axisLimits(4)] );
set ( g, 'units', 'pixels' );
else
h = plot ( X(:), Y(:), '.' );
g = get( h, 'Parent' );
end
xLim = get(g, 'Xlim' );
yLim = get(g, 'Ylim' );
%Get the bin size.
unitType = get(g,'Unit');
set(g,'Unit','Pixels')
axesPos = get(g,'Position');
nHorizontalBins = axesPos(3);
nVerticalBins = axesPos(4);
set(g,'Unit', unitType );
% Clear the data, as we actually don't want to see it.
if ( ~isempty(h) )
set ( h, 'XData', [] );
set ( h, 'YData', [] );
end
% Allocate an area to draw on
if ( isempty(bins) )
bins = ceil([nHorizontalBins nVerticalBins ]);
else
assert ( isnumeric(bins) && isreal(bins) );
assert ( numel(bins) == 2 );
end
binSize(2) = diff(yLim)./(bins(2));
binSize(1) = diff(xLim)./(bins(1));
canvas = zeros(bins);
% Draw in the canvas
xBinIndex = floor((X  xLim(1))/binSize(1))+1;
yBinIndex = floor((Y  yLim(1))/binSize(2))+1;
% Added security: Make sure indexes are never outside canvas. May not be
% possible.
pointsSelect = xBinIndex > 0 & xBinIndex <= bins(1) & ...
yBinIndex > 0 & yBinIndex <= bins(2);
xBinIndex = xBinIndex(pointsSelect);
yBinIndex = yBinIndex(pointsSelect);
for i = 1:numel(xBinIndex);
canvas(xBinIndex(i),yBinIndex(i)) = ...
canvas(xBinIndex(i),yBinIndex(i)) + 1;
end
% Show the canvas and adjust the grids.
if ( useLogScale )
h = imagesc(xLim, yLim,log10(canvas*scaleFactor)');
else
h = imagesc(xLim, yLim, canvas'*scaleFactor);
end
axis ( 'xy' );
axis ( 'tight' );
set ( g, 'units', 'normalized' ); % Enable resizing
% Optionally return a handle.
if ( nargout == 1)
varargout{1} = h;
end
texas A&M universdity (view profile)
Great Code Daniel.
Can you please suggest how I can obtain probability density instead of histogram in your Code or How I can divide the frequency in all the bins with a constant number.
Thank you
Sireesh Dadi
Daniel Armyr (view profile)
JG: I your data is 1D, then this function is not for you. Read up on the details of hist() in the documentation and you will find that Hist can do what you are looking for.
J G (view profile)
This is a very cool function but I need some help... I only have a vector Y with about 1000 elements(values >0 and <1) and I have shown the frequency at which they occur by hist(Y). However, I wondered if there was a way of using this function, showing which regions of the space between 0 and 1 most frequently occur in Y? Thanks
Daniel (view profile)
Daniel Armyr (view profile)
Of course you are right. New version with better scaling, support for hold states, and support for manual setting of transparency. See the published demo file for examples.
Romesh (view profile)
Thanks for the update Daniel! However, the key change I made was in calling imagesc:
imagesc([axisLimits(1) axisLimits(2)],[axisLimits(3) axisLimits(4)],canvas_display');
If you provide imagesc with the axis limits, then you are able to completely do away with rescaling. Thus you don't need scale factors and you don't need to plot and record the original axes much simpler!
Daniel Armyr (view profile)
OK. Updated with the suggestions above and done some elementary testing so hopefully there are no major bugs.
Romesh: I did a variant on your idea. If axis limits are provided, these are used exactly. If none are provided, the function finds axis that will look pretty, the same way that the plot command does.
Romesh (view profile)
Awesome code, I use it all the time and it is fantastic. However, there were a couple of things that are rough around the edges
 Different scaling means the resulting plot has different axis limits so it is hard to overlay data
 Cannot move the plot around using the drag tool
 Sometimes the limits are not what I expect (e.g. there are empty spaces on the plot because the plot is larger than the canvas I get that this shouldn't happen in theory!)
 Cannot readily modify the number of bins to give a different plot resolution
I've made some changes that I think fix all of these problems (definitely NOT certified bug free though!), and I've also managed to remove the trycatch block because I think I've fixed the error condition that it was catching.
This code is a dropin replacement for cloudPlot:
function [h,canvas] = cloudPlot(X,Y,axisLimits,useLogScale,bins)
% Display a cloud plot from X and Y data
%
% Only X and Y are mandatory arguments
% useLogScale = 1 to take the log of the data (to visalise data where
% there are large variations in density)
% axisLimits = [x_min, x_max, y_min, y_max], [] to omit
% bins = [number_x_bins, number_y_bins] (defines resolution of output)
%
% e.g. cloudPlot(x,y,[],1,[500 500]) to produce a cloudPlot with
% 500x500 bins, displaying the log of the density with automatic
% axis limits
%
% By Romesh Abeysuriya, based heavily on cloudPlot.m by Daniel Armyr
%
% (from Daniel Armyr's cloudPlot.m):
% A cloudplot is in essence a 2 dimensional histogram showing the
% density distribution of the data described by X and Y.
% As the plot displays density information, the
% dimensionality of X and Y are ignored. The only requirement is that X and
% Y have the same number of elements. Cloudplot is written to visualize
% large quantities of data and is most appropriate for datasets of 10000
% elements or more.
% Check the data size
assert (numel(X) == numel(Y),'The number of elements in X and Y must be the same');
if nargin < 4; useLogScale = false; end
if nargin < 3  isempty(axisLimits) % Calculate default axis limits
axisLimits = [min(X(:)) max(X(:)), min(Y(:)) max(Y(:))] ;
end
%Remove any nans or Infs in the data
pointSelect = isfinite(X) & isfinite(Y) & X >= axisLimits(1) & X <= axisLimits(2) & Y >= axisLimits(3) & Y <= axisLimits(4);
X = X(pointSelect);
Y = Y(pointSelect);
g = axes('Xlim',[axisLimits(1) axisLimits(2)],'Ylim',[axisLimits(3) axisLimits(4)],'units','pixels');
if nargin < 5 % Calculate the default number of pixels
plotsize = get(g,'Position');
bins = ceil(plotsize(3:4));
end
binSize(2) = diff(axisLimits(3:4))./(bins(2)1);
binSize(1) = diff(axisLimits(1:2))./(bins(1)1);
canvas = zeros(bins);
% Calculate bin indices for each of the points in the data arrays
xBinIndex = floor((X  axisLimits(1))/binSize(1))+1;
yBinIndex = floor((Y  axisLimits(3))/binSize(2))+1;
% Collect the indices
for i = 1:numel(xBinIndex);
canvas(xBinIndex(i),yBinIndex(i)) = canvas(xBinIndex(i),yBinIndex(i)) + 1;
end
% Postprocessing
canvas_display = canvas; % Copy the canvas so we can make changes to the display
% without modifying the underlying data
if useLogScale
canvas_display = log10(canvas_display);
end
%canvas_display = canvas_display > 0; % Show binary rather than graded data
%colormap gray % Use grayscale so background is black
%canvas_display = imcomplement(canvas_display); % Display a negative image
image_handle = imagesc([axisLimits(1) axisLimits(2)],[axisLimits(3) axisLimits(4)],canvas_display'); % Show the image
axis('xy') % Reverse the yaxis
set(g,'units','normalized' ); % Enable resizing
if nargout > 0
h = image_handle;
end
BLu (view profile)
works fast and well
Daniel Armyr (view profile)
@Bryna: This code could be trivially expanded to 3 dimensions. However, what you would se would be a multicolored fog. Without the ability (on most computers) to show stereo images, I believe that the resulting data would just be a mess and not actually reaveal much information about the dataset.
@Eric: Will take a look at it and do an update. Thanks for contributing.
Bryna Flaim (view profile)
Does anyone know of something similar to this but for 3 dimensions? (x,y, and z)? cheers.
Eric (view profile)
This is a wonderful piece of code. There have been times when I wanted to overlay other plots on the cloudPlot; however, since the axis are scale not according to their labels I modified the code to output scaling values for the overlay plots. I added several output arguments: an xscale and yscale which can be applied to the overlay plots x and y axis to have it correctly sit on the cloudPlot. See examples in comments. I've only tested it on a couple of things. Let me know what you think. Thanks for suppling a sweet piece of code.
function [ varargout ] = cloudPlot( X, Y, axisLimits, useLogScale )
%CLOUDPLOT Does a cloud plot of the data in X and Y.
% CLOUDPLOT(X,Y) draws a cloudplot of Y versus X. A cloudplot is in essence
% a 2 dimensional histogram showing the density distribution of the data
% described by X and Y. As the plot displays density information, the
% dimensionality of X and Y are ignored. The only requirement is that X and
% Y have the same number of elements. Cloudplot is written to visualize
% large quantities of data and is most appropriate for datasets of 10000
% elements or more.
%
% CLOUDPLOT is fully compatible with SUBPLOT, COLORBAR and COLORMAP, but
% does not handle zoom or rescaling of the axis in any way. Hold states are
% also ignored.
%
% CLOUDPLOT(X,Y,axisLimits) plots the values of X and Y that are within the
% limits specified by axisLimits. axisLimits is a 4element vector with the
% same meaning as "axis( axisLimits )" would have on a regular plot.
% axisLimits can be [] in which case it will be ignored.
%
% CLOUDPLOT(X,Y,axisLimits,useLogScale) If useLogScale == true, plots the
% base10 logarithm of the density at each point. Useful for distributions
% where the density varies greatly.
%
%
% OUTPUT:
%
% [h, xscale, yscale] = CLOUDPLOT(...)
% or
% [xscale, yscale] = CLOUDPLOT(...)
% or
% [] = CLOUDPLOT(...)
%
% Where h is the handle to the cloud plot, xscale is the scaling value of the
% to xaxis be applied to any overlaying graphics, and yscale is the scaling
% factor for yaxis
%
%
%
% Example:
% cloudPlot( randn(10000,1), randn(10000,1) );
% Will plot a Gaussian scattered distribution.
%
% cloudPlot( randn(1000,1), randn(1000,1), [0 3 1 4]);
% Will plot a Gaussian scattered distribution of X and Y values within
% the limits.
%
% [xscale, yscale] = cloudPlot( randn(1000,1), randn(1000,1), [0 3 1 4]);
% set(gca,'nextPlot','add');
% x = 0:0.1:3;
% plot(x*xscale,(x.^2)*yscale,'k')
% Will plot a Gaussian scattered distribution of X and Y values within
% the limits and an overlaying plot of x^2 on same axis
%
assert ( numel(X) == numel(Y), ...
'The number of elements in X and Y must be the same.' );
if ( nargin >= 3 && ~isempty(axisLimits) )
pointSelect = X<=axisLimits(2) & X>=axisLimits(1) & ...
Y<=axisLimits(4) & Y>=axisLimits(3);
X = X(pointSelect);
Y = Y(pointSelect);
end
if ( nargin < 4 )
useLogScale = false;
end
%Remove any nans or Infs in the data as they have no meaning in this
%context.
pointSelect = isinf(X)  isnan(X)  isinf(Y)  isnan(Y);
X = X(~pointSelect);
Y = Y(~pointSelect);
% Plot to get appropriate limits
figure('Color','W')
if ( nargin >= 3 && ~isempty(axisLimits) )
h = plot ( [axisLimits(1) axisLimits(2)], [axisLimits(3) axisLimits(4)] );
else
h = plot ( [min(X(:)) max(X(:))], [min(Y(:)) max(Y(:))] );
end
g = get( h, 'Parent' );
xLim = get(g, 'Xlim' );
yLim = get(g, 'Ylim' );
xTick = get(g, 'XTick' );
xTickLabel = get(g, 'XTickLabel' );
yTick = get(g, 'YTick' );
yTickLabel = get(g, 'YTickLabel' );
%Get the bin size.
unitType = get(g,'Unit');
set(g,'Unit','Pixels')
axesPos = get(g,'Position');
nHorizontalBins = axesPos(3);
nVerticalBins = axesPos(4);
set(g,'Unit', unitType );
% Allocate an area to draw on
bins = ceil([nHorizontalBins nVerticalBins ]);
binSize(2) = diff(yLim)./(bins(2));
binSize(1) = diff(xLim)./(bins(1));
canvas = zeros(bins);
% Draw in the canvas
xBinIndex = floor((X  xLim(1))/binSize(1))+1;
yBinIndex = floor((Y  yLim(1))/binSize(2))+1;
try
for i = 1:numel(xBinIndex);
canvas(xBinIndex(i),yBinIndex(i)) = ...
canvas(xBinIndex(i),yBinIndex(i)) + 1;
end
catch ME %#ok
disp ( [xBinIndex(i) yBinIndex(i)] )
disp ( bins );
disp ( '' );
end
% Show the canvas and adjust the grids.
if ( useLogScale )
h = imagesc ( log10(canvas') );
else
h = imagesc ( canvas' );
end
axis ( 'xy' );
g = get( h, 'Parent' );
% Create scaling factors for overlay plots
xLim_sc = get(g,'xlim');
yLim_sc = get(g,'ylim');
xscale = (xLim_sc(2)xLim_sc(1))/(xLim(2)xLim(1));
yscale = (yLim_sc(2)yLim_sc(1))/(yLim(2)yLim(1));
% Adjust tickmarks on cloudplot
xTickAdjust = (xTick  min(xTick))*bins(1)/(max(xTick)min(xTick))+0.5;
yTickAdjust = (yTick  min(yTick))*bins(2)/(max(yTick)min(yTick))+0.5;
set ( g, 'XTick', xTickAdjust );
set ( g, 'XTickLabel', xTickLabel );
set ( g, 'YTick', yTickAdjust );
set ( g, 'YTickLabel', yTickLabel );
% Optionally return a handle and or scale factors
if ( nargout == 3 )
varargout{1} = h;
varargout{2} = xscale;
varargout{3} = yscale;
elseif ( nargout == 2)
varargout{1} = xscale;
varargout{2} = yscale;
elseif ( nargout == 1)
varargout{1} = h;
end
K S (view profile)
John D'Errico (view profile)
Nice. This could be useful for someone who just wishes to visualize how a big set of data is distributed.
Good help. Error checks. I like that the author even checks for the number of output arguments to decide if the handle is returned. All well done.
Daniel Armyr (view profile)
I found a first bug in the script. The function does not handle NaNs or Infs in a nice way. For now, as a workaround, make sure you remova any NaNs or Infs before passing data to the function.