version 1.10.0.0 (110 KB) by
Daniel Armyr

A function to plot the distribution of 2-dimensional data.

**Editor's Note:** This file was selected as MATLAB Central Pick of the Week

cloudPlot will help visualize the distribution of a 2-dimensional dataset. It is especially helpful when looking at extremely large datasets where a redular plot(x,y,'.') will just fill the plot with a solid color because the measurement points overlap each other.

cloudPlot uses the built-in matlab routines to set the axis limits and grid points appropriate to the data.

Daniel Armyr (2021). cloudPlot (https://www.mathworks.com/matlabcentral/fileexchange/23238-cloudplot), MATLAB Central File Exchange. Retrieved .

Created with
R2013a

Compatible with any release

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!Create scripts with code, output, and formatted text in a single executable document.

PearlAnton DaneykoThe speed up pointed out by Philip is a must! For me it made it almost twice faster. Use

canvas = accumarray( [xBinIndex, yBinIndex], 1, bins);

instead of what is now lines 152-155

PhilipVery nice, the automatic binning to the pixels is very good, and all the argument parsing stuff makes it flexible. You could replace lines 139 and 152-155 with:

canvas = accumarray( [xBinIndex, yBinIndex], 1, bins);

for a bit of speed up (and it isn't reliant on acceleration).

RobertWow, that surprises me. In one case I ran, the profiler reported the time spent populating the canvas dropped from 80 seconds to 10 seconds. That was using MATLAB r2012b 64 bit on Windows 7 for creating a cloudPlot from around 10M data points. If I can figure out what the difference was I will let you know.

Daniel ArmyrRobert: Very nice idea. Unfortunately, I do not see that it gives a speedup in any of the cases I have tried. I will therefore say that this is not an improvement and will not update. However, if you can provide me an example where you see significant speedup using the full/sparse trick, then I would be happy to look at it and see how I can get the best performance for all cases.

RobertLove this function. I haven't worked with sparse matricies much so I can't speak to the robustness of this, but as food for thought:

The counting on lines 152 - 155 can be done more quickly using

canvas = full(sparse(xBinIndex,yBinIndex,1,bins(1),bins(2)));

Anyway, thanks for this submission and for maintaining and updating it.

Felipe G. NievinskiSee also file ID # 29709, 18386, 31889, 29641, 6037, 29435, 25914, 25325, 14205, 13352, 43672, ...

Daniel ArmyrI disagree, but I have given you my contact information so we can take a closer look at it.

Eric SampsonHandy function Daniel! However, unless I'm missing something I think you need to do a flipud(canvas) before passing it to imagesc?

JimmyYakirfast

William NodererDaniel ArmyrHi.

That is a pretty good idea. I will take it into consideration, but for now, here is a quick hack that will give you what you are looking for.

Use it the same way that you use the original function, only add a final argument (scaleFactor) that tells the function what scale factor to apply. If you want a density function, you set the last argument to 1/numel(X). In order to avoid extensive recoding, you will have to provide all the other arguments in order to use a scaling factor.

Example of density plot of random data

nPoints = 1e5;

clpoudPlot ( randn(nPoints,1), randn(nPoints,1), [], false, [], 1/nPoints );

--

function [ varargout ] = cloudPlot( X, Y, axisLimits, useLogScale, bins, scaleFactor )

% Check the data size

assert ( numel(X) == numel(Y), ...

'The number of elements in X and Y must be the same.' );

if ( nargin >= 3 && ~isempty(axisLimits) )

pointSelect = X<=axisLimits(2) & X>=axisLimits(1) & ...

Y<=axisLimits(4) & Y>=axisLimits(3);

X = X(pointSelect);

Y = Y(pointSelect);

axisLimitsSet = true;

else

axisLimitsSet = false;

end

if ( nargin < 4 || isempty(useLogScale) )

useLogScale = false;

end

if ( nargin < 5 )

bins = [];

end

%Remove any nans or Infs in the data as they have no meaning in this

%context.

pointSelect = ~(isinf(X) | isnan(X) | isinf(Y) | isnan(Y));

X = X(pointSelect);

Y = Y(pointSelect);

% Plot to get appropriate limits

h = [];

if ( axisLimitsSet )

g = gca;

set ( g, 'Xlim', [axisLimits(1) axisLimits(2)] );

set ( g, 'Ylim', [axisLimits(3) axisLimits(4)] );

set ( g, 'units', 'pixels' );

else

h = plot ( X(:), Y(:), '.' );

g = get( h, 'Parent' );

end

xLim = get(g, 'Xlim' );

yLim = get(g, 'Ylim' );

%Get the bin size.

unitType = get(g,'Unit');

set(g,'Unit','Pixels')

axesPos = get(g,'Position');

nHorizontalBins = axesPos(3);

nVerticalBins = axesPos(4);

set(g,'Unit', unitType );

% Clear the data, as we actually don't want to see it.

if ( ~isempty(h) )

set ( h, 'XData', [] );

set ( h, 'YData', [] );

end

% Allocate an area to draw on

if ( isempty(bins) )

bins = ceil([nHorizontalBins nVerticalBins ]);

else

assert ( isnumeric(bins) && isreal(bins) );

assert ( numel(bins) == 2 );

end

binSize(2) = diff(yLim)./(bins(2));

binSize(1) = diff(xLim)./(bins(1));

canvas = zeros(bins);

% Draw in the canvas

xBinIndex = floor((X - xLim(1))/binSize(1))+1;

yBinIndex = floor((Y - yLim(1))/binSize(2))+1;

% Added security: Make sure indexes are never outside canvas. May not be

% possible.

pointsSelect = xBinIndex > 0 & xBinIndex <= bins(1) & ...

yBinIndex > 0 & yBinIndex <= bins(2);

xBinIndex = xBinIndex(pointsSelect);

yBinIndex = yBinIndex(pointsSelect);

for i = 1:numel(xBinIndex);

canvas(xBinIndex(i),yBinIndex(i)) = ...

canvas(xBinIndex(i),yBinIndex(i)) + 1;

end

% Show the canvas and adjust the grids.

if ( useLogScale )

h = imagesc(xLim, yLim,log10(canvas*scaleFactor)');

else

h = imagesc(xLim, yLim, canvas'*scaleFactor);

end

axis ( 'xy' );

axis ( 'tight' );

set ( g, 'units', 'normalized' ); % Enable resizing

% Optionally return a handle.

if ( nargout == 1)

varargout{1} = h;

end

texas A&M universdityGreat Code Daniel.

Can you please suggest how I can obtain probability density instead of histogram in your Code or How I can divide the frequency in all the bins with a constant number.

Thank you

Sireesh Dadi

Daniel ArmyrJG: I your data is 1D, then this function is not for you. Read up on the details of hist() in the documentation and you will find that Hist can do what you are looking for.

J GThis is a very cool function but I need some help... I only have a vector Y with about 1000 elements(values >0 and <1) and I have shown the frequency at which they occur by hist(Y). However, I wondered if there was a way of using this function, showing which regions of the space between 0 and 1 most frequently occur in Y? Thanks

DanielDaniel ArmyrOf course you are right. New version with better scaling, support for hold states, and support for manual setting of transparency. See the published demo file for examples.

RomeshThanks for the update Daniel! However, the key change I made was in calling imagesc:

imagesc([axisLimits(1) axisLimits(2)],[axisLimits(3) axisLimits(4)],canvas_display');

If you provide imagesc with the axis limits, then you are able to completely do away with rescaling. Thus you don't need scale factors and you don't need to plot and record the original axes- much simpler!

Daniel ArmyrOK. Updated with the suggestions above and done some elementary testing so hopefully there are no major bugs.

Romesh: I did a variant on your idea. If axis limits are provided, these are used exactly. If none are provided, the function finds axis that will look pretty, the same way that the plot command does.

RomeshAwesome code, I use it all the time and it is fantastic. However, there were a couple of things that are rough around the edges

- Different scaling means the resulting plot has different axis limits so it is hard to overlay data

- Cannot move the plot around using the drag tool

- Sometimes the limits are not what I expect (e.g. there are empty spaces on the plot because the plot is larger than the canvas- I get that this shouldn't happen in theory!)

- Cannot readily modify the number of bins to give a different plot resolution

I've made some changes that I think fix all of these problems (definitely NOT certified bug free though!), and I've also managed to remove the try-catch block because I think I've fixed the error condition that it was catching.

This code is a drop-in replacement for cloudPlot:

function [h,canvas] = cloudPlot(X,Y,axisLimits,useLogScale,bins)

% Display a cloud plot from X and Y data

%

% Only X and Y are mandatory arguments

% useLogScale = 1 to take the log of the data (to visalise data where

% there are large variations in density)

% axisLimits = [x_min, x_max, y_min, y_max], [] to omit

% bins = [number_x_bins, number_y_bins] (defines resolution of output)

%

% e.g. cloudPlot(x,y,[],1,[500 500]) to produce a cloudPlot with

% 500x500 bins, displaying the log of the density with automatic

% axis limits

%

% By Romesh Abeysuriya, based heavily on cloudPlot.m by Daniel Armyr

%

% (from Daniel Armyr's cloudPlot.m):

% A cloudplot is in essence a 2 dimensional histogram showing the

% density distribution of the data described by X and Y.

% As the plot displays density information, the

% dimensionality of X and Y are ignored. The only requirement is that X and

% Y have the same number of elements. Cloudplot is written to visualize

% large quantities of data and is most appropriate for datasets of 10000

% elements or more.

% Check the data size

assert (numel(X) == numel(Y),'The number of elements in X and Y must be the same');

if nargin < 4; useLogScale = false; end

if nargin < 3 | isempty(axisLimits) % Calculate default axis limits

axisLimits = [min(X(:)) max(X(:)), min(Y(:)) max(Y(:))] ;

end

%Remove any nans or Infs in the data

pointSelect = isfinite(X) & isfinite(Y) & X >= axisLimits(1) & X <= axisLimits(2) & Y >= axisLimits(3) & Y <= axisLimits(4);

X = X(pointSelect);

Y = Y(pointSelect);

g = axes('Xlim',[axisLimits(1) axisLimits(2)],'Ylim',[axisLimits(3) axisLimits(4)],'units','pixels');

if nargin < 5 % Calculate the default number of pixels

plotsize = get(g,'Position');

bins = ceil(plotsize(3:4));

end

binSize(2) = diff(axisLimits(3:4))./(bins(2)-1);

binSize(1) = diff(axisLimits(1:2))./(bins(1)-1);

canvas = zeros(bins);

% Calculate bin indices for each of the points in the data arrays

xBinIndex = floor((X - axisLimits(1))/binSize(1))+1;

yBinIndex = floor((Y - axisLimits(3))/binSize(2))+1;

% Collect the indices

for i = 1:numel(xBinIndex);

canvas(xBinIndex(i),yBinIndex(i)) = canvas(xBinIndex(i),yBinIndex(i)) + 1;

end

% Postprocessing

canvas_display = canvas; % Copy the canvas so we can make changes to the display

% without modifying the underlying data

if useLogScale

canvas_display = log10(canvas_display);

end

%canvas_display = canvas_display > 0; % Show binary rather than graded data

%colormap gray % Use grayscale so background is black

%canvas_display = imcomplement(canvas_display); % Display a negative image

image_handle = imagesc([axisLimits(1) axisLimits(2)],[axisLimits(3) axisLimits(4)],canvas_display'); % Show the image

axis('xy') % Reverse the y-axis

set(g,'units','normalized' ); % Enable resizing

if nargout > 0

h = image_handle;

end

BLuworks fast and well

Daniel Armyr@Bryna: This code could be trivially expanded to 3 dimensions. However, what you would se would be a multicolored fog. Without the ability (on most computers) to show stereo images, I believe that the resulting data would just be a mess and not actually reaveal much information about the dataset.

@Eric: Will take a look at it and do an update. Thanks for contributing.

Bryna FlaimDoes anyone know of something similar to this but for 3 dimensions? (x,y, and z)? cheers.

EricThis is a wonderful piece of code. There have been times when I wanted to overlay other plots on the cloudPlot; however, since the axis are scale not according to their labels I modified the code to output scaling values for the overlay plots. I added several output arguments: an xscale and yscale which can be applied to the overlay plots x and y axis to have it correctly sit on the cloudPlot. See examples in comments. I've only tested it on a couple of things. Let me know what you think. Thanks for suppling a sweet piece of code.

function [ varargout ] = cloudPlot( X, Y, axisLimits, useLogScale )

%CLOUDPLOT Does a cloud plot of the data in X and Y.

% CLOUDPLOT(X,Y) draws a cloudplot of Y versus X. A cloudplot is in essence

% a 2 dimensional histogram showing the density distribution of the data

% described by X and Y. As the plot displays density information, the

% dimensionality of X and Y are ignored. The only requirement is that X and

% Y have the same number of elements. Cloudplot is written to visualize

% large quantities of data and is most appropriate for datasets of 10000

% elements or more.

%

% CLOUDPLOT is fully compatible with SUBPLOT, COLORBAR and COLORMAP, but

% does not handle zoom or rescaling of the axis in any way. Hold states are

% also ignored.

%

% CLOUDPLOT(X,Y,axisLimits) plots the values of X and Y that are within the

% limits specified by axisLimits. axisLimits is a 4-element vector with the

% same meaning as "axis( axisLimits )" would have on a regular plot.

% axisLimits can be [] in which case it will be ignored.

%

% CLOUDPLOT(X,Y,axisLimits,useLogScale) If useLogScale == true, plots the

% base10 logarithm of the density at each point. Useful for distributions

% where the density varies greatly.

%

%

% OUTPUT:

%

% [h, xscale, yscale] = CLOUDPLOT(...)

% or

% [xscale, yscale] = CLOUDPLOT(...)

% or

% [] = CLOUDPLOT(...)

%

% Where h is the handle to the cloud plot, xscale is the scaling value of the

% to x-axis be applied to any overlaying graphics, and yscale is the scaling

% factor for y-axis

%

%

%

% Example:

% cloudPlot( randn(10000,1), randn(10000,1) );

% Will plot a Gaussian scattered distribution.

%

% cloudPlot( randn(1000,1), randn(1000,1), [0 3 -1 4]);

% Will plot a Gaussian scattered distribution of X and Y values within

% the limits.

%

% [xscale, yscale] = cloudPlot( randn(1000,1), randn(1000,1), [0 3 -1 4]);

% set(gca,'nextPlot','add');

% x = 0:0.1:3;

% plot(x*xscale,(x.^2)*yscale,'k')

% Will plot a Gaussian scattered distribution of X and Y values within

% the limits and an overlaying plot of x^2 on same axis

%

assert ( numel(X) == numel(Y), ...

'The number of elements in X and Y must be the same.' );

if ( nargin >= 3 && ~isempty(axisLimits) )

pointSelect = X<=axisLimits(2) & X>=axisLimits(1) & ...

Y<=axisLimits(4) & Y>=axisLimits(3);

X = X(pointSelect);

Y = Y(pointSelect);

end

if ( nargin < 4 )

useLogScale = false;

end

%Remove any nans or Infs in the data as they have no meaning in this

%context.

pointSelect = isinf(X) | isnan(X) | isinf(Y) | isnan(Y);

X = X(~pointSelect);

Y = Y(~pointSelect);

% Plot to get appropriate limits

figure('Color','W')

if ( nargin >= 3 && ~isempty(axisLimits) )

h = plot ( [axisLimits(1) axisLimits(2)], [axisLimits(3) axisLimits(4)] );

else

h = plot ( [min(X(:)) max(X(:))], [min(Y(:)) max(Y(:))] );

end

g = get( h, 'Parent' );

xLim = get(g, 'Xlim' );

yLim = get(g, 'Ylim' );

xTick = get(g, 'XTick' );

xTickLabel = get(g, 'XTickLabel' );

yTick = get(g, 'YTick' );

yTickLabel = get(g, 'YTickLabel' );

%Get the bin size.

unitType = get(g,'Unit');

set(g,'Unit','Pixels')

axesPos = get(g,'Position');

nHorizontalBins = axesPos(3);

nVerticalBins = axesPos(4);

set(g,'Unit', unitType );

% Allocate an area to draw on

bins = ceil([nHorizontalBins nVerticalBins ]);

binSize(2) = diff(yLim)./(bins(2));

binSize(1) = diff(xLim)./(bins(1));

canvas = zeros(bins);

% Draw in the canvas

xBinIndex = floor((X - xLim(1))/binSize(1))+1;

yBinIndex = floor((Y - yLim(1))/binSize(2))+1;

try

for i = 1:numel(xBinIndex);

canvas(xBinIndex(i),yBinIndex(i)) = ...

canvas(xBinIndex(i),yBinIndex(i)) + 1;

end

catch ME %#ok

disp ( [xBinIndex(i) yBinIndex(i)] )

disp ( bins );

disp ( '--' );

end

% Show the canvas and adjust the grids.

if ( useLogScale )

h = imagesc ( log10(canvas') );

else

h = imagesc ( canvas' );

end

axis ( 'xy' );

g = get( h, 'Parent' );

% Create scaling factors for overlay plots

xLim_sc = get(g,'xlim');

yLim_sc = get(g,'ylim');

xscale = (xLim_sc(2)-xLim_sc(1))/(xLim(2)-xLim(1));

yscale = (yLim_sc(2)-yLim_sc(1))/(yLim(2)-yLim(1));

% Adjust tickmarks on cloudplot

xTickAdjust = (xTick - min(xTick))*bins(1)/(max(xTick)-min(xTick))+0.5;

yTickAdjust = (yTick - min(yTick))*bins(2)/(max(yTick)-min(yTick))+0.5;

set ( g, 'XTick', xTickAdjust );

set ( g, 'XTickLabel', xTickLabel );

set ( g, 'YTick', yTickAdjust );

set ( g, 'YTickLabel', yTickLabel );

% Optionally return a handle and or scale factors

if ( nargout == 3 )

varargout{1} = h;

varargout{2} = xscale;

varargout{3} = yscale;

elseif ( nargout == 2)

varargout{1} = xscale;

varargout{2} = yscale;

elseif ( nargout == 1)

varargout{1} = h;

end

K SJohn D'ErricoNice. This could be useful for someone who just wishes to visualize how a big set of data is distributed.

Good help. Error checks. I like that the author even checks for the number of output arguments to decide if the handle is returned. All well done.

Daniel ArmyrI found a first bug in the script. The function does not handle NaNs or Infs in a nice way. For now, as a work-around, make sure you remova any NaNs or Infs before passing data to the function.