SCATTERCLOUD

Version 1.2.0.1 (2.38 KB)

Scatterplot over a density cloud.

SCATTERCLOUD creates a scatterplot from X and Y data sets, and overlays it on top of a density plot of the same data.
From the help:

SCATTERCLOUD display density of scatter data

SCATTERCLOUD(X,Y) creates a scatterplot of X and Y, displayed over a surface representing the smoothed density of the points. The density is determined with a 2D histogram, using 25 equally spaced bins in both directions.

SCATTERCLOUD(X,Y,N) uses N equally spaced bins.

SCATTERCLOUD(X,Y,N,L) uses L as a parameter to the smoothing algorithm. Defaults to 1. Larger values of L lead to a smoother density, but a worse fit to the original data.

SCATTERCLOUD(X,Y,N,L,CLM) uses CLM as the color/linestyle/marker for the scatter plot. Defaults to 'k+'.

SCATTERCLOUD(X,Y,N,L,CLM,CMAP) uses CMAP as the figure's colormap. The default is 'flipud(gray(256))'.

H = SCATTERCLOUD(...) returns the handles for the surface and line objects created.
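
A minimal usage sketch based on the help text above, assuming scattercloud.m has been downloaded and is on the MATLAB path. The sample data, the bin count of 50, the 'k.' marker and the jet colormap are illustrative choices, not taken from the submission.

```matlab
% Illustrative data (an assumption): two overlapping Gaussian clusters.
x = [randn(500,1); 2 + 0.5*randn(500,1)];
y = [randn(500,1); 1 + 0.5*randn(500,1)];

% Guard so the snippet still runs when scattercloud.m is not installed.
if exist('scattercloud','file') == 2
    figure;
    % 50 bins, smoothing parameter 1, black dots, cold-to-hot colormap.
    h = scattercloud(x, y, 50, 1, 'k.', jet(256));
end
```

Passing a colormap matrix as the sixth argument is one way to get a cold-to-hot shading in place of the default grayscale.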

References:
Eilers, Paul H. C. & Goeman, Jelle J. (2004). Enhancing scatterplots with smoothed densities. Bioinformatics 20(5), 623-628.

Comments and Ratings (22)

David

Not usable... when I attempt to unzip I get the message "Archive is not readable. Would you like to try a password?"

Luis Oviedo

Ire

This works great! My only problem is that when I change the axis to a logarithmic scale, it correctly readjusts the scatter plot but the image/cloud plot moves to the upper right corner of the figure instead of staying directly beneath the respective points. Any ideas why and how to fix this?

Yung-Yeh

Very nice, clean code for displaying a scatterplot over a 2D colour map of the empirical distribution. The counting step is efficient only when you have a large dataset and a small number of bins. Once the number of bins exceeds a certain value, it becomes slow even for a small dataset. I can't think of anything better than to use an alternative approach when the dataset is small relative to the number of bins, and to fall back to the original method otherwise.

Following is my approach, similar to BLu's but with a better decision rule for switching between the two methods. BTW, the centering bug is fixed in this version based on Thomas's post, so length(C) == n.

% do counts
if numel(x) < n^2
    % New method: one pass over the points
    binIntX = diff(limitX)/(n-1);
    binIntY = diff(limitY)/(n-1);
    for idx = 1:numel(x)
        idxX = min([round(x(idx)/binIntX)+1 numX]);
        idxY = min([round(y(idx)/binIntY)+1 numY]);
        C(idxY,idxX) = C(idxY,idxX) + 1;
    end
else
    % Old method: one pass over the bins
    for i = 1:numY-1
        for j = 1:numX-1
            C(i,j) = length(find(x >= xEdges(j) & x < xEdges(j+1) &...
                y >= yEdges(i) & y < yEdges(i+1)));
        end
    end
end
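
The per-point loop above can also be written without an explicit loop. The following is a sketch using accumarray, not the commenter's or the original author's code. It assumes n bin centres equally spaced between the data minimum and maximum, and the names binX, binY, idxX and idxY are illustrative.

```matlab
% Illustrative data (an assumption).
x = [0 0.1 0.5 0.9 1.0];
y = [0 0.1 0.5 0.9 1.0];
n = 3;                                  % number of bins per direction

% Width of one bin when n bin centres span [min, max].
binX = (max(x) - min(x)) / (n - 1);
binY = (max(y) - min(y)) / (n - 1);

% Nearest bin centre for every point, as 1-based indices in 1..n.
idxX = round((x(:) - min(x)) / binX) + 1;
idxY = round((y(:) - min(y)) / binY) + 1;

% Accumulate one count per point; rows follow y, columns follow x.
C = accumarray([idxY idxX], 1, [n n]);
```

The cost of this form grows with the number of points rather than with the number of bins, which is the same trade-off the switch on numel(x) < n^2 is aiming at.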

Nic

Julien

Very useful function.

Thomas

Centering bug:
I have symmetric data, but the cloud is not symmetric! Matt's solution, which only shifts the cloud, does not solve the problem. Debugging the code, I found that C is padded asymmetrically with zeros and is therefore not smoothed symmetrically!

Solution:

C = zeros(numY-1,numX-1);

[...]

s = surf(linspace(minX,maxX,n),...
    linspace(minY,maxY,n),...
    zeros(numY-1,numX-1),C,...
    'EdgeColor','none',...
    'FaceColor','interp');

J G

This would be really useful for scatter3 too, any suggestions how I could do that?

J G

Very cool function. Can it be done on a 3D scatter plot?

BLu

It works well for small datasets and small n (such as 50).
I added a 'none' option and fixed the centering problem, both as suggested in other comments.
I tried a large dataset with n=200 and had to terminate the calculation because it was taking too long. The code used to calculate the counts is not the most efficient; cloudPlot has a much more efficient algorithm. I made the following modifications based on cloudPlot.

error(nargchk(2,6,nargin),'struct');

x = x(:);
y = y(:);

if length(x) ~= length(y)
    error('SCATTERCLOUD:DataVectorSizesDoNotMatch','The number of elements in x and y do not match')
end

%new code
% drop points that have a non-finite coordinate
pointSelect = isinf(x) | isnan(x) | isinf(y) | isnan(y);
x = x(~pointSelect);
y = y(~pointSelect);
%new code ends

if nargin < 6
    %cmap = flipud(gray(256));
    cmap='default';
end

if nargin < 5
    %clm = 'k+';
    clm='none';
end

if nargin < 4
    l = 1;
end

if nargin < 3
    n = 25;
end

% min/max of x and y
minX = min(x);
maxX = max(x);
minY = min(y);
maxY = max(y);

% edge locations
xEdges = linspace(minX,maxX,n);
yEdges = linspace(minY,maxY,n);

% shift edges
xDiff = xEdges(2) - xEdges(1);
yDiff = yEdges(2) - yEdges(1);
xEdges = [-Inf, xEdges(2:end) - xDiff/2, Inf];
yEdges = [-Inf, yEdges(2:end) - yDiff/2, Inf];

% number of edges
numX = numel(xEdges);
numY = numel(yEdges);

%new code
xBinIndex = floor((x - minX+xDiff/2)/xDiff)+1;
yBinIndex = floor((y - minY+yDiff/2)/yDiff)+1;
%new code ends

% hold counts
C = zeros(numY,numX);

% do counts
%new code
for i=1:numel(x)
    C(yBinIndex(i),xBinIndex(i))=C(yBinIndex(i),xBinIndex(i))+1;
end
%new code ends

%for i = 1:numY-1
% for j = 1:numX-1
% C(i,j) = sum(x >= xEdges(j) & x < xEdges(j+1) &...
% y >= yEdges(i) & y < yEdges(i+1));
% end
% disp(num2str(i));
%end

% get rid of Infs from the edges
xEdges = [xEdges(2) - xDiff,xEdges(2:end-1), xEdges(end-1) + xDiff] + xDiff/2;
yEdges = [yEdges(2) - yDiff,yEdges(2:end-1), yEdges(end-1) + yDiff] + yDiff/2;

%xEdges = [xEdges(2) - xDiff,xEdges(2:end-1), xEdges(end-1) + xDiff];
%yEdges = [yEdges(2) - yDiff,yEdges(2:end-1), yEdges(end-1) + yDiff];

% smooth the density data, in both directions.
C = localSmooth(localSmooth(C,l)',l)';

% create the graphics
ax = newplot;
s = surf(xEdges,yEdges,zeros(numY,numX),C,...
'EdgeColor','none',...
'FaceColor','interp');
view(ax,2);
colormap(ax,cmap);
grid(ax,'off');
holdstate = get(ax,'NextPlot');
set(ax,'NextPlot','add');

p = [];   % stays empty when no scatter overlay is drawn, so h = [s;p] still works
if ~strcmpi(clm,'none')
    p = plot(x,y,clm);
end

axis(ax,'tight');
set(ax,'NextPlot',holdstate)

% outputs
if nargout
h = [s;p];
end

function B = localSmooth(A,L)
    r = size(A,1);
    I = eye(r);
    D1 = diff(I);
    D2 = diff(I,2);
    B = (I + L ^ 2 * D2' * D2 + 2 * L * D1' * D1) \ A;
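
To illustrate what the smoothing parameter L does, here is a small sketch that applies the same linear system as localSmooth to a single spike. The input vector, the helper name smoothWith and the two values of L are illustrative assumptions. Because the rows of D1 and D2 each sum to zero, the system matrix has unit column sums, so the total count should be preserved up to rounding while the peak is spread out.

```matlab
A = zeros(21,1);
A(11) = 1;                 % single spike (illustrative input)

I  = eye(numel(A));
D1 = diff(I);              % first-difference operator
D2 = diff(I,2);            % second-difference operator

% Same system as localSmooth, solved for two smoothing strengths.
smoothWith = @(L) (I + L^2 * (D2' * D2) + 2 * L * (D1' * D1)) \ A;
B1  = smoothWith(1);
B10 = smoothWith(10);

peaks  = [max(A) max(B1) max(B10)];   % expected to decrease left to right
totals = [sum(A) sum(B1) sum(B10)];   % expected to stay at 1 up to rounding
```

This matches the help text's description: a larger L gives a smoother density at the cost of a worse fit to the raw counts.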

J G

works great thanks!

Matt Caywood

To fix the glaring centering bug, change these lines from:

xEdges = [xEdges(2) - xDiff,xEdges(2:end-1), xEdges(end-1) + xDiff];
yEdges = [yEdges(2) - yDiff,yEdges(2:end-1), yEdges(end-1) + yDiff];

TO:

xEdges = [xEdges(2) - xDiff,xEdges(2:end-1), xEdges(end-1) + xDiff] + xDiff/2;
yEdges = [yEdges(2) - yDiff,yEdges(2:end-1), yEdges(end-1) + yDiff] + yDiff/2;

In the future, if there's a "simple" fix, why not post it and be helpful?

Hi, Ruello.
I haven't looked at this recently, but I believe that the point ended up being plotted at the 'bottom left' of each rectangular region over which the data points were summed.
—DIV

David,
could you please elaborate on "Output is biased toward the origin" and describe the fix you made?

PS: I also fixed the clm='none' issue... but I am quite puzzled, because I was sure it used to work... until yesterday!
This seems to be an issue in the built-in function plot:
>> plot(1:10,1:10,'none')
??? Error using ==> plot
Error in color/linetype argument
!!! 'none' is listed as a valid LineStyle property, or am I wrong?!?!?
In fact:
>> p=plot(1:10,1:10);
>> set(p,'LineStyle','none')
is OK... 0_0
Sorry if this is the wrong place to post...
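
A short sketch of the distinction described above, for what it is worth: the short color/linestyle/marker string accepted by plot takes tokens such as '-', ':', 'k' or '+', whereas 'none' is a value of the LineStyle property and has to be passed as a property/value pair. The data and the marker choice here are illustrative.

```matlab
fig = figure('Visible','off');          % off-screen figure for the demo

% Markers only: 'none' goes in as a property value, not in the style string.
p = plot(1:10, 1:10, 'LineStyle', 'none', 'Marker', '+', 'Color', 'k');

lineStyle = get(p, 'LineStyle');        % read the property back
close(fig);
```

A marker-only string such as 'k+' has the same visual effect, since no connecting line is drawn when the string names a marker but no line style.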

Ruello

Output is biased toward the origin. A fix is fairly simple. It is also useful to be able to accept CLM = 'none'. I have made those two changes in my copy of the code.

Pearl

Mark Tut

Thank you

Gotzon Basterretxea

Kay Hamacher

Exactly what I needed. Thanks!

Erik Benkler

very useful m-file.

paul naude

Just what I was looking for! Now, how can one change the shade to a cold-hot colour?

Updates

1.2.0.1

Updated license

1.2

Add reference to journal article.

MATLAB Release
MATLAB 6.5 (R13)
