Path: news.mathworks.com!not-for-mail
From: "Sven" <sven.holcombe@gmail.deleteme.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Clustering a curve of 3d points
Date: Wed, 17 Jun 2009 15:48:01 +0000 (UTC)
Organization: University of Michigan
Lines: 41
Message-ID: <h1b37h$9r3$1@fred.mathworks.com>
References: <h190uu$i3l$1@fred.mathworks.com> <h1auql$f0a$1@fred.mathworks.com>
Reply-To: "Sven" <sven.holcombe@gmail.deleteme.com>


> > I'm trying to group a set of points distributed in 3d space. The points 
> > come from a set of points that contain both true and false positives, and 
> > I'm trying to weed out the false positives.
> > The true positives should essentially describe a long curve through my 
> > space.
> 
> Sven, there may well be a better way, but I do have one idea. Cluster 
> analysis seeks to group points into clusters based on the distance between 
> them. Single-linkage clustering seeks to avoid gaps at the expense of 
> perhaps having a long cluster. So it might be able to find curves like the 
> one you describe.
> 
> Try this on your data.
> 
> % Try single-linkage clustering in attempt to keep nearby points together
> figure(1)
> Y = pdist(found_pts(:,1:3));
> Z = linkage(Y,'single');
> dendrogram(Z,101,'colorthreshold',30);
> t = cluster(Z,'cutoff',30,'criterion','distance')
> 
> % Plot points color-coded by cluster assignment
> figure(2), hold on
> cols = jet(max(t));
> for i=1:max(t)
>     idxs = t==i;
>     plot3(found_pts(idxs,1), found_pts(idxs,2), found_pts(idxs,3), '.', 'Color', cols(i,:))
> end

Thanks for your help there, Tom. Unfortunately I've only got the Image Processing Toolbox at my disposal right now.
It looks like the Statistics Toolbox could be very handy though.
Any ideas that don't rely on the Statistics Toolbox?
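
The nearest thing I can think of myself: as far as I understand it, cutting a single-linkage tree at a fixed distance gives the same groups as joining every pair of points that are closer than that distance and then taking the connected pieces. Below is a rough base-MATLAB sketch of that idea. The pts and cutoff values are made up for illustration (the real call would use found_pts(:,1:3) and a cutoff such as 30), and I haven't checked it against the output of cluster, whose group numbering may well differ.

```matlab
% Made-up example data: two short chains of points and one isolated point
pts = [0 0 0; 1 0 0; 2 0 0; 50 50 50; 51 50 50; 200 0 0];
cutoff = 5;                      % points closer than this get chained together

n = size(pts,1);
labels = zeros(n,1);             % 0 means "not yet assigned to a group"
nclust = 0;
for seed = 1:n
    if labels(seed) ~= 0, continue, end
    nclust = nclust + 1;         % start a new group from this unassigned point
    labels(seed) = nclust;
    queue = zeros(n,1);          % each point is queued at most once
    queue(1) = seed;
    head = 1;
    tail = 1;
    while head <= tail
        p = queue(head);
        head = head + 1;
        % Squared distance from point p to every point
        d2 = sum((pts - repmat(pts(p,:), n, 1)).^2, 2);
        nbrs = find(d2 < cutoff^2 & labels == 0);
        if ~isempty(nbrs)
            labels(nbrs) = nclust;
            queue(tail+1:tail+numel(nbrs)) = nbrs;
            tail = tail + numel(nbrs);
        end
    end
end

% Keep only the largest group, on the guess that it is the true curve
counts = accumarray(labels, 1);
[dummy, big] = max(counts);
curve_pts = pts(labels == big, :);
```

Whether the largest group really is the curve depends on the false positives being scattered rather than clumped, so that last step is only a guess on my part.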

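So far, since my points are distributed along Z (with no two points sharing the same Z location, and the rows of found_pts ordered by Z), the following code weeds out single outliers (i.e., points whose XY location is more than 20 units from both the previous and the next point). Any further ideas?

near = sum(diff(found_pts(:,1:2)).^2, 2) < 20^2;   % is each consecutive XY step shorter than 20?
idxs = [false; near] | [near; false];              % keep points close to at least one neighbour
found_pts = found_pts(idxs,:);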
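
To sanity-check this kind of filter, here is a minimal sketch on made-up data: a smooth curve with one point per Z value and two injected outliers. It measures the squared XY step between consecutive points and drops any point that is far from both of its neighbours. The test curve, the outlier positions and the variable names (pts_test, near, keep, filtered) are all invented for the example.

```matlab
% Hypothetical test data: one point per Z value along a smooth curve
z = (1:50)';
pts_test = [10*cos(z/10), 10*sin(z/10), z];
pts_test([15 30],1:2) = pts_test([15 30],1:2) + 100;   % inject two false positives

% Squared XY step between consecutive points (N-1 values for N points)
d2 = sum(diff(pts_test(:,1:2)).^2, 2);
near = d2 < 20^2;

% A point is kept if it is close to its previous or its next neighbour
keep = [false; near] | [near; false];
filtered = pts_test(keep,:);

removed_rows = find(~keep)';     % rows 15 and 30 for this test data
```

Testing against both neighbours matters here: a check against the previous point alone would also throw away the good point that comes right after each outlier.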

Cheers,
Sven.