<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/253886</link>
    <title>MATLAB Central Newsreader - Clustering a curve of 3d points</title>
    <description>Feed for thread: Clustering a curve of 3d points</description>
    <language>en-us</language>
    <copyright>&amp;copy;1994-2012 by MathWorks, Inc.</copyright>
    <webmaster>webmaster@mathworks.com</webmaster>
    <generator>MATLAB Central Newsreader</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <ttl>60</ttl>
    <image>
      <title>MathWorks</title>
      <url>http://www.mathworks.com/images/membrane_icon.gif</url>
    </image>
    <item>
      <pubDate>Tue, 16 Jun 2009 20:57:02 -0400</pubDate>
      <title>Clustering a curve of 3d points</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/253886#657929</link>
      <author>Sven </author>
      <description>Hi all,&lt;br&gt;
&lt;br&gt;
I'm trying to group a set of points distributed in 3d space. The points come from a set of points that contain both true and false positives, and I'm trying to weed out the false positives.&lt;br&gt;
The true positives should essentially describe a long curve through my space.&lt;br&gt;
&lt;br&gt;
Can anyone help me describe the 'type' of problem I'm facing here, so I can search for the right keywords? Even better, some hints about how to work towards a solution would be great.&lt;br&gt;
&lt;br&gt;
I've included the basic data below. Just copy the (n by 4) matrix below into a variable called 'found_pts', and then run the simple code snippet below it. You'll see a plot of my points, and at about mid-level (in Z), you'll see the long curve of true positives I'm trying to extract (as automatically as possible).&lt;br&gt;
The first 3 columns simply give X, Y, and Z locations. The 4th column can be ignored for the moment.&lt;br&gt;
&lt;br&gt;
Thanks,&lt;br&gt;
Sven.&lt;br&gt;
&lt;br&gt;
found_pts = []; % Copy from below&lt;br&gt;
&lt;br&gt;
290       143      -700     0.565&lt;br&gt;
290       142      -695     0.499&lt;br&gt;
169       180      -690     0.429&lt;br&gt;
255       174      -685     0.553&lt;br&gt;
261       176      -680     0.538&lt;br&gt;
216       150      -675     0.587&lt;br&gt;
239       149      -670     0.582&lt;br&gt;
289       149      -665     0.569&lt;br&gt;
169       151      -660     0.576&lt;br&gt;
180       183      -655     0.547&lt;br&gt;
178       183      -650     0.618&lt;br&gt;
156       146      -645      0.58&lt;br&gt;
154       146      -640     0.638&lt;br&gt;
151       146      -635     0.625&lt;br&gt;
147       146      -630     0.661&lt;br&gt;
144       144      -625     0.642&lt;br&gt;
262       130      -620     0.678&lt;br&gt;
263       129      -615     0.774&lt;br&gt;
264       128      -610     0.647&lt;br&gt;
264       127      -605     0.573&lt;br&gt;
266       125      -600     0.658&lt;br&gt;
267       125      -595     0.736&lt;br&gt;
268       123      -590     0.763&lt;br&gt;
268       122      -585     0.752&lt;br&gt;
268       122      -580     0.729&lt;br&gt;
269       120      -575     0.699&lt;br&gt;
269       120      -570     0.708&lt;br&gt;
270       121      -565     0.688&lt;br&gt;
271       122      -560     0.635&lt;br&gt;
201       117      -555     0.646&lt;br&gt;
201       117      -550     0.654&lt;br&gt;
201       116      -545     0.656&lt;br&gt;
201       115      -540      0.66&lt;br&gt;
200       114      -535     0.664&lt;br&gt;
199       115      -530     0.639&lt;br&gt;
198       115      -525     0.659&lt;br&gt;
198       116      -520     0.637&lt;br&gt;
226       173      -515     0.605&lt;br&gt;
206       114      -510     0.592&lt;br&gt;
203       112      -505     0.663&lt;br&gt;
203       109      -500     0.657&lt;br&gt;
248       162      -495     0.706&lt;br&gt;
274       164      -490     0.709&lt;br&gt;
274       165      -485     0.716&lt;br&gt;
274       166      -480      0.73&lt;br&gt;
198       108      -475     0.654&lt;br&gt;
198       107      -470      0.67&lt;br&gt;
198       106      -465     0.659&lt;br&gt;
199       104      -460     0.665&lt;br&gt;
200       103      -455     0.662&lt;br&gt;
201       102      -450     0.626&lt;br&gt;
200       104      -445     0.619&lt;br&gt;
255       188      -440     0.595&lt;br&gt;
244       123      -435     0.565&lt;br&gt;
200       102      -430     0.563&lt;br&gt;
195       166      -425     0.572&lt;br&gt;
195       167      -420     0.618&lt;br&gt;
198      97.2      -415     0.619&lt;br&gt;
122       142      -410     0.655&lt;br&gt;
264       177      -405     0.649&lt;br&gt;
245       115      -400     0.664&lt;br&gt;
248       115      -395     0.688&lt;br&gt;
195      95.8      -390     0.666&lt;br&gt;
195      94.3      -385     0.668&lt;br&gt;
195      92.1      -380     0.639&lt;br&gt;
193      91.4      -375     0.673&lt;br&gt;
145       123      -370     0.677&lt;br&gt;
192      87.8      -365     0.641&lt;br&gt;
194      87.1      -360     0.652&lt;br&gt;
193      87.8      -355     0.633&lt;br&gt;
195      86.4      -350     0.642&lt;br&gt;
195      84.9      -345      0.63&lt;br&gt;
195      83.5      -340     0.673&lt;br&gt;
194        82      -335     0.656&lt;br&gt;
193      81.3      -330     0.665&lt;br&gt;
193      81.3      -325     0.682&lt;br&gt;
193      81.3      -320     0.734&lt;br&gt;
193        82      -315      0.71&lt;br&gt;
231       156      -310     0.512&lt;br&gt;
241      89.2      -305     0.505&lt;br&gt;
237      89.2      -300     0.604&lt;br&gt;
233      88.5      -295     0.615&lt;br&gt;
230      89.2      -290     0.617&lt;br&gt;
227      89.2      -285     0.573&lt;br&gt;
224      88.5      -280     0.534&lt;br&gt;
216        90      -275     0.467&lt;br&gt;
265      86.4      -270     0.555&lt;br&gt;
190      92.9      -265     0.601&lt;br&gt;
155      93.6      -260     0.565&lt;br&gt;
262      87.1      -255     0.577&lt;br&gt;
262      87.1      -250     0.607&lt;br&gt;
263      87.1      -245     0.553&lt;br&gt;
147       165      -240     0.522&lt;br&gt;
156        95      -235     0.557&lt;br&gt;
163      96.5      -230     0.592&lt;br&gt;
214      92.9      -228     0.559&lt;br&gt;
213      93.6      -226      0.54&lt;br&gt;
213      93.6      -225     0.516&lt;br&gt;
247      91.4      -224     0.546&lt;br&gt;
245      91.4      -223     0.528&lt;br&gt;
237      92.1      -221     0.556&lt;br&gt;
&lt;br&gt;
numcols = 15;&lt;br&gt;
cols = jet(numcols);&lt;br&gt;
hist_edges = linspace(min(found_pts(:,4)), max(found_pts(:,4)), numcols);&lt;br&gt;
[N, n_bin] = histc(found_pts(:,4), hist_edges);&lt;br&gt;
bin_size = mean(diff(X))/2;&lt;br&gt;
figure, hold on&lt;br&gt;
for i=1:numcols-1&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;idxs = n_bin==i;&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;plot3(found_pts(idxs,1), found_pts(idxs,2), found_pts(idxs,3),'.','Color',cols(i,:))&lt;br&gt;
end&lt;br&gt;
axis tight, view(3)&lt;br&gt;
colorbar, caxis(hist_edges([1 end]))</description>
    </item>
    <item>
      <pubDate>Wed, 17 Jun 2009 14:32:53 -0400</pubDate>
      <title>Re: Clustering a curve of 3d points</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/253886#658147</link>
      <author>Tom Lane</author>
      <description>&amp;gt; I'm trying to group a set of points distributed in 3d space. The points &lt;br&gt;
&amp;gt; come from a set of points that contain both true and false positives, and &lt;br&gt;
&amp;gt; I'm trying to weed out the false positives.&lt;br&gt;
&amp;gt; The true positives should essentially describe a long curve through my &lt;br&gt;
&amp;gt; space.&lt;br&gt;
&lt;br&gt;
Sven, there may well be a better way, but I do have one idea. Cluster &lt;br&gt;
analysis seeks to group points into clusters based on the distance between &lt;br&gt;
them. Single-linkage clustering seeks to avoid gaps at the expense of &lt;br&gt;
perhaps having a long cluster. So it might be able to find curves like the &lt;br&gt;
one you describe.&lt;br&gt;
&lt;br&gt;
Try this on your data.&lt;br&gt;
&lt;br&gt;
% Try single-linkage clustering in attempt to keep nearby points together&lt;br&gt;
figure(1)&lt;br&gt;
Y = pdist(found_pts(:,1:3));&lt;br&gt;
Z = linkage(Y,'single');&lt;br&gt;
dendrogram(Z,101,'colorthreshold',30);&lt;br&gt;
t = cluster(Z,'cutoff',30,'criterion','distance')&lt;br&gt;
&lt;br&gt;
% Plot points color-coded by cluster assignment&lt;br&gt;
figure(2), hold on&lt;br&gt;
cols = jet(max(t));&lt;br&gt;
for i=1:max(t)&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;idxs = t==i;&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;plot3(found_pts(idxs,1), found_pts(idxs,2), &lt;br&gt;
found_pts(idxs,3),'.','Color',cols(i,:))&lt;br&gt;
end&lt;br&gt;
&lt;br&gt;
-- Tom </description>
    </item>
    <item>
      <pubDate>Wed, 17 Jun 2009 15:48:01 -0400</pubDate>
      <title>Re: Clustering a curve of 3d points</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/253886#658188</link>
      <author>Sven </author>
      <description>&amp;gt; &amp;gt; I'm trying to group a set of points distributed in 3d space. The points &lt;br&gt;
&amp;gt; &amp;gt; come from a set of points that contain both true and false positives, and &lt;br&gt;
&amp;gt; &amp;gt; I'm trying to weed out the false positives.&lt;br&gt;
&amp;gt; &amp;gt; The true positives should essentially describe a long curve through my &lt;br&gt;
&amp;gt; &amp;gt; space.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Sven, there may well be a better way, but I do have one idea. Cluster &lt;br&gt;
&amp;gt; analysis seeks to group points into clusters based on the distance between &lt;br&gt;
&amp;gt; them. Single-linkage clustering seeks to avoid gaps at the expense of &lt;br&gt;
&amp;gt; perhaps having a long cluster. So it might be able to find curves like the &lt;br&gt;
&amp;gt; one you describe.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Try this on your data.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; % Try single-linkage clustering in attempt to keep nearby points together&lt;br&gt;
&amp;gt; figure(1)&lt;br&gt;
&amp;gt; Y = pdist(found_pts(:,1:3));&lt;br&gt;
&amp;gt; Z = linkage(Y,'single');&lt;br&gt;
&amp;gt; dendrogram(Z,101,'colorthreshold',30);&lt;br&gt;
&amp;gt; t = cluster(Z,'cutoff',30,'criterion','distance')&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; % Plot points color-coded by cluster assignment&lt;br&gt;
&amp;gt; figure(2), hold on&lt;br&gt;
&amp;gt; cols = jet(max(t));&lt;br&gt;
&amp;gt; for i=1:max(t)&lt;br&gt;
&amp;gt;     idxs = t==i;&lt;br&gt;
&amp;gt;     plot3(found_pts(idxs,1), found_pts(idxs,2), &lt;br&gt;
&amp;gt; found_pts(idxs,3),'.','Color',cols(i,:))&lt;br&gt;
&amp;gt; end&lt;br&gt;
&lt;br&gt;
Thanks for your help there Tom. Unfortunately I've only got the image processing toolbox at my disposal right now.&lt;br&gt;
It looks like the stats toolbox could be very handy though.&lt;br&gt;
Any ideas that don't rely on the stats toolbox?&lt;br&gt;
&lt;br&gt;
For example, since my points are distributed along Z (with no points having the same Z location), the following code effectively weeds out any single outliers (ie, points which differ significantly in XY location from the previous point). Any further ideas?&lt;br&gt;
&lt;br&gt;
idxs = abs(prod(diff(found_pts(:,1:2)),2)) &amp;lt; 20^2;&lt;br&gt;
found_pts = found_pts(idxs,:);&lt;br&gt;
&lt;br&gt;
Cheers,&lt;br&gt;
Sven.</description>
    </item>
  </channel>
</rss>

