Asked by John
on 16 Nov 2011

Hi there,

I have a question about k-means clustering. Say I created two clusters from some data, for example using this code:

X = [randn(100,2)+ones(100,2);...
     randn(100,2)-ones(100,2)];
opts = statset('Display','final');

[idx,ctrs] = kmeans(X,2,...
                    'Distance','city',...
                    'Replicates',5,...
                    'Options',opts);

plot(X(idx==1,1),X(idx==1,2),'r.','MarkerSize',12)
hold on
plot(X(idx==2,1),X(idx==2,2),'b.','MarkerSize',12)
plot(ctrs(:,1),ctrs(:,2),'kx',...
     'MarkerSize',12,'LineWidth',2)
plot(ctrs(:,1),ctrs(:,2),'ko',...
     'MarkerSize',12,'LineWidth',2)
legend('Cluster 1','Cluster 2','Centroids',...
       'Location','NW')

My question is: if you collect more data, can you assign it to the two clusters that have already been formed, or do you have to cluster all of the data again?

If it is possible, how would you do it?

Thank you

Answer by Wayne King
on 16 Nov 2011

k-means is an unsupervised learning algorithm that is sensitive to the number of clusters you choose AND to the initial cluster centers. I would say that you need to cluster the data again.
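
As a rough sketch of what re-clustering could look like, assuming the newly collected points sit in a matrix with the same two columns as X: the name Xnew and its random contents are made up for illustration, and 'cityblock' is the full name of the distance option abbreviated as 'city' in the question.

```matlab
% Original data and clustering, as in the question
X = [randn(100,2)+ones(100,2);...
     randn(100,2)-ones(100,2)];
[idx,ctrs] = kmeans(X,2,...
                    'Distance','cityblock',...
                    'Replicates',5);

% Xnew is hypothetical: stand-in for the data collected later
Xnew = [randn(20,2)+ones(20,2);...
        randn(20,2)-ones(20,2)];

% Stack old and new observations and cluster everything again
Xall = [X; Xnew];
[idxAll,ctrsAll] = kmeans(Xall,2,...
                          'Distance','cityblock',...
                          'Replicates',5);

% Cluster labels of the new points are the trailing rows of idxAll
idxNew = idxAll(size(X,1)+1:end);
```

Keep in mind that the cluster numbers are arbitrary, so "cluster 1" in idxAll is not guaranteed to be the same group as "cluster 1" in idx. Compare ctrsAll with ctrs if you need to match them up.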

