K Means Clustering Question
Hi,
I have been trying to run k-means clustering in MATLAB after setting a seed with rng. A few times it goes through without issue, but sometimes, when I run kmeans with the same rng seed, I get the warning "Warning: Failed to converge in 100 iterations."
I am wondering why I get this message only sporadically. Since I set the rng, I would expect it to keep working if it worked in the past.
Thanks,
Answers (1)
Adithya Addanki
on 1 Dec 2015
Hi Munaf,
Please confirm which release of MATLAB you are using, especially if you are comparing results between different releases. The release notes, which list the changes made to "kmeans" and related functions, are here: http://www.mathworks.com/help/stats/release-notes.html
It is possible that the algorithm is not converging within the default number of iterations (100). You can raise this limit with the "MaxIter" parameter of the "kmeans" function.
For instance:
[idx,C,sumd,D] = kmeans(X,20,'MaxIter',10000)
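A minimal sketch of checking whether the iteration limit is the cause, assuming the Statistics and Machine Learning Toolbox and the fisheriris sample data used in the example below. Both runs reset the seed to 1, so they start from the same initial centroids and differ only in "MaxIter". The use of lastwarn to detect the convergence warning is an illustrative addition, not something "kmeans" requires.

```matlab
% Load sample data (petal length and width from Fisher's iris data)
load fisheriris
X = meas(:,3:4);

% Clear the last warning so a new one from kmeans can be detected
lastwarn('');

% Default iteration limit (100); reset the seed so the start is reproducible
rng(1);
[idxDefault,CDefault] = kmeans(X,3);
msgDefault = lastwarn;

% Same seed (same initial centroids), much larger iteration limit
lastwarn('');
rng(1);
[idxLong,CLong] = kmeans(X,3,'MaxIter',10000);
msgLong = lastwarn;

if isempty(msgDefault)
    disp('Converged within the default 100 iterations.');
else
    disp(['Default run warned: ' msgDefault]);
end
```

If the default run warns but the longer run does not, the iteration limit was the problem rather than the seed.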
Setting a seed with "rng" makes the random number generator produce a predictable sequence of numbers. Consider a simple example (adapted from the first example in the "kmeans" documentation):
%load sample data
load fisheriris
X = meas(:,3:4);
figure;
plot(X(:,1),X(:,2),'k*','MarkerSize',5);
title 'Fisher''s Iris Data';
xlabel 'Petal Lengths (cm)';
ylabel 'Petal Widths (cm)';
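% Seed the generator with 1 before the first call
rng(1);
[idx,C] = kmeans(X,3);      % 1st call after rng(1)
rng(1);
[idx2,C2] = kmeans(X,3);    % 1st call after rng(1)
[idx3,C3] = kmeans(X,3);    % 2nd call after rng(1): stream has advanced
rng(1);
[idx4,C4] = kmeans(X,3);    % 1st call after rng(1)
[idx5,C5] = kmeans(X,3);    % 2nd call after rng(1)
[idx6,C6] = kmeans(X,3);    % 3rd call after rng(1)
[idx7,C7] = kmeans(X,3);    % 4th call after rng(1)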
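The centroids C, C2 and C4 returned by the commands above are identical, because each of them comes from the first "kmeans" call made right after rng(1) reset the seed (Case 1). C3, C5, C6 and C7, on the other hand, come from calls made after the generator has already advanced, so their random initial centroids are not guaranteed to match those behind C (Case 2). C3 and C5 do match each other, because each is the second call after rng(1); what matters is the position in the random stream, not just the seed.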
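A short sketch that checks these claims with isequal, assuming the same fisheriris data. It also shows a second option: capturing the generator state with rng (no arguments) and restoring it with rng(s) right before a later call, so that call is reproducible without re-seeding from scratch.

```matlab
% Load the same sample data as above
load fisheriris
X = meas(:,3:4);

% First and second calls after seeding
rng(1);
[~,C]  = kmeans(X,3);
[~,C3] = kmeans(X,3);

% Re-seed and repeat the two calls
rng(1);
[~,C4] = kmeans(X,3);
[~,C5] = kmeans(X,3);

% The same position in the random stream gives the same result
fprintf('C == C4: %d\n', isequal(C, C4));
fprintf('C3 == C5: %d\n', isequal(C3, C5));

% Alternative: save the generator state, then restore it before a rerun
s = rng;                 % capture the current generator settings and state
[~,Ca] = kmeans(X,3);
rng(s);                  % restore them
[~,Cb] = kmeans(X,3);
fprintf('Ca == Cb: %d\n', isequal(Ca, Cb));
```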
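In Case 2, each call starts from different initial centroids, so the number of iterations needed to converge can change from one call to the next. Some starting points may need more than the default 100 iterations, which produces the warning you are seeing. How many iterations are needed depends on several factors, including the size of the data, the number of clusters and the algorithm settings used.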
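If the goal is results that do not depend on the generator state at all, one option (a suggestion beyond the original answer) is to pass explicit initial centroids with the 'Start' name-value argument, which accepts a k-by-p matrix. The rows chosen below (1, 51 and 101, one from each iris species) are an arbitrary, hypothetical choice, and 'OnlinePhase','off' is set explicitly so that only the batch phase runs. Another option is 'Replicates', which runs several random starts and keeps the one with the lowest total sum of distances; reset the seed first if that result must also be reproducible.

```matlab
% Load the same sample data as above
load fisheriris
X = meas(:,3:4);

% Explicit initial centroids: one row per cluster (k-by-p).
% Rows 1, 51 and 101 are an illustrative choice only.
startC = X([1 51 101],:);

% Two runs under different generator states
rng(1);
[idxA,CA] = kmeans(X,3,'Start',startC,'OnlinePhase','off');
rng(2);
[idxB,CB] = kmeans(X,3,'Start',startC,'OnlinePhase','off');

% With fixed starting centroids, the seed no longer affects the result
fprintf('Identical results: %d\n', isequal(idxA, idxB) && isequal(CA, CB));

% Alternative: several random starts, keeping the best one
rng(1);
[idxR,CR,sumdR] = kmeans(X,3,'Replicates',5,'MaxIter',1000);
fprintf('Total within-cluster sum of distances: %.4f\n', sum(sumdR));
```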
I hope this answers your question.
Thanks,
Adithya