This is a tool for K-means clustering. After trying several different implementations, I concluded that using simple loops for the distance calculation and comparison is the most efficient and accurate approach because of JIT acceleration in MATLAB.
The code is very simple and well documented, so it is suitable for beginners learning the k-means clustering algorithm.
Numerical comparisons show that this tool can be several times faster than kmeans in the Statistics Toolbox.
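To illustrate what "simple loops for the distance calculation and comparison" can mean, here is a minimal sketch of a loop-based assignment step. It is not taken from the submission itself: the variable names (X, c, label) and the layout (one point per row of X, one centroid per row of c) are assumptions, and the toy values are made up.

```matlab
% Hypothetical toy data: X is n-by-d (one point per row),
% c is k-by-d (one centroid per row).
X = [0 0; 0 1; 5 5; 6 5];
c = [0 0.5; 5.5 5];
[n, d] = size(X);
k = size(c, 1);

label = zeros(n, 1);
for i = 1:n
    best = inf;
    for j = 1:k
        % Squared Euclidean distance between point i and centroid j
        dist = 0;
        for m = 1:d
            delta = X(i, m) - c(j, m);
            dist = dist + delta * delta;
        end
        % Keep the closest centroid seen so far
        if dist < best
            best = dist;
            label(i) = j;
        end
    end
end
```

After the loops, label(i) holds the index of the centroid nearest to point i. Whether this beats a vectorized version depends on the MATLAB release and the data size, so it is worth timing both.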
Does the code support 3d data?
Although not a perfect way to solve the above-mentioned issue, adding the following two lines after the update of the centroids solved the problem in my case:
idnan = find(isnan(c(:,1)));
c(idnan,:) = X(randi(n,length(idnan),1),:);
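For context, here is a small self-contained sketch of where those two lines would sit. It assumes the centroid update is a per-cluster mean, that X holds one point per row, and that c holds one centroid per row; the toy data and the label vector are made up for illustration and are not from the submission.

```matlab
rng(0);                       % reproducible toy example
X = rand(20, 2);              % hypothetical data, n-by-d
n = size(X, 1);
k = 4;
% Hypothetical labels in which cluster 3 received no points
label = [ones(5, 1); 2 * ones(10, 1); 4 * ones(5, 1)];

% Centroid update: the mean of an empty cluster comes out as NaN
c = zeros(k, size(X, 2));
for j = 1:k
    c(j, :) = mean(X(label == j, :), 1);
end

% Reseed every NaN centroid with a randomly chosen data point
idnan = find(isnan(c(:, 1)));
c(idnan, :) = X(randi(n, length(idnan), 1), :);
```

After the reseeding step every row of c is a valid point again, so the next assignment step can still produce k clusters.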
Pretty fast indeed!
However, the requested number of clusters is sometimes not respected. The algorithm yields fewer clusters and replaces the remaining centroids with NaN. This can be inconvenient.
The code is very nice and well documented. In some cases, however, the clusters are not properly identified if no initial centroid vectors are provided. This could be improved by automatically trying a small number of different random initial guesses and choosing the configuration that yields the smallest sum of distances between points and centroids.
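A rough sketch of that restart idea follows. It does not call the submitted function, whose name and signature are not given here; instead it uses a bare-bones k-means loop written only for illustration. All names (nTrials, maxIter, bestCost, bestC, bestLabel) are hypothetical, and the cost is the sum of squared Euclidean distances, which is the usual k-means objective.

```matlab
rng(1);                                   % reproducible toy example
X = [randn(50, 2); randn(50, 2) + 5];     % hypothetical data, one point per row
n = size(X, 1);
k = 2;
nTrials = 5;                              % number of random initial guesses
maxIter = 100;

bestCost = inf;
bestC = [];
bestLabel = [];
for t = 1:nTrials
    c = X(randperm(n, k), :);             % random initial centroids from the data
    label = zeros(n, 1);
    for it = 1:maxIter
        % Squared distance from every point to every centroid
        D = zeros(n, k);
        for j = 1:k
            D(:, j) = sum(bsxfun(@minus, X, c(j, :)).^2, 2);
        end
        [~, newLabel] = min(D, [], 2);
        if isequal(newLabel, label)
            break;                        % assignments stopped changing
        end
        label = newLabel;
        % Update non-empty clusters; empty ones keep their old centroid
        for j = 1:k
            if any(label == j)
                c(j, :) = mean(X(label == j, :), 1);
            end
        end
    end
    % Sum of squared distances between points and their assigned centroids
    cost = sum(sum((X - c(label, :)).^2, 2));
    if cost < bestCost
        bestCost = cost;
        bestC = c;
        bestLabel = label;
    end
end
```

bestC and bestLabel then hold the configuration with the lowest cost among the trials. A handful of trials is cheap for small data sets, but the total run time grows linearly with nTrials.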
This stuff works, and examples/comparisons are given.