How to calculate the within-group sum of squares for kmeans?

I have a data set with 318 data points and 11 attributes, so my matrix is 318-by-11. I am trying to find the best number of clusters for my data set. I started from the rule of thumb k ≈ sqrt(n/2), which gives sqrt(318/2) ≈ 12.6, so I started from 12. I am using MATLAB's default kmeans function. For example, with my data stored in X, a 318-by-11 matrix, I run kmeans like this:
kmeans(X,12)
How do I calculate the sum of squares so I can find the optimum number of clusters for a data set like this?

Accepted Answer

dpb on 21 Jun 2015
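You have to run kmeans over the range of cluster counts, saving the optional third output, sumd (the within-cluster sums of point-to-centroid distances), for each case. With the default squared Euclidean distance, the total within-cluster sum of squares is then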
sum(sumd)
for each run. Probably simplest is to just use a loop...
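nClusters=15;              % maximum number of clusters to try
totSum=zeros(nClusters,1); % preallocate the result as a vector (zeros(n) alone gives an n-by-n matrix)
for i=1:nClusters
[~,~,sumd]=kmeans(X,i);    % sumd holds one within-cluster sum per cluster
totSum(i)=sum(sumd);       % total over all clusters for this run
end
plot(totSum)               % plot of totals versus number of clusters (same as index)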
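To see what sumd contains, the within-cluster sum of squares can be recomputed by hand from the cluster indices and centroids and compared with sum(sumd). This is only a sketch: the matrix X below is synthetic stand-in data (an assumption, not the asker's data), and the comparison holds for the default squared Euclidean distance.

```matlab
% Sketch: recompute the within-cluster sum of squares manually and compare with sumd.
% X is synthetic illustrative data (an assumption); substitute your own 318-by-11 matrix.
rng(1);                                % fix the seed so the run is repeatable
X = [randn(100,2); randn(100,2)+5];    % two well-separated groups
k = 2;
[idx,C,sumd] = kmeans(X,k);            % default distance is squared Euclidean

wss = zeros(k,1);
for j = 1:k
    d = bsxfun(@minus, X(idx==j,:), C(j,:)); % deviations from the centroid of cluster j
    wss(j) = sum(d(:).^2);                   % sum of squared deviations within cluster j
end
fprintf('sum(sumd) = %.4f, manual WSS = %.4f\n', sum(sumd), sum(wss));
```

Because kmeans starts from random centroids, each run can land in a different local minimum; passing 'Replicates' with a value greater than 1 to kmeans is one way to make the totals less noisy.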
  3 Comments
dpb on 21 Jun 2015
That would, I believe, depend entirely on the characteristics of the data set. If there are no real groupings, it is simply measuring the variance between means (roughly) of bins, which clearly will continue to decrease as the bins get smaller. What happens if you reduce the number of clusters instead of continuing to increase it? Have you looked at the plot of the results and at the silhouette plot to get a visual "feel" for the data?
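A silhouette plot for a single choice of k might be produced along these lines. The data and the value of k are placeholders chosen for illustration (assumptions); the silhouette function is part of the Statistics and Machine Learning Toolbox.

```matlab
% Sketch: silhouette plot for one clustering of synthetic stand-in data.
rng(1);                                % repeatable run
X = [randn(100,2); randn(100,2)+5];    % illustrative data, not the asker's
k = 2;                                 % placeholder cluster count
idx = kmeans(X,k);

[s,h] = silhouette(X,idx);             % draws the plot and returns one value per point
fprintf('mean silhouette value for k = %d: %.3f\n', k, mean(s));
```

Values near 1 indicate points that sit well inside their cluster; values near 0 or below suggest points that lie between clusters or may be assigned to the wrong one.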
Sidra Aleem on 1 Jul 2018
Edited: Sidra Aleem on 1 Jul 2018
Bikram Kawan, your post is a bit old. However, the elbow method is a bit ambiguous. You can try the average silhouette method to find the optimal number of clusters; the link below may help. https://www.mathworks.com/help/stats/clustering.evaluation.silhouetteevaluation-class.html
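As a rough sketch of the average silhouette approach, evalclusters can run kmeans over a list of candidate cluster counts and report the one with the highest mean silhouette value. The data and the candidate range 2:6 are assumptions chosen for illustration; the silhouette is not defined for a single cluster, so the list starts at 2.

```matlab
% Sketch: choose k by the average silhouette criterion on synthetic stand-in data.
rng(1);                                % repeatable run
X = [randn(100,2); randn(100,2)+5];    % illustrative data, not the asker's
eva = evalclusters(X,'kmeans','silhouette','KList',2:6);

disp(eva.OptimalK);                    % candidate k with the highest mean silhouette
plot(eva);                             % criterion value versus number of clusters
```

For the 318-by-11 data in the question, the same call could be made with a wider list such as 2:15.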
