What's the difference between the two arguments in the kmeans function: MaxIter and Replicates?

ABC EFD on 30 Oct 2017
Edited: Deepa Gupta on 29 Mar 2020
Do they both mean how many times a new centroid is to be found?

Answers (1)

Deepa Gupta on 29 Mar 2020
Edited: Deepa Gupta on 29 Mar 2020
I have much the same question. As far as I can tell, Replicates = r repeats the whole clustering r times and re-initializes the starting centroids for every run. MaxIter, by contrast, caps the number of iterations within a single run: each iteration continues from the centroids of the previous one, so all of them trace back to that run's one initialization.
Most importantly, when Replicates is specified, the best solution, i.e. the one with the lowest total sumd (the within-cluster sums of point-to-centroid distances), is chosen from the r runs. Going by the kmeans description in the MATLAB documentation, nothing like that happens with MaxIter: its iterations are steps of one run, not separate solutions to pick from.
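To make the distinction concrete, here is a minimal sketch of one kmeans call that sets both options. The data, the seed, and the values 100 and 5 are made up for illustration. With 'Display' set to 'final', kmeans should report the outcome of each replicate, so you can see several runs being compared.

```matlab
% Requires the Statistics and Machine Learning Toolbox.
rng(1);                                  % fix the seed so the example is repeatable
X = [randn(50,2); randn(50,2) + 5];      % made-up data: two well-separated blobs

% MaxIter caps the update steps inside each run.
% Replicates is the number of runs, each with new starting centroids.
[idx, C, sumd] = kmeans(X, 2, ...
    'MaxIter', 100, ...
    'Replicates', 5, ...
    'Display', 'final');

totalSumd = sum(sumd);                   % total for the run that kmeans returned
```

The returned idx, C and sumd belong to a single run, the one with the lowest total among the five.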
I think that's the best way to understand this, although I am open to more discussion. Given the above, it may make sense to set a high number of replicates, at the cost of more computation and a longer run time. To be sure, type open kmeans at the MATLAB command line and read the code. I started doing that but ran out of time because of a deadline, and I may revisit it later.
For now I am running kmeans several times on my data to find the best solution, and I then reuse those Replicates and MaxIter values for the rest of my data from the same domain. At the end of the day, clustering can produce several different solutions for the same data, but with large samples it still gives a useful summary in the form of centroids/prototypes. With that tradeoff in mind, one can use kmeans.
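Running kmeans several times by hand and keeping the best result can be sketched as below. This is only an approximation of what Replicates does, written under my own assumptions (made-up data, k = 2, r = 5), and not the code inside kmeans itself. The names bestTotal, bestIdx and bestC are hypothetical.

```matlab
% Requires the Statistics and Machine Learning Toolbox.
rng(1);                                  % fix the seed so the example is repeatable
X = [randn(50,2); randn(50,2) + 5];      % made-up data: two well-separated blobs
k = 2;                                   % number of clusters (assumed)
r = 5;                                   % number of manual runs (assumed)

bestTotal = Inf;
for rep = 1:r
    % One run per call: new starting centroids, at most 100 iterations.
    [idx, C, sumd] = kmeans(X, k, 'MaxIter', 100);
    if sum(sumd) < bestTotal             % keep the run with the lowest total sumd
        bestTotal = sum(sumd);
        bestIdx = idx;
        bestC = C;
    end
end
```

Wrapping the loop in tic and toc for a few values of r gives a rough idea of how the run time grows with the number of runs.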
