MATLAB Answers

Initial centroids for K-means clustering

Salad Box on 16 Sep 2019
Commented: Adam on 17 Sep 2019
If I have an array (e.g., a 5-by-3 matrix) that can serve as the initial centroids for k-means clustering, how can I properly initialize the kmeans algorithm?
(Matlab's kmeans function has more than 600 lines of code and I have no idea how to modify it...)
The purpose of supplying my own initial centroids, rather than having them randomly generated inside the kmeans function, is to remove the randomness in the outputs.
P.S. Python has an answer to this, but I don't know Python.

  1 Comment

Adam on 17 Sep 2019
You should always read the documentation before the code. The 'Start' option gives you the option to input your own initial cluster centres.
I always suggest using the built-in help, though, via
doc kmeans
and clicking on the 'Name','Value' hyperlink in the 2nd function signature to take you to the list of possible (Name,Value) pairs that are supported. If you always use the latest version of Matlab the online help is fine though.
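
For the 5-by-3 case in the question, a minimal sketch of the 'Start' option might look like the following. The data matrix X and the starting matrix C0 are made up for illustration (they are not from the original post); the only requirement is that C0 has one row per cluster and the same number of columns as X. This assumes the Statistics and Machine Learning Toolbox is installed.

```matlab
% Hypothetical data: 30 points in 3-D, grouped around 5 made-up locations.
groupCentres = [0 0 0; 10 0 0; 0 10 0; 0 0 10; 10 10 10];
offsets      = [0 0 0; 1 0 0; 0 1 0; 0 0 1; -1 0 0; 0 -1 0];
X = kron(groupCentres, ones(6, 1)) + repmat(offsets, 5, 1);   % 30-by-3

% Hypothetical initial centroids: a 5-by-3 matrix, one row per cluster.
C0 = groupCentres + 0.5;

% Pass the matrix through the 'Start' name-value pair.
% The number of clusters must match the number of rows in C0.
[idx, C] = kmeans(X, size(C0, 1), 'Start', C0);
```

Here idx holds the cluster index assigned to each row of X, and C holds the final centroid locations, one row per cluster.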


Answers (1)

KALYAN ACHARJYA on 17 Sep 2019
Edited: KALYAN ACHARJYA on 17 Sep 2019
Before I share the helpful link, I would ask you to watch Andrew Ng's lecture on random initialization of k-means (Machine Learning).
To keep k-means from getting stuck in a local minimum and to get a better-optimized result, he suggests trying multiple random initializations.
Manual Initialization

  2 Comments

Salad Box on 17 Sep 2019
Thanks for your answers Kalyan. I do appreciate that.
However,
Andrew Ng's video only helps with the case where k-means gets stuck in a local optimum. His suggestion was to use 'multiple iterations' (that is, repeated runs from different random initializations), compute the cost function for each, and keep the centroids with the minimum cost, in order to get closer to the global optimum. That still leaves my problem unsolved. If I run k-means again with 100 new iterations, the output will in most cases differ slightly from the first run with the initial 100 iterations.
I need to fix this: every time I run k-means, the output needs to be the same. That is why I prepared my own initial centroids. If k-means starts from them and moves the centroids at each step, I should theoretically get the same output at the end. I have other variables and parameters to look at in my research, so I can't let randomness in the k-means output be one of my variables. I need to remove this randomness. I hope that is understandable.
The second link in your answer is about how to set initial centroids for k-means. However, I have already done that in my own way, so it is irrelevant to my question.
My question is:
Once I have an array as my initial centroids, how do I embed them into Matlab's own k-means function?
Hope my question is clear.
Can anyone help with this question directly, please?
Adam on 17 Sep 2019
As I added in a comment above, the Matlab help is always the first place to go. This shows how you can do this.
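
To check that a fixed starting matrix really removes the run-to-run variation, one option is to call kmeans twice with the same 'Start' value and compare the outputs. The sketch below uses made-up data and simply takes the first five rows of X as hypothetical initial centroids; with all other options left at their defaults, the two runs would be expected to match.

```matlab
% Made-up data for illustration: 150 points in 3-D.
rng(0);                       % seed so the example data are repeatable
X = randn(150, 3);

% Hypothetical 5-by-3 initial centroids (here simply the first five points).
C0 = X(1:5, :);

% Two separate runs from the same starting centroids.
[idx1, C1] = kmeans(X, 5, 'Start', C0);
[idx2, C2] = kmeans(X, 5, 'Start', C0);

% Both flags should be true if the two runs gave identical results.
sameLabels    = isequal(idx1, idx2);
sameCentroids = isequal(C1, C2);
```

If either flag comes back false, something other than the initial centroids is introducing variation, and the options listed in the kmeans help would be the place to look.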
