quires

ramya on 15 Jul 2011
with a given data by using k means i have got some clusters ..how do i know exact clusters that will be formed from the given data....is mean square error used to find out the number of clusters which has to be formed..
  2 Comments
Image Analyst on 16 Jul 2011
What does this mean "how do i know exact clusters that will be formed from..."? You know because you tell it how many clusters and it tells you info on what points went into what clusters. Do you rather mean "the exact *number* of clusters..." like Walter is assuming? Or do you know that and want to know something else about them, like which class each data point belongs to? Or maybe something else? And what is "quires"? Maybe a good English speaking friend can proofread your question to clarify it for us, because it's ambiguous and unclear to me now.
Walter Roberson on 16 Jul 2011
Ramya indicated that the data is "blind source" without prior knowledge; that indicates to me that Ramya would not know beforehand how many clusters are appropriate to the problem. Ramya might be having other difficulties as well, but the appropriate number of clusters is the first problem.
"quires" is a misspelling of the English "queries".


Answers (2)

Walter Roberson on 15 Jul 2011
No, kmeans() requires that you tell it the exact number of clusters you want to use.
If you do not know what the "best" number of clusters is for your purpose (a situation which is fairly common, really), then kmeans() by itself is not a suitable algorithm. There are a number of different algorithms which attempt to find the "best" number of clusters (in some sense of "best"), with different tradeoffs for the algorithms. Mathworks does not supply any of these algorithms, but you might be able to find something suitable in the MATLAB File Exchange.
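On the mean-square-error part of the question, here is a minimal sketch, assuming the Statistics Toolbox is installed and using made-up sample data. The total within-cluster sum of squared distances tends to keep shrinking as K grows, so it cannot simply be minimized to pick K. A common heuristic is to look for the "elbow" where the improvement levels off. The data, the variable names, and the range of K are all illustrative assumptions.

```matlab
% Hypothetical sample data: three blobs in 2-D (made up for illustration).
X = [randn(50,2); randn(50,2) + 5; randn(50,2) + repmat([5 -5], 50, 1)];

maxK = 8;                 % largest K to try (arbitrary choice)
wss  = zeros(maxK, 1);    % total within-cluster sum of squared distances

for k = 1:maxK
    % 'Replicates' reruns kmeans from several random starts and keeps the
    % best run; 'EmptyAction','singleton' avoids an error if a cluster empties.
    [idx, C, sumd] = kmeans(X, k, 'Replicates', 5, 'EmptyAction', 'singleton');
    wss(k) = sum(sumd);   % sumd holds one within-cluster sum per cluster
end

plot(1:maxK, wss, 'o-');
xlabel('Number of clusters K');
ylabel('Total within-cluster sum of squared distances');
```

Reading the elbow off the plot is a judgment call, which is part of why there is no single "best" K.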
  3 Comments
Walter Roberson on 15 Jul 2011
Sorry, that doesn't change the fact that the kmeans requires that you tell it exactly how many clusters you want. Mathworks provides the source for kmeans: you can read it yourself to verify this.
>> kmeans(rand(50,3))
??? Error using ==> kmeans at 121
At least two input arguments required.
>> kmeans(rand(50,3),[])
??? Error using ==> kmeans at 193
You must specify the number of clusters, K.
>> kmeans(rand(50,3),{})
??? Error using ==> kmeans at 193
You must specify the number of clusters, K.
>> kmeans(rand(50,3),'optimizeclusters')
??? Error using ==> kmeans at 278
X must be a positive integer value.
>> kmeans(rand(50,3),[3 4 5])
??? Error using ==> kmeans at 278
X must be a positive integer value.
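For contrast, a call that succeeds passes a single positive integer K as the second argument. A minimal sketch, with K = 3 chosen arbitrarily for illustration:

```matlab
X = rand(50,3);            % same kind of test data as in the calls above
K = 3;                     % number of clusters, chosen arbitrarily here
[idx, C] = kmeans(X, K);   % idx: cluster index for each row of X
                           % C:   one centroid per row, K rows in total
size(idx)                  % 50 rows, 1 column
size(C)                    % 3 rows, 3 columns
```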
Oleg Komarov on 16 Jul 2011
It's absolutely not trivial to determine the best number of clusters; you could dedicate a PhD dissertation to that.
Also http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set



ramya on 19 Jul 2011
Sorry for the spelling mistake. My question is: how do I know the number of clusters that should be formed when given a blind source? I have used the k-means clustering algorithm. Thank you.
  1 Comment
Walter Roberson on 19 Jul 2011
I do doubt there is any way to determine that automatically. I would have to think more about the situation in order to think of a *proof* that in the general case it cannot be done automatically.
See the Wikipedia link Oleg provided for information about some approximations that people have come up with for various purposes.
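One heuristic of this kind is the average silhouette value: run kmeans for several candidate values of K and keep the K whose clustering scores highest. A rough sketch, assuming the Statistics Toolbox `silhouette` function is available and using made-up data. It is only a heuristic and does not guarantee the "right" K.

```matlab
% Hypothetical sample data: three blobs in 2-D (made up for illustration).
X = [randn(50,2); randn(50,2) + 5; randn(50,2) + repmat([5 -5], 50, 1)];

candidates = 2:8;                      % silhouette needs at least 2 clusters
avgSil = zeros(size(candidates));      % mean silhouette value for each K

for i = 1:numel(candidates)
    idx = kmeans(X, candidates(i), 'Replicates', 5, 'EmptyAction', 'singleton');
    s = silhouette(X, idx);            % one silhouette value per data point
    avgSil(i) = mean(s);               % closer to 1 means better separated
end

[bestScore, best] = max(avgSil);
bestK = candidates(best)               % candidate K with the highest average
```

Different criteria can disagree with each other on the same data, so treat the result as a suggestion rather than an answer.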

