quires
Using k-means on a given data set, I have got some clusters. But how do i know exact clusters that will be formed from the given data? Is mean square error used to find out the number of clusters which has to be formed?
2 Comments
Image Analyst
on 16 Jul 2011
What does this mean "how do i know exact clusters that will be formed from..."? You know because you tell it how many clusters and it tells you info on what points went into what clusters. Do you rather mean "the exact *number* of clusters..." like Walter is assuming? Or do you know that and want to know something else about them, like which class each data point belongs to? Or maybe something else? And what is "quires"? Maybe a good English speaking friend can proofread your question to clarify it for us, because it's ambiguous and unclear to me now.
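To illustrate the "you tell it how many clusters" reading, here is a minimal sketch, assuming the Statistics Toolbox is available. The data `X`, the choice `k = 3`, and the variable name `membersOf2` are made up for illustration only.

```matlab
% Hypothetical data: 50 points in 3 dimensions.
X = rand(50, 3);
k = 3;                        % number of clusters, chosen by the caller

% idx(i) is the cluster index (1..k) assigned to row i of X;
% C(j,:) is the centroid of cluster j.
[idx, C] = kmeans(X, k);

% Rows of X that were assigned to cluster 2.
membersOf2 = X(idx == 2, :);
```

So the cluster membership of every point comes back in `idx`, but only for the `k` that was passed in.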
Walter Roberson
on 16 Jul 2011
Ramya indicated that the data is "blind source" without prior knowledge; that indicates to me that Ramya would not know beforehand how many clusters are appropriate to the problem. Ramya might be having other difficulties as well, but the appropriate number of clusters is the first problem.
"quires" is a misspelling of the English "queries".
Answers (2)
Walter Roberson
on 15 Jul 2011
No, kmeans() requires that you tell it the exact number of clusters you want to use.
If you do not know what the "best" number of clusters is for your purpose (a fairly common situation, really), then kmeans() by itself is not a suitable algorithm. There are a number of different algorithms that attempt to find the "best" number of clusters (in some sense of "best"), each with different tradeoffs. At the time of writing, MathWorks does not supply any of these algorithms, but you might be able to find something suitable in the MATLAB File Exchange.
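On the mean-square-error part of the question: the within-cluster error by itself cannot pick the number of clusters, because it generally keeps shrinking as k grows. One common heuristic (not something kmeans() does for you) is to run kmeans() over a range of k and look for the "elbow" where the error stops dropping sharply. A rough sketch, assuming the Statistics Toolbox; the data `X`, the limit `maxK`, and the name `wss` are hypothetical choices for illustration:

```matlab
% Hypothetical data: two well-separated 2-D blobs.
X = [randn(100, 2); randn(100, 2) + 6];

maxK = 6;                         % largest k to try (arbitrary choice)
wss  = zeros(maxK, 1);            % total within-cluster sum of squared distances

for k = 1:maxK
    % 'Replicates' reruns kmeans from several random starts and keeps the best run.
    [~, ~, sumd] = kmeans(X, k, 'Replicates', 5);
    % sumd(j) is the sum of point-to-centroid distances in cluster j
    % (squared Euclidean with the default distance setting).
    wss(k) = sum(sumd);
end

% Inspect the curve by eye; the bend ("elbow") suggests a candidate k.
plot(1:maxK, wss, '-o');
xlabel('Number of clusters k');
ylabel('Total within-cluster sum of squares');
```

Reading the elbow is a judgment call, which is part of why there is no single "correct" answer here.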
3 Comments
Walter Roberson
on 15 Jul 2011
Sorry, that doesn't change the fact that kmeans() requires that you tell it exactly how many clusters you want. MathWorks provides the source for kmeans: you can read it yourself to verify this.
>> kmeans(rand(50,3))
??? Error using ==> kmeans at 121
At least two input arguments required.
>> kmeans(rand(50,3),[])
??? Error using ==> kmeans at 193
You must specify the number of clusters, K.
>> kmeans(rand(50,3),{})
??? Error using ==> kmeans at 193
You must specify the number of clusters, K.
>> kmeans(rand(50,3),'optimizeclusters')
??? Error using ==> kmeans at 278
X must be a positive integer value.
>> kmeans(rand(50,3),[3 4 5])
??? Error using ==> kmeans at 278
X must be a positive integer value.
Oleg Komarov
on 16 Jul 2011
Determining the best number of clusters is absolutely not trivial; you could dedicate a PhD dissertation to it.
See also http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set
ramya
on 19 Jul 2011
1 Comment
Walter Roberson
on 19 Jul 2011
I do doubt there is any way to determine that automatically. I would have to think more about the situation in order to think of a *proof* that in the general case it cannot be done automatically.
See the Wikipedia link Oleg provided for information about some approximations that people have come up with for various purposes.
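As one example of such an approximation, the average silhouette value can be compared across candidate values of k. This is only a sketch of one heuristic, assuming the Statistics Toolbox (for kmeans() and silhouette()); the data `X`, the candidate list `kList`, and the names `avgSil` and `bestK` are made up for illustration:

```matlab
% Hypothetical data: two well-separated 2-D blobs.
X = [randn(100, 2); randn(100, 2) + 6];

kList  = 2:6;                     % silhouettes need at least 2 clusters
avgSil = zeros(size(kList));      % mean silhouette value for each candidate k

for i = 1:numel(kList)
    idx = kmeans(X, kList(i), 'Replicates', 5);
    s = silhouette(X, idx);       % one value per point, each in [-1, 1]
    avgSil(i) = mean(s);
end

% Candidate k with the highest average silhouette.
[~, best] = max(avgSil);
bestK = kList(best);
```

A higher average silhouette only means the points sit more cleanly inside their clusters under this particular measure; it is a suggestion for k, not a proof that it is the right one.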