Use an input matrix of proposed clustering
solutions to evaluate the optimal number of clusters.
Load the sample data.
load fisheriris;
The data contains length and width measurements from the sepals
and petals of three species of iris flowers.
Use kmeans to create an input matrix
of proposed clustering solutions for the flower measurements,
using 1, 2, 3, 4, 5, and 6 clusters.
clust = zeros(size(meas,1),6);
for i = 1:6
    clust(:,i) = kmeans(meas,i,'EmptyAction','singleton',...
        'Replicates',5);
end
Each row of clust corresponds to one observation
(one flower) in meas. Each of the six columns corresponds to a clustering
solution containing 1 to 6 clusters.
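To make the layout of such an input matrix concrete, here is a minimal sketch using a hypothetical four-observation matrix (clustToy and numClusters are illustrative names, not part of the example above). It counts the distinct labels in each column, which is the number of clusters that column's solution proposes.

```matlab
% Hypothetical toy matrix: rows are observations, columns are proposed solutions.
clustToy = [1 1 1;
            1 1 2;
            1 2 3;
            1 2 3];

% Count the distinct cluster labels used in each column.
numClusters = arrayfun(@(c) numel(unique(clustToy(:,c))), 1:size(clustToy,2));
```

Applied to a matrix built like clust, the same check should report one cluster count per column.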
Evaluate the optimal number of clusters using the Calinski-Harabasz
criterion.
eva = evalclusters(meas,clust,'CalinskiHarabasz')
eva =
  CalinskiHarabaszEvaluation with properties:
    NumObservations: 150
         InspectedK: [1 2 3 4 5 6]
    CriterionValues: [NaN 513.9245 561.6278 530.7658 459.5058 473.6577]
           OptimalK: 3
The OptimalK value indicates that, based
on the Calinski-Harabasz criterion, the optimal number of clusters
is three: the three-cluster solution has the highest criterion value (561.6278).
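As a rough illustration of what the criterion measures, the following sketch computes the Calinski-Harabasz index by hand as the ratio of between-cluster to within-cluster variance, [SSB/(k-1)] / [SSW/(n-k)]. The toy data and all variable names are assumptions made for illustration, and the sketch is not meant to reproduce the internal implementation of evalclusters.

```matlab
% Hypothetical toy data: two tight, well-separated groups of two points each.
X   = [0 0; 0 1; 10 10; 10 11];
idx = [1; 1; 2; 2];            % one proposed clustering solution

n  = size(X,1);                % number of observations
k  = max(idx);                 % number of clusters in this solution
mu = mean(X,1);                % overall centroid

ssb = 0;                       % between-cluster sum of squares
ssw = 0;                       % within-cluster sum of squares
for j = 1:k
    Xj  = X(idx == j,:);       % points assigned to cluster j
    cj  = mean(Xj,1);          % centroid of cluster j
    ssb = ssb + size(Xj,1) * sum((cj - mu).^2);
    ssw = ssw + sum(sum((Xj - cj).^2));   % implicit expansion (R2016b or later)
end

% Calinski-Harabasz index: larger values indicate better-separated clusters.
ch = (ssb/(k-1)) / (ssw/(n-k));
```

For this toy data, SSB is 200 and SSW is 1, so ch evaluates to 400. With a single cluster, k-1 is zero and the ratio is undefined, which is consistent with the NaN reported for one cluster in CriterionValues.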