Silhouette criterion clustering evaluation object
SilhouetteEvaluation is an object
consisting of sample data, clustering data, and
silhouette criterion values used to evaluate the
optimal number of data clusters. Create a
silhouette criterion clustering evaluation object using evalclusters.
Specify optional comma-separated pairs of Name,Value arguments, where Name is
the argument name and Value is the corresponding value.
Name must appear inside quotes. You can specify several name and value
pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'KList',[1:5],'Distance','cityblock' specifies to test 1, 2, 3, 4, and 5 clusters using the city block distance metric.
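The name-value syntax described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the data matrix X is made up for the example, and the choice of 'kmeans' as the clustering algorithm is an assumption.

```matlab
% Illustrative data only: two well-separated groups of 50 points each.
rng('default');                       % for reproducibility
X = [randn(50,2); randn(50,2) + 6];

% Name-value pairs follow the required arguments and may appear in any order.
E = evalclusters(X,'kmeans','silhouette', ...
    'KList',1:5,'Distance','cityblock');

% The requested settings are stored on the returned object.
disp(E.InspectedK)
disp(E.Distance)
```

Swapping the two pairs (passing 'Distance' before 'KList') should produce the same configuration, since the pairs are matched by name rather than by position.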
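ClusteringFunction: Clustering algorithm used to cluster the input data, stored as a valid clustering algorithm name or function handle. If the clustering solutions are provided in the input, ClusteringFunction is empty.
ClusterPriors: Prior probabilities for each cluster, stored as a valid prior probability name.
ClusterSilhouettes: Silhouette values corresponding to each proposed number of clusters in InspectedK, stored as a cell array of vectors.
CriterionName: Name of the criterion used for clustering evaluation, stored as a valid criterion name.
CriterionValues: Criterion values corresponding to each proposed number of clusters in InspectedK, stored as a vector of numerical values.
Distance: Distance metric used for clustering data, stored as a valid distance metric name.
InspectedK: List of the number of proposed clusters for which to compute criterion values, stored as a vector of positive integer values.
Missing: Logical flag for excluded data, stored as a column vector of logical values. If Missing equals true, then the corresponding value in the data matrix X is not used in the clustering solution.
NumObservations: Number of observations in the data matrix X, minus the number of missing (NaN) values in X, stored as a positive integer value.
OptimalK: Optimal number of clusters, stored as a positive integer value.
OptimalY: Optimal clustering solution corresponding to OptimalK, stored as a column vector of positive integer values. If the clustering solutions are provided in the input, OptimalY is empty.
X: Data used for clustering, stored as a matrix of numerical values.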
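As a rough sketch of how these properties can be read back from an evaluation object, the snippet below builds a small object and queries a few of them with dot notation. The data, the injected NaN, and the 'KList' range are all assumptions chosen for illustration.

```matlab
% Illustrative data only: two separated groups, with one missing value added
% so that the Missing and NumObservations properties have something to report.
rng('default');                       % for reproducibility
X = [randn(50,2); randn(50,2) + 6];
X(3,1) = NaN;                         % row 3 now contains a missing value

E = evalclusters(X,'kmeans','silhouette','KList',2:4);

% Read properties with dot notation.
disp(E.InspectedK)        % numbers of clusters that were evaluated
disp(E.CriterionValues)   % one criterion value per entry of InspectedK
disp(E.OptimalK)          % number of clusters with the best criterion value
disp(E.NumObservations)   % observations left after excluding missing rows
disp(nnz(E.Missing))      % number of rows flagged as excluded
```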
Evaluate the optimal number of clusters using the silhouette clustering evaluation criterion.
Generate sample data containing random numbers from three multivariate distributions with different parameter values.
rng('default'); % For reproducibility
mu1 = [2 2];
sigma1 = [0.9 -0.0255; -0.0255 0.9];
mu2 = [5 5];
sigma2 = [0.5 0; 0 0.3];
mu3 = [-2, -2];
sigma3 = [1 0; 0 0.9];
N = 200;
X = [mvnrnd(mu1,sigma1,N); ...
     mvnrnd(mu2,sigma2,N); ...
     mvnrnd(mu3,sigma3,N)];
Evaluate the optimal number of clusters using the silhouette criterion. Cluster the data using kmeans.
E = evalclusters(X,'kmeans','silhouette','klist',[1:6])
E = 
  SilhouetteEvaluation with properties:

    NumObservations: 600
         InspectedK: [1 2 3 4 5 6]
    CriterionValues: [NaN 0.8055 0.8551 0.7155 0.6071 0.6232]
           OptimalK: 3
The OptimalK value indicates that, based on the silhouette criterion, the optimal number of clusters is three.
Plot the silhouette criterion values for each number of clusters tested.
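A minimal sketch of this plotting step, assuming E is the evaluation object returned by evalclusters earlier in this example. The guard at the top is only a convenience so the snippet also runs on its own; the data it generates is illustrative and is not the data used above.

```matlab
% E normally comes from the evalclusters call earlier in the example.
% If it is not in the workspace, build a comparable object from
% illustrative data so that the snippet still runs.
if ~exist('E','var')
    rng('default');
    X = [randn(100,2) + 2; randn(100,2) + 6; randn(100,2) - 2];
    E = evalclusters(X,'kmeans','silhouette','KList',1:6);
end

figure;
plot(E)   % criterion value versus number of clusters
```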
The plot shows that the highest silhouette value occurs at three clusters, suggesting that the optimal number of clusters is three.
Create a grouped scatter plot to visually examine the suggested clusters.
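One way to draw this grouped scatter plot is sketched below, assuming X and E are the data matrix and evaluation object from earlier in this example. The colors and markers are arbitrary choices, and the guard only supplies illustrative stand-in data when the snippet is run by itself.

```matlab
% X and E normally come from the earlier steps of the example.
% If they are missing, create illustrative stand-ins so the snippet runs.
if ~exist('E','var') || ~exist('X','var')
    rng('default');
    X = [randn(100,2) + 2; randn(100,2) + 6; randn(100,2) - 2];
    E = evalclusters(X,'kmeans','silhouette','KList',1:6);
end

figure;
% Group the points by the cluster assignment of the optimal solution.
gscatter(X(:,1),X(:,2),E.OptimalY,'rbg','xod')
```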
The plot shows three distinct clusters within the data: Cluster 1 is in the lower-left corner, cluster 2 is in the upper-right corner, and cluster 3 is near the center of the plot.
The silhouette value for each point is a measure of how similar that point is to points in its
own cluster, when compared to points in other clusters. The silhouette value
$S_i$ for the $i$th point is defined as
$$S_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$
where $a_i$ is the average distance from the $i$th
point to the other points in the same cluster as $i$, and
$b_i$ is the minimum average distance from the $i$th
point to points in a different cluster, minimized over clusters.
The silhouette value ranges from -1 to 1. A high
silhouette value indicates that point $i$ is well matched to its own
cluster, and poorly matched to other clusters. If most points have a high silhouette
value, then the clustering solution is appropriate. If many points have a low or
negative silhouette value, then the clustering solution might have too many or too
few clusters. You can use silhouette values as a clustering evaluation criterion
with any distance metric.
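The definition above can be checked by hand on a small data set. The following sketch computes the average within-cluster distance and the minimum average distance to another cluster for each point, then compares the result with the silhouette function. The toy points and cluster labels are made up for illustration, Euclidean distance is an arbitrary choice, and the loop assumes every cluster has at least two points.

```matlab
% Toy data (illustrative): two clusters of three points each.
X   = [0 0; 0 1; 1 0; 5 5; 5 6; 6 5];
idx = [1; 1; 1; 2; 2; 2];

D = squareform(pdist(X));       % pairwise Euclidean distances
n = size(X,1);
labels = unique(idx);
s = zeros(n,1);

for i = 1:n
    % a: average distance to the other points in the same cluster
    own = (idx == idx(i));
    own(i) = false;
    a = mean(D(i,own));

    % b: smallest average distance to the points of any other cluster
    b = Inf;
    for c = labels(labels ~= idx(i))'
        b = min(b, mean(D(i, idx == c)));
    end

    s(i) = (b - a) / max(a,b);
end

% Reference values from the toolbox function, using the same metric.
sRef = silhouette(X,idx,'Euclidean');
disp(max(abs(s - sRef)))        % should be close to zero
```

Both clusters here are compact and far apart, so every point should have a silhouette value close to 1.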
[1] Kaufman, L., and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. Hoboken, NJ: John Wiley & Sons, Inc., 1990.
[2] Rousseeuw, P. J. “Silhouettes: a graphical aid to the interpretation and validation of cluster analysis.” Journal of Computational and Applied Mathematics. Vol. 20, No. 1, 1987, pp. 53–65.