Davies-Bouldin criterion clustering evaluation object
DaviesBouldinEvaluation is an object consisting of sample data,
clustering data, and Davies-Bouldin criterion values used to evaluate the optimal number
of clusters. Create a Davies-Bouldin criterion clustering evaluation object using evalclusters.
Specify optional comma-separated pairs of Name,Value arguments.
Name is the argument name and
Value is the corresponding value.
Name must appear inside quotes. You can specify several name and value
pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'KList',[1:5] specifies to test 1, 2, 3, 4, and 5 clusters to find the optimal number.
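As a minimal sketch of the name-value syntax, the call below passes 'KList' after the positional arguments of evalclusters. The sample data is hypothetical (two well-separated groups of random points) and is not part of the original example; the call requires Statistics and Machine Learning Toolbox.

```matlab
rng('default');                      % fix the random seed for reproducibility
X = [randn(50,2); randn(50,2) + 6];  % hypothetical sample data, two groups

% The name 'KList' appears inside quotes, followed by its value.
% 1:5 asks evalclusters to test 1, 2, 3, 4, and 5 clusters.
E = evalclusters(X,'kmeans','DaviesBouldin','KList',1:5);

disp(E.InspectedK)                   % the cluster counts that were tested
```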
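ClusteringFunction: Clustering algorithm used to cluster the input data, stored as a valid clustering algorithm name or function handle. If the clustering solutions are provided in the input, ClusteringFunction is empty.
CriterionName: Name of the criterion used for clustering evaluation, stored as a valid criterion name.
CriterionValues: Criterion values corresponding to each proposed number of clusters in InspectedK, stored as a vector of numerical values.
InspectedK: List of the number of proposed clusters for which to compute criterion values, stored as a vector of positive integer values.
Missing: Logical flag for excluded data, stored as a column vector of logical values. If Missing equals true, then the corresponding value in the data matrix X is not used in the clustering solution.
NumObservations: Number of observations in the data matrix X, stored as a positive integer value.
OptimalK: Optimal number of clusters, stored as a positive integer value.
OptimalY: Optimal clustering solution corresponding to OptimalK, stored as a column vector of positive integer values.
X: Data used for clustering, stored as a matrix of numerical values.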
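The stored values can be read from the evaluation object with dot notation. The following is a small sketch, assuming hypothetical sample data of 100 random points in two groups; the variable names on the left-hand side are illustrative only.

```matlab
rng('default');                      % fix the random seed for reproducibility
X = [randn(50,2); randn(50,2) + 6];  % hypothetical sample data, two groups
E = evalclusters(X,'kmeans','DaviesBouldin','KList',1:4);

n      = E.NumObservations;   % number of observations
kList  = E.InspectedK;        % candidate numbers of clusters
vals   = E.CriterionValues;   % one criterion value per entry of kList
kBest  = E.OptimalK;          % suggested number of clusters
labels = E.OptimalY;          % cluster index of each observation for kBest

fprintf('%d observations, suggested number of clusters: %d\n', n, kBest);
```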
Evaluate the optimal number of clusters using the Davies-Bouldin clustering evaluation criterion.
Generate sample data containing random numbers from three multivariate distributions with different parameter values.
rng('default'); % For reproducibility
mu1 = [2 2];
sigma1 = [0.9 -0.0255; -0.0255 0.9];
mu2 = [5 5];
sigma2 = [0.5 0 ; 0 0.3];
mu3 = [-2, -2];
sigma3 = [1 0 ; 0 0.9];
N = 200;
X = [mvnrnd(mu1,sigma1,N);...
     mvnrnd(mu2,sigma2,N);...
     mvnrnd(mu3,sigma3,N)];
Evaluate the optimal number of clusters using the Davies-Bouldin criterion. Cluster the data using kmeans.
E = evalclusters(X,'kmeans','DaviesBouldin','klist',[1:6])
E = 
  DaviesBouldinEvaluation with properties:

    NumObservations: 600
         InspectedK: [1 2 3 4 5 6]
    CriterionValues: [NaN 0.4663 0.4454 0.8316 1.0444 0.9236]
           OptimalK: 3
The OptimalK value indicates that, based on the Davies-Bouldin criterion, the optimal number of clusters is three.
Plot the Davies-Bouldin criterion values for each number of clusters tested.
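One way to produce such a plot is to call plot on the evaluation object. The sketch below regenerates the sample data so that it runs on its own; it assumes Statistics and Machine Learning Toolbox is available.

```matlab
rng('default');  % For reproducibility
X = [mvnrnd([2 2],[0.9 -0.0255; -0.0255 0.9],200);...
     mvnrnd([5 5],[0.5 0; 0 0.3],200);...
     mvnrnd([-2 -2],[1 0; 0 0.9],200)];
E = evalclusters(X,'kmeans','DaviesBouldin','klist',1:6);

figure;
plot(E)  % criterion value against the number of clusters tested
```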
The plot shows that the lowest Davies-Bouldin value occurs at three clusters, suggesting that the optimal number of clusters is three.
Create a grouped scatter plot to visually examine the suggested clusters.
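A grouped scatter plot of this kind can be sketched with gscatter, using the OptimalY labels as the grouping variable. The color and marker strings are illustrative choices, and the data is regenerated so that the snippet runs on its own.

```matlab
rng('default');  % For reproducibility
X = [mvnrnd([2 2],[0.9 -0.0255; -0.0255 0.9],200);...
     mvnrnd([5 5],[0.5 0; 0 0.3],200);...
     mvnrnd([-2 -2],[1 0; 0 0.9],200)];
E = evalclusters(X,'kmeans','DaviesBouldin','klist',1:6);

figure;
% Group the points by the cluster labels of the suggested solution.
gscatter(X(:,1),X(:,2),E.OptimalY,'rbg','xod')
```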
The plot shows three distinct clusters within the data: Cluster 1 is in the lower-left corner, cluster 2 is in the upper-right corner, and cluster 3 is near the center of the plot.
The Davies-Bouldin criterion is based on a ratio of within-cluster and between-cluster distances. The Davies-Bouldin index is defined as

$$DB = \frac{1}{k}\sum_{i=1}^{k}\max_{j \neq i}\{D_{i,j}\},$$

where $k$ is the number of clusters and $D_{i,j}$ is the within-to-between cluster distance ratio for the ith and jth clusters. In mathematical terms,

$$D_{i,j} = \frac{\bar{d}_i + \bar{d}_j}{d_{i,j}},$$

where $\bar{d}_i$ is the average distance between each point in the ith cluster and the centroid of the ith cluster, $\bar{d}_j$ is the average distance between each point in the jth cluster and the centroid of the jth cluster, and $d_{i,j}$ is the Euclidean distance between the centroids of the ith and jth clusters.

The maximum value of $D_{i,j}$ over $j \neq i$ represents the worst-case within-to-between cluster ratio for cluster i. The optimal clustering solution has the smallest Davies-Bouldin index value.
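To make the definition concrete, the following sketch computes the index by hand for a toy data set. The six points and their cluster labels are hypothetical, and the code follows the definition given here rather than the internals of evalclusters. It uses only base MATLAB and relies on implicit expansion (R2016b or later).

```matlab
% Toy data: three clusters of two points each (hypothetical values).
X   = [0 0; 2 0; 10 0; 12 0; 0 5; 2 5];
idx = [1; 1; 2; 2; 3; 3];            % cluster label of each row of X
k   = max(idx);

% Centroid and average point-to-centroid distance for each cluster.
C    = zeros(k, size(X,2));
dbar = zeros(k, 1);
for i = 1:k
    Xi      = X(idx == i, :);
    C(i,:)  = mean(Xi, 1);
    dbar(i) = mean(sqrt(sum((Xi - C(i,:)).^2, 2)));
end

% Within-to-between ratio for every pair of distinct clusters.
D = zeros(k);
for i = 1:k
    for j = 1:k
        if i ~= j
            D(i,j) = (dbar(i) + dbar(j)) / norm(C(i,:) - C(j,:));
        end
    end
end

% Davies-Bouldin index: average over clusters of the worst-case ratio.
DB = mean(max(D, [], 2));
fprintf('Davies-Bouldin index: %.4f\n', DB);   % prints 0.3333
```

Each cluster has an average point-to-centroid distance of 1. The centroids are 10 apart for clusters 1 and 2, and 5 apart for clusters 1 and 3, so the worst-case ratios are 0.4, 0.2, and 0.4, and their mean is 1/3.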
 Davies, D. L., and D. W. Bouldin. “A Cluster Separation Measure.” IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. PAMI-1, No. 2, 1979, pp. 224–227.