Davies-Bouldin criterion clustering evaluation object
DaviesBouldinEvaluation is an object consisting of sample data, clustering data, and Davies-Bouldin
criterion values used to evaluate the optimal number of clusters.
Create a Davies-Bouldin criterion clustering evaluation object using evalclusters.
x— Input data
Input data, specified as an N-by-P matrix. N is the number of observations, and P is the number of variables.
clust— Clustering algorithm
'kmeans' | 'linkage' | 'gmdistribution' | matrix of clustering solutions | function handle
Clustering algorithm, specified as one of the following.
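|'kmeans'||Cluster the data in x using the kmeans clustering algorithm|
|'linkage'||Cluster the data in x using the clusterdata agglomerative clustering algorithm|
|'gmdistribution'||Cluster the data in x using the gmdistribution Gaussian mixture distribution algorithm|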
If the criterion is 'CalinskiHarabasz', 'DaviesBouldin', or 'silhouette', you can specify a clustering algorithm using a function handle. The function must be of the form C = clustfun(DATA,K), where DATA is the data to be clustered, and K is the number of clusters. The output of clustfun must be one of the following:
A vector of integers representing the cluster index for each observation in DATA. There must be K unique values in this vector.
A numeric n-by-K matrix of scores for n observations and K classes. In this case, the cluster index for each observation is determined by taking the largest score value in each row.
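The function-handle form described above can be sketched as follows. This is a minimal illustration, not a required configuration: the two-group sample data and the kmeans options ('EmptyAction', 'Replicates') are assumptions chosen for the example, and the handle name clustfun simply mirrors the signature given in the text.

```matlab
rng('default');                          % for reproducibility
X = [randn(100,2); randn(100,2) + 5];    % illustrative data: two separated groups

% Function handle of the form C = clustfun(DATA,K). It returns a vector of
% cluster indices with K unique values, one index per observation.
% The kmeans options are illustrative choices only.
clustfun = @(DATA,K) kmeans(DATA,K,'EmptyAction','singleton','Replicates',5);

% Evaluate 1 to 4 clusters with the Davies-Bouldin criterion
E = evalclusters(X,clustfun,'DaviesBouldin','KList',1:4);
disp(E.OptimalK)
```

Because the clustering algorithm is given as a function handle, 'KList' must be supplied so that evalclusters knows which values of K to pass to it.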
If the criterion is 'CalinskiHarabasz', 'DaviesBouldin', or 'silhouette', you can also specify
an n-by-K matrix containing the
proposed clustering solutions. n is the number
of observations in the sample data, and K is the
number of proposed clustering solutions. Column j contains
the cluster indices for each of the n points in
the jth clustering solution.
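As a sketch of the matrix form, the following builds one column of cluster indices per proposed solution and passes the matrix to evalclusters. The sample data, the choice of kmeans to generate the solutions, and the variable names clust and maxK are assumptions made for illustration; any method that yields one index per observation could fill the columns.

```matlab
rng('default');                          % for reproducibility
X = [randn(100,2); randn(100,2) + 5];    % illustrative data: two separated groups

n = size(X,1);
maxK = 4;                                % number of proposed solutions (hypothetical)
clust = zeros(n,maxK);
for k = 1:maxK
    % Column k holds the cluster indices of the k-cluster solution
    clust(:,k) = kmeans(X,k,'Replicates',5);
end

% No 'KList' is needed: the solutions themselves are supplied
E = evalclusters(X,clust,'DaviesBouldin');
disp(E.CriterionValues)
```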
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument
name and Value is the corresponding value. Name must appear
inside single quotes (' ').
You can specify several name and value pair
arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'KList',[1:5] specifies to test 1, 2, 3, 4, and 5 clusters to find the optimal number.
'KList'— List of number of clusters to evaluate
List of number of clusters to evaluate, specified as the comma-separated
pair consisting of 'KList' and a vector of positive
integer values. You must specify KList when clust is
a clustering algorithm name or a function handle. When criterion is 'gap', clust must
be a character vector or a function handle, and you must specify KList.
Clustering algorithm used to cluster the input data, stored
as a valid clustering algorithm name or function handle. If the clustering
solutions are provided in the input, ClusteringFunction is empty.
Name of the criterion used for clustering evaluation, stored as a valid criterion name.
Criterion values corresponding to each proposed number of clusters in InspectedK, stored as a vector of numerical values.
List of the number of proposed clusters for which to compute criterion values, stored as a vector of positive integer values.
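Logical flag for excluded data, stored as a column vector of
logical values. If Missing equals true, then the corresponding value in the data matrix x is not used in the clustering solution.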
Number of observations in the data matrix, ignoring observations with missing (NaN) values, stored as a positive integer value.
Optimal number of clusters, stored as a positive integer value.
Optimal clustering solution corresponding to OptimalK, stored as a column vector of positive integer values.
Data used for clustering, stored as a matrix of numerical values.
|addK||Evaluate additional numbers of clusters|
|compact||Compact clustering evaluation object|
|plot||Plot clustering evaluation object criterion values|
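A short sketch of how the three methods listed above might be used together. The sample data and the particular ranges of K are assumptions for illustration only.

```matlab
rng('default');                          % for reproducibility
X = [randn(100,2); randn(100,2) + 5];    % illustrative data: two separated groups

% Start by evaluating 1 to 3 clusters
E = evalclusters(X,'kmeans','DaviesBouldin','KList',1:3);

% addK: evaluate 4 and 5 clusters as well, returning the updated object
E = addK(E,4:5);

% compact: create a compact version of the evaluation object
C = compact(E);

% plot: criterion value against each inspected number of clusters
figure
plot(E)
```

Using addK avoids recomputing the criterion for values of K that were already evaluated.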
Evaluate the optimal number of clusters using the Davies-Bouldin clustering evaluation criterion.
Generate sample data containing random numbers from three multivariate distributions with different parameter values.
rng('default'); % For reproducibility
mu1 = [2 2];
sigma1 = [0.9 -0.0255; -0.0255 0.9];
mu2 = [5 5];
sigma2 = [0.5 0 ; 0 0.3];
mu3 = [-2, -2];
sigma3 = [1 0 ; 0 0.9];
N = 200;
X = [mvnrnd(mu1,sigma1,N);...
     mvnrnd(mu2,sigma2,N);...
     mvnrnd(mu3,sigma3,N)];
Evaluate the optimal number of clusters using the Davies-Bouldin criterion. Cluster the data using kmeans.
E = evalclusters(X,'kmeans','DaviesBouldin','klist',[1:6])
E = 
  DaviesBouldinEvaluation with properties:

    NumObservations: 600
         InspectedK: [1 2 3 4 5 6]
    CriterionValues: [NaN 0.4663 0.4454 0.8316 1.0444 0.9236]
           OptimalK: 3
The OptimalK value indicates that, based on the Davies-Bouldin criterion, the optimal number of clusters is three.
Plot the Davies-Bouldin criterion values for each number of clusters tested.
The plot shows that the lowest Davies-Bouldin value occurs at three clusters, suggesting that the optimal number of clusters is three.
Create a grouped scatter plot to visually examine the suggested clusters.
The plot shows three distinct clusters within the data: Cluster 1 is in the lower-left corner, cluster 2 is near the center of the plot, and cluster 3 is in the upper-right corner.
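The two plotting steps in this example can be sketched as follows. The block regenerates the same sample data and evaluation so that it runs on its own. The use of the OptimalY property for grouping and the color and marker arguments to gscatter are assumptions chosen for illustration.

```matlab
% Same sample data and evaluation as in the example above
rng('default');                          % for reproducibility
X = [mvnrnd([2 2],[0.9 -0.0255; -0.0255 0.9],200); ...
     mvnrnd([5 5],[0.5 0; 0 0.3],200); ...
     mvnrnd([-2 -2],[1 0; 0 0.9],200)];
E = evalclusters(X,'kmeans','DaviesBouldin','KList',1:6);

% Criterion value for each inspected number of clusters
figure
plot(E)

% Grouped scatter plot, colored by the optimal clustering solution
figure
gscatter(X(:,1),X(:,2),E.OptimalY,'rbg','xod')   % colors and markers are illustrative
```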
The Davies-Bouldin criterion is based on a ratio of within-cluster and between-cluster distances. The Davies-Bouldin index is defined as

$$DB = \frac{1}{k}\sum_{i=1}^{k}\max_{j \neq i}\left\{D_{i,j}\right\},$$

where $k$ is the number of clusters and $D_{i,j}$ is the within-to-between cluster distance ratio for the ith and jth clusters. In mathematical terms,

$$D_{i,j} = \frac{\bar{d}_i + \bar{d}_j}{d_{i,j}}.$$

$\bar{d}_i$ is the average distance between each point in the ith cluster and the centroid of the ith cluster. $\bar{d}_j$ is the average distance between each point in the jth cluster and the centroid of the jth cluster. $d_{i,j}$ is the Euclidean distance between the centroids of the ith and jth clusters.

The maximum value of $D_{i,j}$ represents the worst-case within-to-between cluster ratio for cluster i. The optimal clustering solution has the smallest Davies-Bouldin index value.
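To make the definition concrete, the following sketch computes the index by hand for a toy data set, using the average point-to-centroid distances and the Euclidean centroid distances described above. It is an illustration of the formula under stated assumptions, not the implementation used by evalclusters; the data, the labels, and all variable names are hypothetical.

```matlab
% Toy data: two clusters of two points each (hypothetical values)
X   = [0 0; 0 2; 10 0; 10 2];
idx = [1; 1; 2; 2];                      % cluster index of each observation

labels = unique(idx);
k = numel(labels);
centroids = zeros(k,size(X,2));
dbar = zeros(k,1);                       % average point-to-centroid distance per cluster
for i = 1:k
    Xi = X(idx == labels(i),:);
    centroids(i,:) = mean(Xi,1);
    % Implicit expansion (R2016b or later) subtracts the centroid from each row
    dbar(i) = mean(sqrt(sum((Xi - centroids(i,:)).^2,2)));
end

worst = zeros(k,1);                      % worst-case ratio for each cluster i
for i = 1:k
    for j = [1:i-1, i+1:k]               % all j different from i
        dij = norm(centroids(i,:) - centroids(j,:));   % centroid distance
        worst(i) = max(worst(i),(dbar(i) + dbar(j))/dij);
    end
end

DB = mean(worst);                        % average of the worst-case ratios
fprintf('Davies-Bouldin index: %.4f\n',DB);   % 0.2000 for this toy data
```

Here each cluster has an average point-to-centroid distance of 1 and the centroids are 10 apart, so the ratio for the single pair is (1 + 1)/10 = 0.2, which is also the index.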
 Davies, D. L., and D. W. Bouldin. "A Cluster Separation Measure." IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. PAMI-1, No. 2, 1979, pp. 224–227.