# DaviesBouldinEvaluation

Davies-Bouldin criterion clustering evaluation object

## Description

`DaviesBouldinEvaluation` is an object consisting of sample data (`X`), clustering data (`OptimalY`), and Davies-Bouldin criterion values (`CriterionValues`) used to evaluate the optimal number of clusters (`OptimalK`). The Davies-Bouldin criterion is based on a ratio of within-cluster and between-cluster distances. The optimal clustering solution has the smallest Davies-Bouldin index value. For more information, see Davies-Bouldin Criterion.
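For reference, the ratio described above is commonly written as shown below. The symbols are introduced here for illustration and do not appear elsewhere on this page: $k$ is the number of clusters, $\bar{d}_i$ is the average distance between the points in cluster $i$ and that cluster's centroid, and $d_{i,j}$ is the distance between the centroids of clusters $i$ and $j$.

$$
DB = \frac{1}{k}\sum_{i=1}^{k}\max_{j \neq i} D_{i,j},
\qquad
D_{i,j} = \frac{\bar{d}_i + \bar{d}_j}{d_{i,j}}
$$

With a single cluster there is no $j \neq i$ to compare against, which is consistent with the `NaN` reported for one cluster in the example output on this page.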

## Creation

Create a Davies-Bouldin criterion clustering evaluation object by using the `evalclusters` function and specifying the criterion as `"DaviesBouldin"`.

You can then use `compact` to create a compact version of the Davies-Bouldin criterion clustering evaluation object. The function removes the contents of the properties `X`, `OptimalY`, and `Missing`.
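The following is a minimal sketch of creating and then compacting an evaluation object. The data set and the variable names `evaluation` and `compactEvaluation` are assumptions chosen for illustration.

```matlab
rng("default")                          % for reproducibility
X = [randn(50,2); randn(50,2) + 5];     % illustrative data: two separated groups

% Create the evaluation object with the Davies-Bouldin criterion.
evaluation = evalclusters(X,"kmeans","DaviesBouldin","KList",2:4);

% Create a compact copy; X, OptimalY, and Missing are emptied.
compactEvaluation = compact(evaluation);

isempty(compactEvaluation.X)            % check that the sample data was removed
compactEvaluation.OptimalK              % summary properties remain available
```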

## Properties


### Clustering Evaluation Properties

`ClusteringFunction`: Clustering algorithm used to cluster the sample data, returned as `'kmeans'`, `'linkage'`, `'gmdistribution'`, or a function handle. If you specify the clustering solutions as an input argument to `evalclusters` when you create the clustering evaluation object, then `ClusteringFunction` is empty.

| Value | Description |
| --- | --- |
| `'kmeans'` | Cluster the data in `X` using the `kmeans` clustering algorithm, with `EmptyAction` set to `"singleton"` and `Replicates` set to `5`. |
| `'linkage'` | Cluster the data in `X` using the `clusterdata` agglomerative clustering algorithm, with `Linkage` set to `"ward"`. |
| `'gmdistribution'` | Cluster the data in `X` using the `gmdistribution` Gaussian mixture distribution algorithm, with `SharedCov` set to `true` and `Replicates` set to `5`. |

Data Types: `double` | `char` | `function_handle`
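As a sketch of the case where clustering solutions are supplied directly, the snippet below builds a matrix with one column of cluster indices per candidate number of clusters and passes it to `evalclusters` in place of an algorithm name. The data set and the names `kValues` and `solutions` are illustrative assumptions.

```matlab
rng("default")                          % for reproducibility
X = [randn(50,2); randn(50,2) + 5];     % illustrative data: two separated groups

kValues = 2:4;                          % candidate numbers of clusters
solutions = zeros(size(X,1),numel(kValues));
for i = 1:numel(kValues)
    % Each column holds the cluster index of every observation.
    solutions(:,i) = kmeans(X,kValues(i),"Replicates",5);
end

% Pass the precomputed solutions instead of a clustering algorithm.
evaluation = evalclusters(X,solutions,"DaviesBouldin");

isempty(evaluation.ClusteringFunction)  % no algorithm was recorded
```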

`CriterionName`: Name of the criterion used for clustering evaluation, returned as `'DaviesBouldin'`.

`CriterionValues`: Criterion values, returned as a numeric vector. Each value corresponds to a proposed number of clusters in `InspectedK`.

Data Types: `double`
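To make the meaning of a single criterion value concrete, here is a hand computation of a Davies-Bouldin index for one clustering. It is a sketch under stated assumptions: it uses Euclidean distances and cluster means as centroids, the data and variable names are illustrative, and it is not presented as the internal implementation of `evalclusters`.

```matlab
rng("default")                          % for reproducibility
X = [randn(100,2); randn(100,2) + 5];   % illustrative data: two separated groups
labels = kmeans(X,2);                   % one clustering solution to score

k = max(labels);
centroids = zeros(k,size(X,2));
spread = zeros(k,1);
for i = 1:k
    members = X(labels == i,:);
    centroids(i,:) = mean(members,1);
    % Average Euclidean distance from the cluster's points to its centroid.
    spread(i) = mean(vecnorm(members - centroids(i,:),2,2));
end

worstRatio = zeros(k,1);
for i = 1:k
    for j = [1:i-1, i+1:k]
        % Within-to-between distance ratio for the cluster pair (i,j).
        ratio = (spread(i) + spread(j)) / norm(centroids(i,:) - centroids(j,:));
        worstRatio(i) = max(worstRatio(i),ratio);
    end
end

db = mean(worstRatio)                   % smaller values indicate better separation
```

Compact, well-separated clusters give small ratios, which is why the smallest value marks the preferred number of clusters.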

`InspectedK`: List of the proposed numbers of clusters for which to compute criterion values, returned as a positive integer vector.

Data Types: `double`

`OptimalK`: Optimal number of clusters, returned as a positive integer scalar.

Data Types: `double`

`OptimalY`: Optimal clustering solution corresponding to `OptimalK`, returned as a positive integer column vector. Each row of `OptimalY` represents the cluster index of the corresponding observation (or row) in `X`. If you specify the clustering solutions as an input argument to `evalclusters` when you create the clustering evaluation object, or if the clustering evaluation object is compact (see `compact`), then `OptimalY` is empty.

Data Types: `double`

### Sample Data Properties

`Missing`: Excluded data, returned as a logical column vector. If an element of `Missing` is `true`, then the corresponding observation (or row) in the data matrix `X` is not used in the clustering solutions. If the clustering evaluation object is compact (see `compact`), then `Missing` is empty.

Data Types: `double` | `logical`

`NumObservations`: Number of observations in the data matrix `X`, ignoring observations with missing (`NaN`) values, returned as a positive integer scalar.

Data Types: `double`
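The relationship between `Missing` and `NumObservations` can be illustrated with a small sketch. The data set is an assumption made for illustration; one row is given a `NaN` value so that it is excluded from clustering.

```matlab
rng("default")                          % for reproducibility
X = [randn(50,2); randn(50,2) + 5];     % illustrative data: two separated groups
X(3,1) = NaN;                           % make the third observation incomplete

evaluation = evalclusters(X,"kmeans","DaviesBouldin","KList",2:3);

find(evaluation.Missing)                % rows excluded from clustering
evaluation.NumObservations              % number of rows actually used
```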

`X`: Data used for clustering, returned as a numeric matrix. Rows correspond to observations, and columns correspond to variables. If the clustering evaluation object is compact (see `compact`), then `X` is empty.

Data Types: `single` | `double`

## Object Functions

| Function | Description |
| --- | --- |
| `addK` | Evaluate additional numbers of clusters |
| `compact` | Compact clustering evaluation object |
| `plot` | Plot clustering evaluation object criterion values |
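As a brief sketch of how `addK` can be used, the snippet below evaluates a few cluster counts and then extends the same object with more. The data set and the variable names are illustrative assumptions.

```matlab
rng("default")                          % for reproducibility
X = [randn(50,2); randn(50,2) + 5];     % illustrative data: two separated groups

% Evaluate 2 to 4 clusters first.
evaluation = evalclusters(X,"kmeans","DaviesBouldin","KList",2:4);

% Add criterion values for 5 and 6 clusters to the existing results.
extended = addK(evaluation,5:6);

extended.InspectedK                     % now includes the added cluster counts
plot(extended)                          % criterion value versus number of clusters
```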

## Examples


Evaluate the optimal number of clusters using the Davies-Bouldin clustering evaluation criterion.

Generate sample data containing random numbers from three multivariate distributions with different parameter values.

```
rng("default") % For reproducibility
n = 200;
mu1 = [2 2];
sigma1 = [0.9 -0.0255; -0.0255 0.9];
mu2 = [5 5];
sigma2 = [0.5 0; 0 0.3];
mu3 = [-2 -2];
sigma3 = [1 0; 0 0.9];
X = [mvnrnd(mu1,sigma1,n); ...
     mvnrnd(mu2,sigma2,n); ...
     mvnrnd(mu3,sigma3,n)];
```

Evaluate the optimal number of clusters using the Davies-Bouldin criterion. Cluster the data using `kmeans`.

```
evaluation = evalclusters(X,"kmeans","DaviesBouldin","KList",1:6)
```

```
evaluation = 
  DaviesBouldinEvaluation with properties:

    NumObservations: 600
         InspectedK: [1 2 3 4 5 6]
    CriterionValues: [NaN 0.4663 0.4454 0.8316 1.0444 0.9236]
           OptimalK: 3
```

The `OptimalK` value indicates that, based on the Davies-Bouldin criterion, the optimal number of clusters is three.

Plot the Davies-Bouldin criterion values for each number of clusters tested.

`plot(evaluation)`

The plot shows that the lowest Davies-Bouldin value occurs at three clusters, suggesting that the optimal number of clusters is three.

Create a grouped scatter plot to visually examine the suggested clusters.

```
clusters = evaluation.OptimalY;
gscatter(X(:,1),X(:,2),clusters,[],"xod")
```

The plot shows three distinct clusters within the data: cluster 1 in the lower-left corner, cluster 2 in the upper-right corner, and cluster 3 near the center of the plot.