## clustering.evaluation.CalinskiHarabaszEvaluation class

**Package:** clustering.evaluation

**Superclasses:** clustering.evaluation.ClusterCriterion

Calinski-Harabasz criterion clustering evaluation object

`clustering.evaluation.CalinskiHarabaszEvaluation` is
an object consisting of sample data, clustering data, and Calinski-Harabasz
criterion values used to evaluate the optimal number of clusters.
Create a Calinski-Harabasz criterion clustering evaluation object
using `evalclusters`.

`eva = evalclusters(x,clust,'CalinskiHarabasz')` creates
a Calinski-Harabasz criterion clustering evaluation object.

`eva = evalclusters(x,clust,'CalinskiHarabasz',Name,Value)` creates
a Calinski-Harabasz criterion clustering evaluation object using additional
options specified by one or more name-value pair arguments.
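As a minimal sketch of the second syntax, the call below evaluates k-means solutions for one to six clusters. It assumes the `fisheriris` sample data set and the `'KList'` name-value argument of `evalclusters` from Statistics and Machine Learning Toolbox; neither is named in the text above, so treat both as illustrative choices.

```matlab
% Assumption: fisheriris ships with Statistics and Machine Learning Toolbox.
load fisheriris            % loads meas (150-by-4 measurements) and species
rng('default')             % make the k-means runs reproducible

% Evaluate k-means clustering solutions for 1 through 6 clusters.
eva = evalclusters(meas,'kmeans','CalinskiHarabasz','KList',1:6);

disp(eva.InspectedK)       % candidate numbers of clusters
disp(eva.CriterionValues)  % one Calinski-Harabasz value per candidate
disp(eva.OptimalK)         % candidate with the largest criterion value
```

The criterion is undefined for a single cluster because the factor k - 1 is zero, so expect a non-finite entry for k = 1.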

| Method | Description |
| --- | --- |
| `addK` | Evaluate additional numbers of clusters |
| `compact` | Compact clustering evaluation object |
| `plot` | Plot clustering evaluation object criterion values |
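
The three methods might be used as follows. This is a sketch, again assuming the `fisheriris` sample data and the `'KList'` name-value argument, which are illustrative choices rather than part of this page.

```matlab
load fisheriris
rng('default')             % reproducible k-means runs

% Start with 2 to 4 clusters.
eva = evalclusters(meas,'kmeans','CalinskiHarabasz','KList',2:4);

eva = addK(eva,5:6);       % also evaluate 5 and 6 clusters
plot(eva)                  % criterion value versus number of clusters
c = compact(eva);          % compact version of the evaluation object
```

Because `addK` returns a new object, assign its output back to `eva` to keep the extended evaluation.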

The Calinski-Harabasz criterion is sometimes called the variance ratio criterion (VRC). The Calinski-Harabasz index is defined as

$$\mathrm{VRC}_k = \frac{SS_B}{SS_W} \times \frac{N - k}{k - 1},$$

where $SS_B$ is the
overall between-cluster variance, $SS_W$ is
the overall within-cluster variance, $k$ is the number
of clusters, and $N$ is the number of observations.

The overall between-cluster variance $SS_B$ is
defined as

$$SS_B = \sum_{i=1}^{k} n_i \lVert m_i - m \rVert^2,$$

where $k$ is the number of clusters, $n_i$ is the number of
observations in cluster $i$, $m_i$ is
the centroid of cluster $i$, $m$ is
the overall mean of the sample data, and
$\lVert m_i - m \rVert$ is the $L^2$ norm
(Euclidean distance) between the two vectors.

The overall within-cluster variance $SS_W$ is
defined as

$$SS_W = \sum_{i=1}^{k} \sum_{x \in c_i} \lVert x - m_i \rVert^2,$$

where $k$ is the number of clusters, $x$ is
a data point, $c_i$ is
the $i$th cluster, $m_i$ is
the centroid of cluster $i$, and
$\lVert x - m_i \rVert$ is the $L^2$ norm
(Euclidean distance) between the two vectors.
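
To make the definitions concrete, the sketch below computes the between-cluster sum, the within-cluster sum, and the resulting index by hand. The data `X` and labels `idx` are hypothetical toy values chosen for easy arithmetic, and the code uses only base MATLAB. It weights each centroid distance by the cluster size, as in the usual form of the criterion.

```matlab
% Hypothetical toy data: six 2-D points in two well-separated clusters.
X   = [0 0; 0 2; 2 0; 8 8; 8 10; 10 8];
idx = [1; 1; 1; 2; 2; 2];          % cluster label for each row of X

N = size(X,1);                     % number of observations
k = max(idx);                      % number of clusters
m = mean(X,1);                     % overall mean of the sample data

SSB = 0;                           % between-cluster sum
SSW = 0;                           % within-cluster sum
for i = 1:k
    Xi = X(idx == i,:);            % observations in cluster i
    ni = size(Xi,1);               % cluster size
    mi = mean(Xi,1);               % centroid of cluster i
    SSB = SSB + ni * sum((mi - m).^2);
    SSW = SSW + sum(sum((Xi - mi).^2));   % implicit expansion (R2016b or later)
end

VRC = (SSB/SSW) * (N - k)/(k - 1); % Calinski-Harabasz index
```

For this data the sums work out to 192 and 32/3, giving an index of 72 up to floating-point rounding.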

Well-defined clusters have a large between-cluster variance
($SS_B$) and a small within-cluster
variance ($SS_W$). The larger
the $\mathrm{VRC}_k$ ratio, the better the data partition.
To determine the optimal number of clusters, maximize $\mathrm{VRC}_k$ with
respect to $k$. The optimal number of clusters is
the solution with the highest Calinski-Harabasz index value.
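
The selection rule can be checked directly against an evaluation object. The sketch below assumes the `fisheriris` sample data and the `'KList'` name-value argument (illustrative choices, not part of this page). It picks the candidate with the largest criterion value and compares it with the `OptimalK` property.

```matlab
load fisheriris
rng('default')                      % reproducible k-means runs

eva = evalclusters(meas,'kmeans','CalinskiHarabasz','KList',2:6);

% Maximize the criterion over the inspected numbers of clusters.
[~,best] = max(eva.CriterionValues);
kBest = eva.InspectedK(best);

% Barring ties, this should match the value the object reports.
disp([kBest eva.OptimalK])
```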

The Calinski-Harabasz criterion is best suited for *k*-means
clustering solutions with squared Euclidean distances.

[1] Calinski, T., and J. Harabasz. "A dendrite method
for cluster analysis." *Communications in Statistics*.
Vol. 3, No. 1, 1974, pp. 1–27.

See also: `clustering.evaluation.DaviesBouldinEvaluation`, `clustering.evaluation.GapEvaluation`, `clustering.evaluation.SilhouetteEvaluation`, `evalclusters`
