Construct agglomerative clusters from data

`T = clusterdata(X,cutoff)` returns cluster indices for each observation (row) of an input data matrix `X`, given a threshold `cutoff` for cutting an agglomerative hierarchical tree that the `linkage` function generates from `X`.
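As a minimal sketch of this syntax, the following call clusters a small made-up data matrix. The values in `X` are illustrative assumptions, chosen so that the two groups are clearly separated.

```matlab
% Two well-separated groups of 2-D points (made-up illustration data).
X = [0 0; 0 1; 1 0; 10 10; 10 11; 11 10];

% An integer cutoff >= 2 is interpreted as the maximum number of clusters.
T = clusterdata(X,2);

% T is a column vector with one cluster index per row of X.
disp(T')
```

The numeric labels themselves are arbitrary. What matters is which rows share the same index.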

`clusterdata` supports agglomerative clustering and incorporates the `pdist`, `linkage`, and `cluster` functions, which you can use separately for more detailed analysis. See Algorithm Description for more details.

If `'Linkage'` is `'centroid'` or `'median'`, then `linkage` can produce a cluster tree that is not monotonic. This result occurs when the distance from the union of two clusters, *r* and *s*, to a third cluster is less than the distance between *r* and *s*. In this case, in a dendrogram drawn with the default orientation, the path from a leaf to the root node takes some downward steps. To avoid this result, specify another value for `'Linkage'`.

The following image shows a nonmonotonic cluster tree. In this case, cluster 1 and cluster 3 are joined into a new cluster, while the distance between this new cluster and cluster 2 is less than the distance between cluster 1 and cluster 3.
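The situation described above can be reproduced with a small sketch. The three points are made up for illustration: points 1 and 3 are 2 units apart, and their centroid, (1,0), is only 1.8 units from point 2, so the second merge should occur at a lower height than the first.

```matlab
% Three made-up 2-D points; rows 1 and 3 are the closest pair.
X = [0 0; 1 1.8; 2 0];

% Centroid linkage on raw data uses Euclidean distance.
% Note: MATLAB may print a warning about a non-monotonic tree here.
Z = linkage(X,'centroid');

% Column 3 of Z holds the merge heights in merge order.
heights = Z(:,3);

% A negative difference means a later merge happened at a lower height.
isNonmonotonic = any(diff(heights) < 0);
disp(isNonmonotonic)
```

Checking `diff(Z(:,3))` in this way is one simple test for a nonmonotonic tree before drawing a dendrogram.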

If you specify a value `c` for the `cutoff` input argument, then `T = clusterdata(X,c)` performs the following steps:

1. Create a vector of the Euclidean distance between pairs of observations in `X` by using `pdist`.

   `Y = pdist(X,'euclidean')`

2. Create an agglomerative hierarchical cluster tree from `Y` by using `linkage` with the `'single'` method for computing the shortest distance between clusters.

   `Z = linkage(Y,'single')`

3. If `0 < c < 2`, use `cluster` to define clusters from `Z` when inconsistent values are less than `c`.

   `T = cluster(Z,'Cutoff',c)`

4. If `c` is an integer value `≥ 2`, use `cluster` to find a maximum of `c` clusters from `Z`.

   `T = cluster(Z,'MaxClust',c)`
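The steps above can be sketched as a side-by-side comparison of the one-call form and the explicit `pdist`, `linkage`, and `cluster` sequence. The data, the random seed, and the two cutoff values are illustrative assumptions. The comparison relies on the default settings described above.

```matlab
rng(0);                              % fix the seed so the sketch is repeatable
X = [randn(10,2); randn(10,2) + 8];  % made-up data: two separated groups

% Shared intermediate results for the step-by-step form.
Y = pdist(X,'euclidean');            % pairwise Euclidean distances
Z = linkage(Y,'single');             % single-linkage hierarchical tree

% Case 1: integer cutoff >= 2, treated as a maximum number of clusters.
cMax = 2;
T1 = clusterdata(X,cMax);
T2 = cluster(Z,'MaxClust',cMax);

% Case 2: cutoff between 0 and 2, treated as an inconsistency threshold.
cInc = 1.1;
T3 = clusterdata(X,cInc);
T4 = cluster(Z,'Cutoff',cInc);

% Both comparisons are expected to agree under the default settings.
sameMaxClust = isequal(T1,T2);
sameCutoff   = isequal(T3,T4);
```

Keeping `Y` and `Z` around is the main reason to use the separate functions: you can inspect the distances or the tree, or cut the same tree several ways, without recomputing them.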

If you have a hierarchical cluster tree `Z` (the output of the `linkage` function for the input data matrix `X`), you can use `cluster` to perform agglomerative clustering on `Z` and return the cluster assignment for each observation (row) in `X`.
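As a hedged illustration of working directly from an existing tree, the sketch below builds `Z` once and then cuts it at two different cluster counts. The data values and the choice of `'average'` linkage are assumptions made for this example only.

```matlab
% Three made-up pairs of nearby points, with the third pair far from the rest.
X = [0 0; 0 1; 5 5; 5 6; 20 20; 20 21];

% Build the tree once; 'average' linkage is an arbitrary choice here.
Z = linkage(X,'average');

% Cut the same tree at two different maximum cluster counts.
Tthree = cluster(Z,'MaxClust',3);  % expected to separate the three pairs
Ttwo   = cluster(Z,'MaxClust',2);  % expected to merge the two nearer pairs

disp([Tthree Ttwo])
```

Each column of the displayed matrix is one assignment for the rows of `X`, which makes it easy to compare how observations regroup as the cluster count changes.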