linkage - Create hierarchical cluster tree

Syntax

Z = linkage(y)
Z = linkage(y,method)
Z = linkage(X,method,metric)
Z = linkage(X,method,inputs)

Description

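Z = linkage(y) creates a hierarchical cluster tree from the distances in y. y is a vector of pairwise distances in the format returned by pdist. For m observations it has m(m–1)/2 elements, one per pair, ordered (1,2), (1,3), ..., (1,m), (2,3), ..., (m–1,m). The distances can be Euclidean or more general dissimilarities.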
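The vector layout can be made concrete with a short sketch. It is written in Python for illustration only, and pdist_euclidean and condensed_index are hypothetical helper names, not toolbox functions. The sketch builds a pdist-style vector of Euclidean distances and computes where the distance between observations i and j (numbered from 1, as in MATLAB) is stored.

```python
import math

def pdist_euclidean(X):
    """Euclidean distances between the rows of X as a flat vector, in the
    pair order (1,2), (1,3), ..., (1,m), (2,3), ..., (m-1,m)."""
    m = len(X)
    return [math.dist(X[i], X[j]) for i in range(m) for j in range(i + 1, m)]

def condensed_index(i, j, m):
    """0-based position in that vector of the pair of 1-based observations i, j."""
    if i == j:
        raise ValueError("the vector holds no self-distances")
    if i > j:
        i, j = j, i
    # pairs (1,*) .. (i-1,*) come first: (m-1) + (m-2) + ... + (m-i+1) of them
    return (i - 1) * m - (i - 1) * i // 2 + (j - i - 1)

X = [[0, 0], [3, 4], [6, 8]]
y = pdist_euclidean(X)
print(y)                                 # [5.0, 10.0, 5.0]
print(y[condensed_index(1, 3, len(X))])  # 10.0
```

For m = 3 observations the vector holds the distances for pairs (1,2), (1,3), and (2,3), in that order.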

Z is an (m–1)-by-3 matrix, where m is the number of observations in the original data. Columns 1 and 2 of Z contain the indices of the clusters that are linked in pairs to form a binary tree. The leaf nodes, numbered from 1 to m, are the singleton clusters from which all higher clusters are built. Each newly formed cluster, corresponding to row Z(I,:), is assigned the index m+I, and Z(I,1:2) contains the indices of the two component clusters that form cluster m+I. The m–1 higher clusters correspond to the interior nodes of the clustering tree. Z(I,3) contains the linkage distance between the two clusters merged in row Z(I,:).

For example, suppose there are 30 initial nodes and at step 12 clusters 5 and 7 are combined at a distance of 1.5. Then Z(12,:) is [5, 7, 1.5], and the newly formed cluster has index 12 + 30 = 42. If cluster 42 appears in a later row, the cluster created at step 12 is being merged into a larger cluster.
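Because cluster m+I is always created in row I, the leaves belonging to any cluster index can be recovered by walking Z backwards. The sketch below is Python, cluster_members is a hypothetical name, and Z is assumed to be a list of rows copied from MATLAB (here, the output of the single-linkage example on this page, with m = 7).

```python
def cluster_members(Z, m, k):
    """Sorted leaf indices (1..m) contained in cluster k of linkage output Z.

    The I-th row of Z (counting from 1) merges two clusters into cluster m+I."""
    stack, leaves = [k], []
    while stack:
        c = int(stack.pop())
        if c <= m:
            leaves.append(c)              # singleton (leaf) cluster
        else:
            a, b, _ = Z[c - m - 1]        # cluster c was created in row c-m
            stack.extend([a, b])
    return sorted(leaves)

# Z from the single-linkage example on this page (m = 7 observations)
Z = [[2, 5, 0.2000], [3, 4, 0.5000], [8, 6, 0.5099],
     [1, 7, 0.7000], [11, 9, 1.2806], [12, 10, 1.3454]]
print(cluster_members(Z, 7, 10))   # [2, 5, 6]
print(cluster_members(Z, 7, 12))   # [1, 3, 4, 7]
```

The last cluster, index 2m–1 (13 here), always contains every observation.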

Z = linkage(y,method) creates the tree using the specified method. Methods differ from one another in how they measure the distance between clusters. Available methods are listed in the following table.

Method        Description
'average'     Unweighted average distance (UPGMA).
'centroid'    Centroid distance (UPGMC). y must contain Euclidean distances.
'complete'    Furthest distance (farthest neighbor).
'median'      Weighted center of mass distance (WPGMC). y must contain Euclidean distances.
'single'      Shortest distance (nearest neighbor). This is the default.
'ward'        Inner squared distance (minimum variance algorithm). y must contain Euclidean distances.
'weighted'    Weighted average distance (WPGMA).
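To see how the methods differ, the following Python sketch computes the distance between two clusters directly from their member observations for four of the methods. cluster_distance is a hypothetical helper, and this brute-force computation is only an illustration, not the toolbox's algorithm. 'median' and 'weighted' depend on the order in which clusters were merged, so they cannot be computed from member lists alone; 'ward' is left out for brevity.

```python
import math
from itertools import product

def cluster_distance(A, B, method):
    """Distance between clusters A and B (lists of points) under one linkage."""
    pair_d = [math.dist(a, b) for a, b in product(A, B)]
    if method == "single":      # shortest distance between members
        return min(pair_d)
    if method == "complete":    # furthest distance between members
        return max(pair_d)
    if method == "average":     # unweighted mean over all n_A * n_B pairs
        return sum(pair_d) / len(pair_d)
    if method == "centroid":    # Euclidean distance between the cluster means
        ca = [sum(c) / len(A) for c in zip(*A)]
        cb = [sum(c) / len(B) for c in zip(*B)]
        return math.dist(ca, cb)
    raise ValueError(f"unsupported method: {method}")

A, B = [[0], [1]], [[4], [6]]   # two 1-D clusters
for method in ("single", "complete", "average", "centroid"):
    print(method, cluster_distance(A, B, method))
# single 3.0, complete 6.0, average 4.5, centroid 4.5
```

The same pair of clusters is 3.0 apart under 'single' but 6.0 apart under 'complete', which is why the methods can merge clusters in different orders.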

Z = linkage(X,method,metric) creates a hierarchical cluster tree from the observations in X. Rows in X correspond to observations and columns to variables. Pairwise distances are computed internally by calling pdist. metric is one of the distance metrics accepted by pdist.

Z = linkage(X,method,inputs) allows you to pass extra input arguments to pdist. inputs is a cell array containing input arguments.

Linkages

The following notation is used to describe the linkages used by the various methods:
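Cluster r is formed by merging clusters p and q, n_r is the number of observations in cluster r, x_ri is the ith observation in cluster r, and dist is the distance used in y. The formulas below are the standard textbook definitions of each method, given as a reference sketch. They are assumed rather than copied from this page, and the scale reported for 'ward' in particular may differ (for example, a square root or constant multiple of the quantity shown).

```latex
\begin{aligned}
\text{single:}   &\quad d(r,s) = \min_{i,j}\ \operatorname{dist}(x_{ri}, x_{sj}) \\
\text{complete:} &\quad d(r,s) = \max_{i,j}\ \operatorname{dist}(x_{ri}, x_{sj}) \\
\text{average:}  &\quad d(r,s) = \frac{1}{n_r n_s}\sum_{i=1}^{n_r}\sum_{j=1}^{n_s} \operatorname{dist}(x_{ri}, x_{sj}) \\
\text{weighted:} &\quad d(r,s) = \tfrac{1}{2}\bigl(d(p,s) + d(q,s)\bigr) \\
\text{centroid:} &\quad d(r,s) = \lVert \bar{x}_r - \bar{x}_s \rVert_2,
  \qquad \bar{x}_r = \frac{1}{n_r}\sum_{i=1}^{n_r} x_{ri} \\
\text{median:}   &\quad d(r,s) = \lVert \tilde{x}_r - \tilde{x}_s \rVert_2,
  \qquad \tilde{x}_r = \tfrac{1}{2}\bigl(\tilde{x}_p + \tilde{x}_q\bigr) \\
\text{ward:}     &\quad \Delta(r,s) = \frac{n_r n_s}{n_r + n_s}\,\lVert \bar{x}_r - \bar{x}_s \rVert_2^2
\end{aligned}
```

For a singleton cluster, the point x̃ is the observation itself. For 'ward', Δ(r,s) is the increase in the total within-cluster sum of squares caused by merging r and s, which is why the method requires Euclidean distances.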

Example

X = [3 1.7; 1 1; 2 3; 2 2.5; 1.2 1; 1.1 1.5; 3 1];
Y = pdist(X);
Z = linkage(Y)
Z =
    2.0000  5.0000  0.2000
    3.0000  4.0000  0.5000
    8.0000  6.0000  0.5099
    1.0000  7.0000  0.7000
   11.0000  9.0000  1.2806
   12.0000 10.0000  1.3454
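The output above can be checked by hand with a naive re-implementation of single linkage. The sketch below is Python, single_linkage is a hypothetical name, and it is a brute-force O(m^3) illustration, not the toolbox's implementation. It uses the same numbering rule as linkage: leaves are 1 to m, and the cluster created at step I gets index m+I.

```python
import math

def single_linkage(X):
    """Naive single-linkage clustering of the rows of X.

    Returns rows [a, b, height] with 1-based cluster indices: leaves are
    1..m and the cluster created at step I is numbered m+I."""
    m = len(X)
    clusters = {k + 1: [k] for k in range(m)}   # index -> 0-based member rows
    Z = []
    for step in range(1, m):
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # single linkage: shortest distance between any two members
                d = min(math.dist(X[i], X[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[m + step] = clusters.pop(a) + clusters.pop(b)
        Z.append([a, b, d])
    return Z

X = [[3, 1.7], [1, 1], [2, 3], [2, 2.5], [1.2, 1], [1.1, 1.5], [3, 1]]
for a, b, d in single_linkage(X):
    print(f"{a:3d} {b:3d} {d:.4f}")
```

This sketch lists the smaller cluster index first, so rows 3, 5, and 6 show the same merges as the MATLAB output with the two indices swapped (6 8, 9 11, 10 12). Ties between equal distances are broken by scan order, which may differ from linkage.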

See Also

cluster, clusterdata, cophenet, dendrogram, inconsistent, kmeans, pdist, silhouette, squareform

  


© 1984-2008 The MathWorks, Inc.