Statistics Toolbox™
Z = linkage(y)
Z = linkage(y,method)
Z = linkage(X,method,metric)
Z = linkage(X,method,inputs)
Z = linkage(y) creates a hierarchical cluster tree from the distances in y. y is a Euclidean distance matrix or a more general dissimilarity matrix, formatted as a vector, as returned by pdist.
Z is an (m-1)-by-3 matrix, where m is the number of observations in the original data. Columns 1 and 2 of Z contain cluster indices linked in pairs to form a binary tree. The leaf nodes are numbered from 1 to m. Leaf nodes are the singleton clusters from which all higher clusters are built. Each newly formed cluster, corresponding to row Z(I,:), is assigned the index m+I. Z(I,1:2) contains the indices of the two component clusters that form cluster m+I. There are m-1 higher clusters, which correspond to the interior nodes of the clustering tree. Z(I,3) contains the linkage distance between the two clusters merged in row Z(I,:).
For example, suppose there are 30 initial nodes and at step 12 cluster 5 and cluster 7 are combined. Suppose their distance at that time is 1.5. Then Z(12,:) will be [5, 7, 1.5]. The newly formed cluster will have index 12 + 30 = 42. If cluster 42 appears in a later row, it means the cluster created at step 12 is being combined into some larger cluster.
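The indexing scheme described above can be traced in code. The following is a minimal sketch, assuming the seven-point data set used in the example on this page; the variable `members` is a hypothetical helper introduced only for illustration and is not part of the toolbox.

```matlab
% Data taken from the example on this page: 7 observations, 2 variables.
X = [3 1.7; 1 1; 2 3; 2 2.5; 1.2 1; 1.1 1.5; 3 1];
Z = linkage(pdist(X));          % default 'single' method
m = size(Z,1) + 1;              % number of observations (leaf nodes)

% members{k} lists the leaves contained in cluster k (hypothetical helper).
members = cell(2*m-1, 1);
for k = 1:m
    members{k} = k;             % leaves 1..m are singleton clusters
end
for I = 1:m-1
    % Row I merges clusters Z(I,1) and Z(I,2) into the new cluster m+I.
    members{m+I} = sort([members{Z(I,1)}, members{Z(I,2)}]);
    fprintf('step %d: cluster %d = {%s} at distance %.4f\n', ...
        I, m+I, num2str(members{m+I}), Z(I,3));
end
```

The last cluster, with index 2m-1, should contain every observation, because the final row of Z joins the two remaining subtrees into the root.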
Z = linkage(y,method) creates the tree using the specified method. Methods differ from one another in how they measure the distance between clusters. Available methods are listed in the following table.
| Method | Description |
|---|---|
| 'average' | Unweighted average distance (UPGMA). |
| 'centroid' | Centroid distance (UPGMC). y must contain Euclidean distances. |
| 'complete' | Furthest distance. |
| 'median' | Weighted center of mass distance (WPGMC). y must contain Euclidean distances. |
| 'single' | Shortest distance. This is the default. |
| 'ward' | Inner squared distance (minimum variance algorithm). y must contain Euclidean distances. |
| 'weighted' | Weighted average distance (WPGMA). |
Note: The 'centroid' and 'median' methods can produce a cluster tree that is not monotonic. This occurs when the distance from the union of two clusters, r and s, to a third cluster is less than the distance between r and s. In this case, sections of the dendrogram change direction. This is an indication that you should use another method.
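One way to detect a non-monotonic tree is to check whether the merge distances in the third column of Z ever decrease. The sketch below uses three illustrative points (not from this page) that lie near the corners of an equilateral triangle; `isMonotonic` is a hypothetical helper, not a toolbox function.

```matlab
% Three points near an equilateral triangle (illustrative data).
X = [0 0; 1 0; 0.5 0.9];
Y = pdist(X);

Zc = linkage(Y, 'centroid');    % may produce an inversion
Zs = linkage(Y, 'single');      % single linkage is always monotonic

% A tree is monotonic when merge distances never decrease row to row.
isMonotonic = @(Z) all(diff(Z(:,3)) >= 0);

fprintf('centroid monotonic: %d\n', isMonotonic(Zc));
fprintf('single   monotonic: %d\n', isMonotonic(Zs));
```

Here points 1 and 2 merge first at distance 1, and the centroid of that pair lies only 0.9 from point 3, so the second merge distance under 'centroid' is smaller than the first.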
Z = linkage(X,method,metric) creates a hierarchical cluster tree from the observations in X. Rows in X correspond to observations and columns to variables. Pairwise distances are computed internally by calling pdist. metric is one of the distance metrics accepted by pdist.
Z = linkage(X,method,inputs) allows you to pass extra input arguments to pdist. inputs is a cell array containing input arguments.
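As a brief illustration of the two syntaxes above, the following sketch compares each one-step call with the equivalent two-step call through pdist. The data and the choice of 'average', 'cityblock', and a Minkowski exponent of 3 are assumptions made only for this example.

```matlab
X = [3 1.7; 1 1; 2 3; 2 2.5; 1.2 1; 1.1 1.5; 3 1];

% Metric given by name: distances are computed internally by pdist.
Z1 = linkage(X, 'average', 'cityblock');
Z2 = linkage(pdist(X, 'cityblock'), 'average');     % two-step equivalent

% Extra pdist arguments passed as a cell array: Minkowski metric, exponent 3.
Z3 = linkage(X, 'average', {'minkowski', 3});
Z4 = linkage(pdist(X, 'minkowski', 3), 'average');  % two-step equivalent

fprintf('max difference, cityblock: %g\n', max(abs(Z1(:) - Z2(:))));
fprintf('max difference, minkowski: %g\n', max(abs(Z3(:) - Z4(:))));
```

The one-step form is convenient when the distance vector is not needed elsewhere; the two-step form lets the same vector be reused, for example by cophenet.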
The following notation is used to describe the linkages used by the various methods:
Cluster r is formed from clusters p and q.
$n_r$ is the number of objects in cluster r.
$x_{ri}$ is the ith object in cluster r.
Single linkage, also called nearest neighbor, uses the smallest distance between objects in the two clusters:
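$$d(r,s) = \min\bigl(\operatorname{dist}(x_{ri}, x_{sj})\bigr), \quad i \in (1,\ldots,n_r),\ j \in (1,\ldots,n_s)$$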
Complete linkage, also called furthest neighbor, uses the largest distance between objects in the two clusters:
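$$d(r,s) = \max\bigl(\operatorname{dist}(x_{ri}, x_{sj})\bigr), \quad i \in (1,\ldots,n_r),\ j \in (1,\ldots,n_s)$$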
Average linkage uses the average distance between all pairs of objects in any two clusters:
$$d(r,s) = \frac{1}{n_r n_s} \sum_{i=1}^{n_r} \sum_{j=1}^{n_s} \operatorname{dist}(x_{ri}, x_{sj})$$
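The single, complete, and average definitions can be checked by hand on two small clusters. This is a sketch under stated assumptions: the points in `Xr` and `Xs` are illustrative, not from this page, and are chosen so that each cluster merges internally before the two clusters are joined, which makes the last row of Z the distance between r and s.

```matlab
% Two well-separated clusters (illustrative data).
Xr = [0 0; 0 1];          % n_r = 2 objects
Xs = [3 0; 4 0; 4 1];     % n_s = 3 objects
nr = size(Xr,1);
ns = size(Xs,1);

% D(i,j) = Euclidean distance between object i of r and object j of s.
D = zeros(nr, ns);
for i = 1:nr
    for j = 1:ns
        D(i,j) = norm(Xr(i,:) - Xs(j,:));
    end
end

dSingle   = min(D(:));              % nearest pair
dComplete = max(D(:));              % furthest pair
dAverage  = sum(D(:)) / (nr*ns);    % mean over all n_r*n_s pairs

% Compare with the final merge distance reported by linkage.
Y = pdist([Xr; Xs]);
Zsingle   = linkage(Y, 'single');
Zcomplete = linkage(Y, 'complete');
Zaverage  = linkage(Y, 'average');
fprintf('single:   %.4f vs %.4f\n', dSingle,   Zsingle(end,3));
fprintf('complete: %.4f vs %.4f\n', dComplete, Zcomplete(end,3));
fprintf('average:  %.4f vs %.4f\n', dAverage,  Zaverage(end,3));
```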
Centroid linkage uses the Euclidean distance between the centroids of the two clusters:
$$d(r,s) = \|\bar{x}_r - \bar{x}_s\|_2$$
where
$$\bar{x}_r = \frac{1}{n_r} \sum_{i=1}^{n_r} x_{ri}$$
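A centroid distance can be computed directly from the cluster means. The sketch below assumes two illustrative clusters (not from this page) that are far enough apart for each to merge internally first, so the final row of Z should report the distance between their centroids.

```matlab
Xr = [0 0; 0 1];          % cluster r (illustrative data)
Xs = [3 0; 4 0; 4 1];     % cluster s (illustrative data)

xbar_r = mean(Xr, 1);     % centroid of r: mean of its objects
xbar_s = mean(Xs, 1);     % centroid of s
dCentroid = norm(xbar_r - xbar_s);

% Final merge distance from linkage with the 'centroid' method.
Z = linkage(pdist([Xr; Xs]), 'centroid');
fprintf('by hand: %.4f, linkage: %.4f\n', dCentroid, Z(end,3));
```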
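Median linkage uses the Euclidean distance between weighted centroids of the two clusters,
$$d(r,s) = \|\tilde{x}_r - \tilde{x}_s\|_2$$
where $\tilde{x}_r$ and $\tilde{x}_s$ are weighted centroids for the clusters r and s. If cluster r was created by combining clusters p and q, $\tilde{x}_r$ is defined recursively as
$$\tilde{x}_r = \frac{1}{2}\left(\tilde{x}_p + \tilde{x}_q\right)$$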
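The recursive midpoint rule means that a weighted centroid depends on the order of merges, unlike an ordinary centroid. The following sketch works through one assumed merge order on illustrative points (neither the points nor the order come from this page); it uses only basic arithmetic and does not call linkage.

```matlab
% Assumed merge order for cluster s: (3,0) joins (4,0) first, then (4,1).
xt_p = ([3 0] + [4 0]) / 2;         % weighted centroid after the first merge
xt_s = (xt_p + [4 1]) / 2;          % midpoint rule applied again

% Ordinary centroid of the same three points, for comparison.
xbar_s = mean([3 0; 4 0; 4 1], 1);

% Cluster r is the pair (0,0) and (0,1); with a single merge the
% weighted centroid equals the ordinary centroid.
xt_r = ([0 0] + [0 1]) / 2;

dMedian = norm(xt_r - xt_s);
fprintf('weighted centroid of s: (%.4f, %.4f)\n', xt_s(1), xt_s(2));
fprintf('ordinary centroid of s: (%.4f, %.4f)\n', xbar_s(1), xbar_s(2));
fprintf('median-linkage distance between r and s: %.4f\n', dMedian);
```

The last point merged carries half the weight of the new weighted centroid regardless of how many objects the other cluster already holds.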
Ward's linkage uses the incremental sum of squares; that is, the increase in the total within-cluster sum of squares as a result of joining two clusters. The within-cluster sum of squares is defined as the sum of the squares of the distances between all objects in the cluster and the centroid of the cluster. The equivalent distance is:
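$$d^2(r,s) = \frac{n_r n_s}{n_r + n_s}\,\|\bar{x}_r - \bar{x}_s\|_2^2$$
where $\|\cdot\|_2$ is Euclidean distance, and $\bar{x}_r$ and $\bar{x}_s$ are the centroids of clusters r and s, as defined in the centroid linkage.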
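The increase in the within-cluster sum of squares caused by a merge can be computed directly and compared with the standard closed form, n_r*n_s/(n_r+n_s) times the squared distance between centroids. This is a sketch on illustrative data; `wss` is a hypothetical helper. It deliberately does not compare against the values linkage returns in Z(:,3), since how the toolbox scales the reported Ward distance is not assumed here.

```matlab
Xr = [0 0; 0 1];          % cluster r (illustrative data)
Xs = [3 0; 4 0; 4 1];     % cluster s (illustrative data)
nr = size(Xr,1);
ns = size(Xs,1);

% Within-cluster sum of squares: squared distances from each object
% to the cluster centroid (hypothetical helper).
wss = @(X) sum(sum((X - repmat(mean(X,1), size(X,1), 1)).^2));

% Increase in total within-cluster sum of squares when r and s are joined.
increase = wss([Xr; Xs]) - wss(Xr) - wss(Xs);

% Closed form in terms of the centroids.
closedForm = nr*ns/(nr+ns) * norm(mean(Xr,1) - mean(Xs,1))^2;

fprintf('increase: %.6f, closed form: %.6f\n', increase, closedForm);
```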
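% Seven observations (rows) of two variables (columns).
X = [3 1.7; 1 1; 2 3; 2 2.5; 1.2 1; 1.1 1.5; 3 1];
Y = pdist(X);      % pairwise Euclidean distances, as a vector
Z = linkage(Y)     % single-linkage tree (the default method)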
Z =
2.0000 5.0000 0.2000
3.0000 4.0000 0.5000
8.0000 6.0000 0.5099
1.0000 7.0000 0.7000
11.0000 9.0000 1.2806
12.0000 10.0000 1.3454

See also: cluster, clusterdata, cophenet, dendrogram, inconsistent, kmeans, pdist, silhouette, squareform
© 1984-2008 The MathWorks, Inc.