Classical multidimensional scaling
Y = cmdscale(D)
[Y,e] = cmdscale(D)
Y = cmdscale(D) takes an n-by-n distance matrix D, and returns an n-by-p configuration matrix Y. Rows of Y are the coordinates of n points in p-dimensional space for some p < n. When D is a Euclidean distance matrix, the distances between those points are given by D. p is the dimension of the smallest space in which the n points whose inter-point distances are given by D can be embedded.
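The embedding described above can be illustrated with the textbook classical scaling computation: double-center the squared distances, take an eigendecomposition, and scale the leading eigenvectors. The following is a minimal sketch under that assumption, not necessarily how cmdscale is implemented. The data X, the tolerance tol, and the variable names are illustrative only, and the code uses base MATLAB functions so that each step is visible.

```matlab
% Hypothetical data: the four corners of a unit square in the plane.
X = [0 0; 1 0; 1 1; 0 1];
n = size(X,1);

% Full Euclidean distance matrix, built with base MATLAB only.
D = zeros(n);
for i = 1:n
    for j = 1:n
        D(i,j) = norm(X(i,:) - X(j,:));
    end
end

% Double-center the squared distances: B = -1/2 * J * D.^2 * J.
J = eye(n) - ones(n)/n;
B = -0.5 * J * (D.^2) * J;
B = (B + B')/2;                      % remove round-off asymmetry

% Eigendecomposition, sorted by decreasing eigenvalue.
[V,L] = eig(B);
[lambda,order] = sort(diag(L),'descend');
V = V(:,order);

% Keep the positive eigenvalues; tol is an illustrative threshold.
tol = max(abs(lambda)) * eps^(3/4);
p = sum(lambda > tol);
Ysketch = V(:,1:p) * diag(sqrt(lambda(1:p)));

% Distances between the rows of Ysketch should reproduce D.
Drec = zeros(n);
for i = 1:n
    for j = 1:n
        Drec(i,j) = norm(Ysketch(i,:) - Ysketch(j,:));
    end
end
maxerr = max(abs(D(:) - Drec(:)))
```

Because the four points lie in a plane, the sketch finds p = 2, and the recovered configuration matches the original distances up to round-off. The configuration itself is determined only up to rotation, reflection, and translation.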
[Y,e] = cmdscale(D) also returns the eigenvalues of Y*Y'. When D is Euclidean, the first p elements of e are positive and the rest are zero. If the first k elements of e are much larger than the remaining n-k elements, then you can use the first k columns of Y as k-dimensional points whose inter-point distances approximate D. This can provide a useful dimension reduction for visualization, for example with k = 2.
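One way to decide how many columns of Y to keep is to look at the cumulative share of the positive eigenvalues. The sketch below assumes that approach. The data, the 99% cutoff, and the variable names share, k, and Yk are illustrative choices, not part of cmdscale.

```matlab
rng(0);                                  % fixed seed so the run is repeatable
X = [randn(20,2) 0.01*randn(20,3)];      % hypothetical data: 5-D, nearly 2-D
D = pdist(X);
[Y,e] = cmdscale(D);

% Share of the positive eigenvalue mass captured by the first k dimensions.
epos = max(e,0);
share = cumsum(epos) / sum(epos);

% Smallest k that captures at least 99% (an arbitrary, illustrative cutoff).
k = find(share >= 0.99, 1);
Yk = Y(:,1:k);

% Worst-case distortion of the inter-point distances in k dimensions.
maxerr = max(abs(D - pdist(Yk)))
```

A cutoff on the eigenvalue share is only a heuristic. Checking the distance error directly, as in the last line, shows how much the reduced configuration actually distorts D.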
D need not be a Euclidean distance matrix. If it is non-Euclidean or a more general dissimilarity matrix, then some elements of e are negative, and cmdscale chooses p as the number of positive eigenvalues. In this case, the reduction to p or fewer dimensions provides a reasonable approximation to D only if the negative elements of e are small in magnitude.
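To judge whether the negative eigenvalues are small enough, you can compare their total magnitude with that of all the eigenvalues. The sketch below uses a small hypothetical dissimilarity matrix that violates the triangle inequality (5 > 1 + 1), so it cannot be Euclidean. The ratio negfrac is an illustrative diagnostic, not an output of cmdscale.

```matlab
% Hypothetical non-Euclidean dissimilarities: D(1,3) exceeds D(1,2) + D(2,3).
D = [0 1 5;
     1 0 1;
     5 1 0];
[Y,e] = cmdscale(D);

% Number of dimensions cmdscale kept, and the most negative eigenvalue.
p = size(Y,2)
emin = min(e)

% Fraction of the total eigenvalue magnitude carried by negative eigenvalues.
negfrac = sum(abs(e(e < 0))) / sum(abs(e))
```

A value of negfrac close to zero suggests that the configuration in Y approximates D reasonably well. A larger value, as in this example, signals that no Euclidean configuration reproduces D closely.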
You can specify D as either a full dissimilarity matrix, or in upper triangle vector form such as is output by pdist. A full dissimilarity matrix must be real and symmetric, and have zeros along the diagonal and positive elements everywhere else. A dissimilarity matrix in upper triangle form must have real, positive entries. You can also specify D as a full similarity matrix, with ones along the diagonal and all other elements less than one. cmdscale transforms a similarity matrix to a dissimilarity matrix in such a way that distances between the points returned in Y equal or approximate sqrt(1-D). To use a different transformation, you must transform the similarities prior to calling cmdscale.
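The input forms described above can be compared side by side. The following sketch assumes small hypothetical inputs X and S. It passes the same distances in vector form and in full form, then passes a similarity matrix both directly and after a user-chosen transformation (here 1-S, picked only for illustration).

```matlab
% Hypothetical points: a 3-4-5 right triangle.
X = [0 0; 3 0; 0 4];
Dvec  = pdist(X);               % upper triangle vector form
Dfull = squareform(Dvec);       % full symmetric matrix with zero diagonal

% Both forms describe the same distances, so the configurations agree.
Yvec  = cmdscale(Dvec);
Yfull = cmdscale(Dfull);
sameConfig = max(abs(pdist(Yvec) - pdist(Yfull)))

% Hypothetical similarity matrix: ones on the diagonal, other entries < 1.
S = [1   0.8 0.4;
     0.8 1   0.5;
     0.4 0.5 1];

% Default handling: distances between rows of Ys equal or approximate sqrt(1-S).
Ys = cmdscale(S);

% A different transformation must be applied before calling cmdscale.
Dalt = 1 - S;
Yalt = cmdscale(Dalt);
```

In this example pdist(Ys) matches sqrt(1-S) and pdist(Yalt) matches 1-S for the off-diagonal pairs, because both sets of dissimilarities satisfy the triangle inequality for three points.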
Generate some points in 4-D space, but close to 3-D space, then reduce them to distances only.
X = [normrnd(0,1,10,3) normrnd(0,.1,10,1)];
D = pdist(X,'euclidean');
Find a configuration with those inter-point distances.
[Y,e] = cmdscale(D);

% Four, but fourth one small
dim = sum(e > eps^(3/4))

% Poor reconstruction
maxerr2 = max(abs(pdist(X)-pdist(Y(:,1:2))))

% Good reconstruction
maxerr3 = max(abs(pdist(X)-pdist(Y(:,1:3))))

% Exact reconstruction
maxerr4 = max(abs(pdist(X)-pdist(Y)))

% D is now non-Euclidean
D = pdist(X,'cityblock');
[Y,e] = cmdscale(D);

% One is large negative
min(e)

% Poor reconstruction
maxerr = max(abs(pdist(X)-pdist(Y)))
[1] Seber, G. A. F. Multivariate Observations. Hoboken, NJ: John Wiley & Sons, Inc., 1984.
mdscale | pdist | procrustes