Perform nonclassical multidimensional scaling using `mdscale`.

The function `mdscale` performs nonclassical multidimensional scaling. As with `cmdscale`, you use `mdscale` either to visualize dissimilarity data for which no “locations” exist, or to visualize high-dimensional data by reducing its dimensionality. Both functions take a matrix of dissimilarities as an input and produce a configuration of points. However, `mdscale` offers a choice of different criteria to construct the configuration, and allows missing data and weights.

For example, the cereal data include measurements on 10 variables describing breakfast cereals. You can use `mdscale` to visualize these data in two dimensions. First, load the data. For clarity, this example code selects a subset of 22 of the observations.

```matlab
load cereal.mat
X = [Calories Protein Fat Sodium Fiber ...
    Carbo Sugars Shelf Potass Vitamins];
% Take a subset from a single manufacturer
mfg1 = strcmp('G',cellstr(Mfg));
X = X(mfg1,:);
size(X)
```

```
ans =

    22    10
```

Then use `pdist` to transform the 10-dimensional data into dissimilarities. The output from `pdist` is a symmetric dissimilarity matrix, stored as a vector containing only the 22*21/2 = 231 elements in its upper triangle.
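The vector layout is easy to check on a small example. This sketch uses five made-up points (hypothetical data, not from `cereal.mat`) and the companion function `squareform` to expand the vector into the full symmetric matrix. For n points the vector always holds n*(n-1)/2 pairwise values, ordered (1,2), (1,3), ..., (1,n), (2,3), and so on.

```matlab
% Five hypothetical points in the plane (illustrative data only)
P = [0 0; 3 4; 6 8; 1 1; 2 2];
n = size(P,1);

d = pdist(P);        % Euclidean distances by default, one entry per pair
D = squareform(d);   % full n-by-n symmetric matrix with a zero diagonal

numel(d)             % n*(n-1)/2 = 10 pairs
d(1)                 % distance between points 1 and 2, which is 5
```

Calling `squareform(D)` on the square matrix converts it back to the vector form, so either representation can be passed around as convenient.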

```matlab
dissimilarities = pdist(zscore(X),'cityblock');
size(dissimilarities)
```

```
ans =

     1   231
```

This example code first standardizes the cereal data, and then uses city block distance as a dissimilarity. The choice of transformation to dissimilarities is application-dependent, and the choice here is only for simplicity. In some applications, the original data are already in the form of dissimilarities.
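To make the combination concrete, the sketch below standardizes a tiny hypothetical data set by hand and computes one city block distance directly, as the sum of absolute differences of the standardized values. Standardizing first keeps a variable measured on a large scale (the second column here) from dominating the distance.

```matlab
% Three hypothetical observations on two variables with very different scales
X = [1 200; 2 400; 4 100];

% Standardize each column to zero mean and unit (sample) standard deviation
Z = (X - mean(X)) ./ std(X);

% City block (L1) distance between observations 1 and 2 on the standardized scale
dManual = sum(abs(Z(1,:) - Z(2,:)));

% Same quantity from pdist; the pair (1,2) is the first element of the vector
d = pdist(zscore(X),'cityblock');
```

The expression `(X - mean(X)) ./ std(X)` relies on implicit expansion; on older MATLAB releases `bsxfun` would be needed instead.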

Next, use `mdscale` to perform metric MDS. Unlike with `cmdscale`, you must specify the desired number of dimensions and, for metric MDS, the criterion used to construct the output configuration. For this example, use two dimensions. The metric STRESS criterion is a common method for computing the output; for other choices, see the `mdscale` reference page in the online documentation. The second output from `mdscale` is the value of that criterion evaluated for the output configuration. It measures how well the inter-point distances of the output configuration approximate the original input dissimilarities:

```matlab
[Y,stress] = ...
    mdscale(dissimilarities,2,'criterion','metricstress');
stress
```

```
stress =

    0.1856
```
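To see what the criterion value measures, the following sketch evaluates an unweighted metric stress, taken here to be the square root of the sum of squared differences between distances and dissimilarities divided by the sum of squared dissimilarities. That formula is an assumption based on how the `mdscale` reference page describes `'metricstress'`; the three-object dissimilarities are hypothetical.

```matlab
% Hypothetical dissimilarities for three objects, in pdist pair order (1,2),(1,3),(2,3)
delta = [3 4 5];

% Assumed unweighted metric stress: misfit normalized by the squared dissimilarities
metricStress = @(d,delta) sqrt(sum((d - delta).^2) / sum(delta.^2));

% A 3-4-5 right triangle reproduces the dissimilarities exactly
Y = [0 0; 3 0; 0 4];
s1 = metricStress(pdist(Y), delta)      % 0: perfect fit

% Doubling every coordinate doubles every distance, so each misfit equals delta itself
s2 = metricStress(pdist(2*Y), delta)    % 1
```

The same function can be applied to `pdist(Y)` and `dissimilarities` from the cereal example for comparison with the returned `stress`.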

A scatterplot of the output from `mdscale` represents the original 10-dimensional data in two dimensions, and you can use the `gname` function to label selected points:

```matlab
plot(Y(:,1),Y(:,2),'o','LineWidth',2);
gname(Name(mfg1))
```
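`gname` is interactive: it waits for you to click the points you want labeled. When every point should carry a label, for example in a script or a saved figure, `text` can place all the names at once. This sketch rebuilds the metric configuration so it runs on its own, and assumes, as the `gname` call implies, that `Name` holds one name per cereal.

```matlab
% Rebuild the metric MDS configuration (same steps as above)
load cereal.mat
X = [Calories Protein Fat Sodium Fiber ...
    Carbo Sugars Shelf Potass Vitamins];
mfg1 = strcmp('G',cellstr(Mfg));
dissimilarities = pdist(zscore(X(mfg1,:)),'cityblock');
Y = mdscale(dissimilarities,2,'criterion','metricstress');

% Label every point non-interactively, placing each name just above its marker
names = Name(mfg1);
plot(Y(:,1),Y(:,2),'o','LineWidth',2)
text(Y(:,1),Y(:,2),names,'VerticalAlignment','bottom')
```

With 22 labels some text may overlap; `gname` remains the better choice for picking out a few points of interest.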

Metric multidimensional scaling creates a configuration of points whose inter-point distances approximate the given dissimilarities. This is sometimes too strict a requirement, and nonmetric scaling is designed to relax it a bit. Instead of trying to approximate the dissimilarities themselves, nonmetric scaling approximates a nonlinear, but monotonic, transformation of them. Because of the monotonicity, larger or smaller distances on a plot of the output will correspond to larger or smaller dissimilarities, respectively. However, the nonlinearity implies that `mdscale` only attempts to preserve the ordering of dissimilarities. Thus, there may be contractions or expansions of distances at different scales.
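A small illustration of why only the ordering matters: a strictly increasing transformation, here the square root chosen purely as an example, leaves the rank order of some made-up dissimilarities unchanged while changing their ratios. That change in ratios is exactly the kind of contraction or expansion a nonmetric fit is allowed to make.

```matlab
% Hypothetical dissimilarities
delta = [2 9 4 1 7];

% A nonlinear but strictly increasing transformation (illustrative choice)
f = @sqrt;

% The ordering of the values is identical before and after the transformation
[~, orderRaw] = sort(delta);
[~, orderTransformed] = sort(f(delta));
sameOrder = isequal(orderRaw, orderTransformed)   % true

% Ratios are not preserved: 9/1 becomes 3/1
ratioRaw = delta(2)/delta(4)            % 9
ratioTransformed = f(delta(2))/f(delta(4))   % 3
```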

You use `mdscale` to perform nonmetric MDS in much the same way as for metric scaling. The nonmetric STRESS criterion is a common method for computing the output; for more choices, see the `mdscale` reference page in the online documentation. As with metric scaling, the second output from `mdscale` is the value of that criterion evaluated for the output configuration. For nonmetric scaling, however, it measures how well the inter-point distances of the output configuration approximate the disparities. The disparities are returned in the third output. They are the transformed values of the original dissimilarities:

```matlab
[Y,stress,disparities] = ...
    mdscale(dissimilarities,2,'criterion','stress');
stress
```

```
stress =

    0.1562
```
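In nonmetric scaling the disparities are the best nondecreasing fit of the inter-point distances, taken in order of increasing dissimilarity (a monotone, or isotonic, regression). The sketch below illustrates that idea with the pool-adjacent-violators algorithm on five hypothetical pairs, then evaluates Kruskal's Stress-1: squared differences between distances and disparities, normalized by the squared distances. It is an illustration under those assumptions, not `mdscale`'s own implementation, which may, for example, treat tied dissimilarities differently.

```matlab
% Hypothetical dissimilarities and current inter-point distances for five pairs
delta = [1 2 3 4 5];
d     = [1.0 3.0 2.0 4.0 3.5];

% Sort pairs by dissimilarity; the fitted values must not decrease in this order
[~, ord] = sort(delta);
y = d(ord);

% Pool-adjacent-violators: merge neighboring blocks whose means decrease
vals = zeros(1,0);   % block means
wts  = zeros(1,0);   % block sizes
for i = 1:numel(y)
    vals(end+1) = y(i);
    wts(end+1)  = 1;
    while numel(vals) > 1 && vals(end-1) > vals(end)
        vals(end-1) = (wts(end-1)*vals(end-1) + wts(end)*vals(end)) / (wts(end-1) + wts(end));
        wts(end-1)  = wts(end-1) + wts(end);
        vals(end) = [];
        wts(end)  = [];
    end
end

% Expand block means to one fitted value per pair, in the original pair order
dhat = zeros(size(d));
dhat(ord) = repelem(vals, wts);   % [1 2.5 2.5 3.75 3.75]

% Kruskal's Stress-1 for this toy configuration
stress1 = sqrt(sum((d - dhat).^2) / sum(d.^2))
```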

To check the fit of the output configuration to the dissimilarities, and to understand the disparities, it helps to make a Shepard plot:

```matlab
distances = pdist(Y);
[~,ord] = sortrows([disparities(:) dissimilarities(:)]);
plot(dissimilarities,distances,'bo', ...
    dissimilarities(ord),disparities(ord),'r.-', ...
    [0 25],[0 25],'k-')
xlabel('Dissimilarities')
ylabel('Distances/Disparities')
legend({'Distances' 'Disparities' '1:1 Line'},...
    'Location','NorthWest');
```

This plot shows that `mdscale` has found a configuration of points in two dimensions whose inter-point distances approximate the disparities, which in turn are a nonlinear transformation of the original dissimilarities. The concave shape of the disparities as a function of the dissimilarities indicates that the fit tends to contract small distances relative to the corresponding dissimilarities. This may be perfectly acceptable in practice.
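The Shepard plot can be backed up with two numeric checks: the disparities should never decrease as the dissimilarities increase, and the misfit between the plotted distances and disparities can be summarized in one number. The sketch below recomputes Stress-1 from those quantities for comparison with `stress`; that the two agree closely assumes `mdscale`'s `'stress'` criterion is the unweighted Stress-1 formula, which is an assumption here rather than something this page states.

```matlab
% Rebuild the nonmetric example (same steps as above)
load cereal.mat
X = [Calories Protein Fat Sodium Fiber ...
    Carbo Sugars Shelf Potass Vitamins];
mfg1 = strcmp('G',cellstr(Mfg));
dissimilarities = pdist(zscore(X(mfg1,:)),'cityblock');
[Y,stress,disparities] = mdscale(dissimilarities,2,'criterion','stress');
distances = pdist(Y);

% Sort pairs by dissimilarity (ties broken by disparity); a monotone fit never decreases
pairs = sortrows([dissimilarities(:) disparities(:)]);
isMonotone = all(diff(pairs(:,2)) >= -1e-10)

% Stress-1 recomputed from the plotted quantities (assumed formula)
stressCheck = sqrt(sum((distances(:) - disparities(:)).^2) / sum(distances(:).^2))
```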

`mdscale` uses an iterative algorithm to find the output configuration, and the result can depend on the starting point. By default, `mdscale` uses `cmdscale` to construct an initial configuration, and this choice often leads to a globally best solution. However, it is possible for `mdscale` to stop at a configuration that is a local minimum of the criterion. Such cases can be diagnosed and often overcome by running `mdscale` multiple times with different starting points. You can do this using the `'start'` and `'replicates'` name-value pair arguments. The following code runs five replicates of MDS, each starting at a different randomly chosen initial configuration. The criterion value is printed for each replicate; `mdscale` returns the configuration with the lowest criterion value.

```matlab
opts = statset('Display','final');
[Y,stress] = ...
    mdscale(dissimilarities,2,'criterion','stress',...
    'start','random','replicates',5,'Options',opts);
```

```
35 iterations, Final stress criterion = 0.156209
31 iterations, Final stress criterion = 0.156209
48 iterations, Final stress criterion = 0.171209
33 iterations, Final stress criterion = 0.175341
32 iterations, Final stress criterion = 0.185881
```
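The replicate search can also be written out by hand, which keeps every solution available for comparison. This sketch passes an explicit n-by-2 matrix of initial coordinates through `'start'`, which the `mdscale` reference page documents as an accepted value. The Gaussian random starts and the seed are illustrative choices and not necessarily the scheme `mdscale` itself uses for `'random'`.

```matlab
% Rebuild the inputs (same steps as above)
load cereal.mat
X = [Calories Protein Fat Sodium Fiber ...
    Carbo Sugars Shelf Potass Vitamins];
mfg1 = strcmp('G',cellstr(Mfg));
dissimilarities = pdist(zscore(X(mfg1,:)),'cityblock');
n = nnz(mfg1);                 % number of cereals in the subset

rng(1)                         % fix the seed so the random starts are repeatable
nReps = 5;
allStress = zeros(1,nReps);
bestStress = Inf;
for r = 1:nReps
    Y0 = randn(n,2);           % illustrative Gaussian random start
    [Yr,s] = mdscale(dissimilarities,2,'criterion','stress','start',Y0);
    allStress(r) = s;
    if s < bestStress          % keep the lowest-stress configuration
        bestStress = s;
        Ybest = Yr;
    end
end

% Stress from the default cmdscale start, for comparison
[~,stressCmd] = mdscale(dissimilarities,2,'criterion','stress');
disp([allStress stressCmd])
```

Keeping `allStress` makes it easy to see how often the random starts reach the same minimum as the `cmdscale` start.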

Notice that `mdscale` finds several different local solutions, some of which do not have as low a stress value as the solution found with the `cmdscale` starting point.