## Procrustes Analysis

### Compare Landmark Data

The `procrustes` function analyzes the distribution of a set of shapes
using Procrustes analysis. This method matches landmark data (geometric
locations representing significant features in a given shape) to calculate
the best shape-preserving Euclidean transformations. These transformations
minimize the differences in location between the compared landmark data.

Procrustes analysis is also useful in conjunction with multidimensional
scaling. The Multidimensional Scaling example observes that the orientation
of the reconstructed points is arbitrary. Two different applications of
multidimensional scaling could produce reconstructed points that are very
similar in principle, but that look different because they have different
orientations. The `procrustes` function transforms one set of points to
make them more comparable to the other.
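The orientation issue can be illustrated with a minimal sketch, assuming MATLAB R2016b or later with Statistics and Machine Learning Toolbox. The points, rotation angle, and shift below are made up for illustration; they stand in for two reconstructions of the same configuration.

```matlab
% Hypothetical data: five 2-D points standing in for one reconstruction.
X = [0 0; 1 0; 1 1; 0 1; 0.5 1.5];

% A second "reconstruction" of the same configuration, rotated by 40
% degrees and shifted, as two multidimensional scaling runs might differ.
theta = 40 * pi / 180;
R = [cos(theta) -sin(theta); sin(theta) cos(theta)];
Y = X * R + [3 -2];   % implicit expansion adds the shift to every row

% Align Y to X. d is the standardized dissimilarity after alignment.
[d, Z] = procrustes(X, Y);

% Because Y is an exact rotation and shift of X, Z should coincide with X
% up to floating-point error, and d should be close to zero.
maxErr = max(abs(X(:) - Z(:)));
fprintf('d = %.3g, max |X - Z| = %.3g\n', d, maxErr);
```

With real multidimensional scaling output the two configurations rarely match exactly, so `d` would be small rather than zero.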

### Data Input

The `procrustes` function takes two matrices as input:

- The target shape matrix *X* has dimension `n` × `p`, where `n` is the
  number of landmarks in the shape and `p` is the number of measurements
  per landmark.
- The comparison shape matrix *Y* has dimension `n` × `q` with `q` ≤ `p`.
  If there are fewer measurements per landmark for the comparison shape
  than for the target shape (`q` < `p`), the function adds columns of
  zeros to *Y*, yielding an `n` × `p` matrix.
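As a small sketch of the `q` < `p` case (the landmark values are hypothetical, and the code assumes Statistics and Machine Learning Toolbox), the following compares a call with a two-column *Y* against a call where the zero column is appended by hand. The two fits are expected to agree.

```matlab
% Hypothetical landmark data: 4 landmarks, 3 measurements each (n = 4, p = 3).
X = [0 0 0; 1 0 0; 1 1 0; 0 1 1];

% Comparison shape with only 2 measurements per landmark (q = 2 < p).
Y = [0 0; 2 0; 2 2; 0 2];

% procrustes accepts the narrower Y and pads it with zeros internally.
[d, Z] = procrustes(X, Y);
disp(size(Z))   % Z has the same size as X

% Padding Y explicitly before the call should give the same fit.
Ypad = [Y, zeros(size(Y, 1), size(X, 2) - size(Y, 2))];
[dPad, ZPad] = procrustes(X, Ypad);
padErr = max(abs(Z(:) - ZPad(:)));
```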

The equation to obtain the transformed shape, *Z*, is

$$Z = bYT + c$$

where:

- *b* is a scaling factor that stretches (*b* > 1) or shrinks (*b* < 1)
  the points.
- *T* is the orthogonal rotation and reflection matrix.
- *c* is a matrix with constant values in each column, used to shift the
  points.

The `procrustes` function chooses *b*, *T*, and *c* to minimize the
distance between the target shape *X* and the transformed shape *Z* as
measured by the least squares criterion:

$$\sum_{i=1}^{n}\sum_{j=1}^{p}(X_{ij}-Z_{ij})^{2}$$
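The fitted components can be inspected through the third output of `procrustes`, a structure with fields `b`, `T`, and `c`. The sketch below uses hypothetical landmark values, assumes MATLAB R2016b or later with Statistics and Machine Learning Toolbox, and checks that the returned *Z* is the scaled, rotated, and shifted *Y*. It also relates the first output `d` to the least squares criterion: `d` is that sum divided by the sum of squares of the centered *X*.

```matlab
% Hypothetical landmark sets with n = 4 landmarks and p = 2 measurements.
X = [0 0; 2 0; 2 1; 0 1];
Y = [1 1; 1 4; 0 4.2; -0.5 1];

% The third output holds the fitted scale b, rotation/reflection T,
% and translation c.
[d, Z, tr] = procrustes(X, Y);

% Rebuild Z from the components: Z = b*Y*T + c.
Zcheck = tr.b * Y * tr.T + tr.c;
rebuildErr = max(abs(Z(:) - Zcheck(:)));

% T is orthogonal, so T'*T should be the identity up to rounding.
orthErr = norm(tr.T' * tr.T - eye(2));

% d is the least squares criterion divided by the sum of squares of the
% centered X.
sse = sum(sum((X - Z).^2));
dCheck = sse / sum(sum((X - mean(X, 1)).^2));
```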

### Preprocess Data for Accurate Results

Procrustes analysis is appropriate when all `p` measurement dimensions
have similar scales. The analysis would be inaccurate, for example, if the
columns of *Z* had different scales:

- The first column is measured in milliliters ranging from 2,000 to 6,000.
- The second column is measured in degrees Celsius ranging from 10 to 25.
- The third column is measured in kilograms ranging from 50 to 230.

In such cases, standardize your variables by:

1. Subtracting the sample mean from each variable.
2. Dividing each resultant variable by its sample standard deviation.

Use the `zscore` function to perform this standardization.
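A minimal sketch of this preprocessing follows, assuming MATLAB R2016b or later with Statistics and Machine Learning Toolbox. The measurement values are invented to match the ranges described above, and standardizing each input matrix separately is an assumption of this sketch rather than a documented requirement.

```matlab
% Hypothetical measurements on very different scales:
% volume (mL), temperature (degrees Celsius), mass (kg).
X = [2000 10  50;
     3500 18 120;
     5000 22 200;
     6000 25 230];
Y = [2200 12  60;
     3300 17 130;
     5200 21 190;
     5900 24 225];

% Standardize each column: subtract its sample mean, then divide by its
% sample standard deviation.
Xs = zscore(X);
Ys = zscore(Y);

% Equivalent manual computation for X, following the two steps above.
XsManual = (X - mean(X, 1)) ./ std(X, 0, 1);
manualErr = max(abs(Xs(:) - XsManual(:)));

% Compare the shapes on the standardized data.
d = procrustes(Xs, Ys);
```

Without standardization, the milliliter column would dominate the least squares criterion simply because its values are the largest.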

## See Also

`procrustes`
