Principal component analysis (PCA) on data

`princomp` will be removed in a future release. Use `pca` instead.

`COEFF = princomp(X)`

`[COEFF,SCORE] = princomp(X)`

`[COEFF,SCORE,latent] = princomp(X)`

`[COEFF,SCORE,latent,tsquare] = princomp(X)`

`[...] = princomp(X,'econ')`

`COEFF = princomp(X)` performs principal components analysis (PCA) on the *n*-by-*p* data matrix `X` and returns the principal component coefficients, also known as loadings. Rows of `X` correspond to observations, columns to variables. `COEFF` is a *p*-by-*p* matrix, each column containing the coefficients for one principal component. The columns are in order of decreasing component variance.
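As an illustration, the following is a minimal sketch of one standard way to obtain such coefficients with core MATLAB functions: take the singular value decomposition (SVD) of the column-centered data. The matrix `X` is made-up example data, the variable names are illustrative, and the sketch is not claimed to be the `princomp` implementation. The sign of each column of `COEFF` is arbitrary and may differ from `princomp` output.

```matlab
% Minimal sketch: principal component coefficients via the SVD.
% X is a small made-up data matrix (rows = observations, columns = variables).
X = [2 4 1; 3 7 2; 5 6 4; 7 9 3; 4 3 5; 6 8 7];
[n, p] = size(X);

% Center each column by subtracting its mean; the columns are not rescaled.
X0 = bsxfun(@minus, X, mean(X, 1));

% The right singular vectors of X0 are the eigenvectors of cov(X).
% svd returns the singular values in decreasing order, so the columns
% of COEFF are ordered by decreasing component variance.
[~, S, COEFF] = svd(X0, 'econ');

% The columns of COEFF are orthonormal, so this should be close to eye(p).
disp(COEFF' * COEFF)
```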

`princomp` centers `X` by subtracting off column means, but does not rescale the columns of `X`. To perform principal components analysis with standardized variables, that is, based on correlations, use `princomp(zscore(X))`. To perform principal components analysis directly on a covariance or correlation matrix, use `pcacov`.
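The following sketch, which uses only core MATLAB functions and a made-up matrix, shows why standardizing the columns gives PCA "based on correlations": the covariance matrix of the standardized data equals the correlation matrix of `X`. The explicit standardization mirrors what `zscore` computes with its default settings; the final eigendecomposition of the correlation matrix only illustrates the kind of input `pcacov` works from and is not a description of how `pcacov` is implemented.

```matlab
% Made-up data matrix (rows = observations, columns = variables).
X = [2 4 1; 3 7 2; 5 6 4; 7 9 3; 4 3 5; 6 8 7];
p = size(X, 2);

% Standardize each column: subtract its mean and divide by its standard
% deviation (n-1 normalization).
Z = bsxfun(@rdivide, bsxfun(@minus, X, mean(X, 1)), std(X, 0, 1));

% The covariance matrix of Z equals the correlation matrix of X.
C = cov(Z);
R = corrcoef(X);

% Principal components of the correlation matrix, sorted by decreasing
% eigenvalue (component variance).
[V, D] = eig(R);
[latentCorr, order] = sort(diag(D), 'descend');
coeffCorr = V(:, order);
```

Because the diagonal of a correlation matrix is all ones, the eigenvalues in `latentCorr` always sum to `p`.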

`[COEFF,SCORE] = princomp(X)` returns `SCORE`, the principal component scores; that is, the representation of `X` in the principal component space. Rows of `SCORE` correspond to observations, columns to components.
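A minimal sketch of how scores relate to the coefficients, assuming the coefficients come from the SVD of the centered data as in the earlier sketch (made-up data, illustrative names): the scores are the centered data multiplied by `COEFF`, and multiplying back by `COEFF'` recovers the centered data.

```matlab
% Made-up data matrix (rows = observations, columns = variables).
X = [2 4 1; 3 7 2; 5 6 4; 7 9 3; 4 3 5; 6 8 7];
X0 = bsxfun(@minus, X, mean(X, 1));     % center the columns
[U, S, COEFF] = svd(X0, 'econ');        % columns of COEFF = coefficients

% Scores: the centered data expressed in the principal component basis.
SCORE = X0 * COEFF;                     % equal to U * S

% COEFF is orthogonal, so rotating back recovers the centered data.
X0back = SCORE * COEFF';
```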

`[COEFF,SCORE,latent] = princomp(X)` returns `latent`, a vector containing the eigenvalues of the covariance matrix of `X`.
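Because `cov(X)` equals `X0'*X0/(n-1)` for the centered data `X0`, each eigenvalue is a squared singular value of `X0` divided by `n-1`. The sketch below (made-up data, illustrative names) checks this against `eig` applied to `cov(X)` directly.

```matlab
% Made-up data matrix (rows = observations, columns = variables).
X = [2 4 1; 3 7 2; 5 6 4; 7 9 3; 4 3 5; 6 8 7];
n = size(X, 1);
X0 = bsxfun(@minus, X, mean(X, 1));

% Eigenvalues of cov(X) from the singular values of X0 (already in
% decreasing order).
s = svd(X0);
latent = s.^2 / (n - 1);

% The same values computed directly from the covariance matrix,
% sorted in decreasing order to match.
latentEig = sort(eig(cov(X)), 'descend');
```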

`[COEFF,SCORE,latent,tsquare] = princomp(X)` returns `tsquare`, which contains Hotelling's T² statistic for each data point.
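One common way to write Hotelling's T² is as the sum of squared standardized scores, which is also the squared Mahalanobis distance of each observation from the column means. The sketch below computes it both ways on made-up data; it assumes every element of `latent` is nonzero (centered data of full column rank), and the names are illustrative rather than taken from `princomp`.

```matlab
% Made-up data matrix (rows = observations, columns = variables).
X = [2 4 1; 3 7 2; 5 6 4; 7 9 3; 4 3 5; 6 8 7];
n = size(X, 1);
X0 = bsxfun(@minus, X, mean(X, 1));
[~, S, COEFF] = svd(X0, 'econ');
SCORE = X0 * COEFF;
latent = diag(S).^2 / (n - 1);

% T^2 as the sum of squared scores, each divided by the variance of its
% component (assumes no element of latent is zero).
tsquare = sum(bsxfun(@rdivide, SCORE.^2, latent'), 2);

% Equivalent form: squared Mahalanobis distance of each row from the
% column means, using the sample covariance matrix.
mahal2 = sum((X0 / cov(X)) .* X0, 2);
```

With the sample covariance matrix, the T² values always sum to `(n-1)*p`, which makes a convenient sanity check.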

The scores are the data formed by transforming the original data into the space of the principal components. The values of the vector `latent` are the variances of the columns of `SCORE`. Hotelling's T² is a measure of the multivariate distance of each observation from the center of the data set.
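The claim that `latent` holds the variances of the columns of `SCORE` can be checked with a short sketch on made-up data (illustrative names, coefficients taken from the SVD of the centered data). The same check also shows that the scores are mutually uncorrelated.

```matlab
% Made-up data matrix (rows = observations, columns = variables).
X = [2 4 1; 3 7 2; 5 6 4; 7 9 3; 4 3 5; 6 8 7];
n = size(X, 1);
X0 = bsxfun(@minus, X, mean(X, 1));
[~, S, COEFF] = svd(X0, 'econ');
SCORE = X0 * COEFF;
latent = diag(S).^2 / (n - 1);

% Column variances of SCORE, as a column vector to compare with latent.
scoreVar = var(SCORE, 0, 1)';

% The scores are uncorrelated: the off-diagonal entries of cov(SCORE)
% are zero up to rounding error.
scoreCov = cov(SCORE);
offDiag = scoreCov - diag(diag(scoreCov));
```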

When `n <= p`, `SCORE(:,n:p)` and `latent(n:p)` are necessarily zero, because centering leaves a data matrix of rank at most `n-1`, and the columns of `COEFF(:,n:p)` define directions that are orthogonal to the centered data.
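The following sketch illustrates the `n <= p` case on a made-up matrix with 3 observations and 5 variables, using the full SVD of the centered data (illustrative names, not the `princomp` source). The component variances are padded with zeros to length `p` so that they line up with the columns of `COEFF`.

```matlab
% Made-up data with fewer observations than variables (n = 3, p = 5).
X = [1 2 3 4 5; 2 1 0 3 7; 4 4 1 2 2];
[n, p] = size(X);
X0 = bsxfun(@minus, X, mean(X, 1));

% Full SVD: COEFF is p-by-p and S is n-by-p.
[~, S, COEFF] = svd(X0);
SCORE = X0 * COEFF;

% Component variances, padded with zeros up to length p.
s = diag(S);
latent = zeros(p, 1);
latent(1:numel(s)) = s.^2 / (n - 1);

% X0 has rank at most n-1, so components n..p carry no variance:
% SCORE(:,n:p) and latent(n:p) are zero up to rounding error.
disp(max(max(abs(SCORE(:, n:p)))))
```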

`[...] = princomp(X,'econ')` returns only the elements of `latent` that are not necessarily zero, together with the corresponding columns of `COEFF` and `SCORE`; that is, when `n <= p`, it returns only the first `n-1` of each. This can be significantly faster when `p` is much larger than `n`.
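A minimal sketch of the economy-size idea, on the same kind of made-up data with `n = 3` and `p = 5`: compute an economy-size SVD and keep only the first `min(n-1,p)` components. This illustrates the documented output sizes; it is an assumption, not a statement of how `princomp` computes them internally.

```matlab
% Made-up data with n = 3 observations and p = 5 variables.
X = [1 2 3 4 5; 2 1 0 3 7; 4 4 1 2 2];
[n, p] = size(X);
X0 = bsxfun(@minus, X, mean(X, 1));

% Economy-size SVD: only min(n,p) singular vectors are computed, which
% avoids forming the full p-by-p factor when p is much larger than n.
[U, S, V] = svd(X0, 'econ');

% Keep only the components that are not necessarily zero.
r = min(n - 1, p);
COEFFecon  = V(:, 1:r);                       % p-by-r
SCOREecon  = U(:, 1:r) * S(1:r, 1:r);         % n-by-r, equals X0*COEFFecon
latentEcon = diag(S(1:r, 1:r)).^2 / (n - 1);  % r-by-1

% Nothing is lost: the kept components reconstruct the centered data.
X0back = SCOREecon * COEFFecon';
```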

Compute principal components for the `ingredients` data in the Hald data set, and the variance accounted for by each component.

```matlab
load hald;
[pc,score,latent,tsquare] = princomp(ingredients);
pc,latent

pc =
   -0.0678   -0.6460    0.5673    0.5062
   -0.6785   -0.0200   -0.5440    0.4933
    0.0290    0.7553    0.4036    0.5156
    0.7309   -0.1085   -0.4684    0.4844

latent =
  517.7969
   67.4964
   12.4054
    0.2372
```

The following commands show that the first two components account for about 98% of the variance, and then display the coefficients and scores of those two components in a biplot:

```matlab
cumsum(latent)./sum(latent)

ans =
    0.86597
    0.97886
    0.9996
    1

biplot(pc(:,1:2),'Scores',score(:,1:2),'VarLabels',...
       {'X1' 'X2' 'X3' 'X4'})
```

For a more detailed example and explanation of this analysis method, see Principal Component Analysis (PCA).
