Principal component analysis (PCA) on data
[COEFF,SCORE] = princomp(X)
[COEFF,SCORE,latent] = princomp(X)
[COEFF,SCORE,latent,tsquare] = princomp(X)
[...] = princomp(X,'econ')
COEFF = princomp(X) performs principal components analysis (PCA) on the n-by-p data matrix X, and returns the principal component coefficients, also known as loadings. Rows of X correspond to observations, columns to variables. COEFF is a p-by-p matrix, each column containing coefficients for one principal component. The columns are in order of decreasing component variance.
princomp centers X by subtracting off column means, but does not rescale the columns of X. To perform principal components analysis with standardized variables, that is, based on correlations, use princomp(zscore(X)). To perform principal components analysis directly on a covariance or correlation matrix, use pcacov.
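As a rough illustration of why princomp(zscore(X)) amounts to a correlation-based analysis, the sketch below standardizes the columns by hand and checks that the covariance of the standardized data equals the correlation matrix of the original data. The data matrix is made up for illustration, and only base MATLAB functions are used, so this is not a reproduction of how zscore or princomp are implemented.

```matlab
% Made-up illustration data: 6 observations, 3 variables on very different scales.
X = [ 2 100 0.5;
      4 180 0.1;
      6 310 0.9;
      8 390 0.4;
     10 520 0.7;
     12 580 0.2];

% Standardize each column by hand: subtract the column mean and
% divide by the column standard deviation (normalized by n-1).
mu    = mean(X, 1);
sigma = std(X, 0, 1);
Z     = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);

% The covariance of the standardized data is the correlation matrix of X,
% so PCA on Z is PCA "based on correlations".
C_z     = cov(Z);
R_x     = corrcoef(X);
maxDiff = max(abs(C_z(:) - R_x(:)));
```

Without this rescaling, the second column (values in the hundreds) would dominate the leading component simply because of its units.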
[COEFF,SCORE] = princomp(X) returns SCORE, the principal component scores; that is, the representation of X in the principal component space. Rows of SCORE correspond to observations, columns to components.
[COEFF,SCORE,latent] = princomp(X) returns latent, a vector containing the eigenvalues of the covariance matrix of X.
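The three outputs described so far can be sketched with a singular value decomposition of the centered data. This is a minimal illustration on made-up data with n > p, using only base MATLAB functions; it is not claimed to be how princomp is implemented, and the signs of the columns of coeff may differ from what princomp returns, since each component direction is only defined up to sign.

```matlab
% Made-up illustration data: n = 5 observations, p = 3 variables.
X = [3 20  9;
     8 35  4;
     5 50 12;
     9 28  7;
     2 41  6];
[n, p] = size(X);

% Center the columns; no rescaling is applied.
X0 = bsxfun(@minus, X, mean(X, 1));

% Thin SVD of the centered data: X0 = U*S*V'.
[U, S, V] = svd(X0, 0);

coeff  = V;                      % p-by-p, one column per principal component
score  = X0 * coeff;             % n-by-p, data expressed in the component basis
latent = diag(S).^2 / (n - 1);   % component variances, in decreasing order

% Cross-check: latent should match the eigenvalues of the covariance matrix of X.
ev = sort(eig(cov(X)), 'descend');
```

Because the singular values come back sorted in decreasing order, the columns of coeff are already ordered by decreasing component variance.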
[COEFF,SCORE,latent,tsquare] = princomp(X) returns tsquare, which contains Hotelling's T² statistic for each data point.
The scores are the data formed by transforming the original data into the space of the principal components. The values of the vector latent are the variances of the columns of SCORE. Hotelling's T² is a measure of the multivariate distance of each observation from the center of the data set.
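One way to see the connection between the scores, their variances, and Hotelling's statistic is to compute it by hand: scale each score column to unit variance and sum the squares across components. The sketch below does this on made-up data and compares the result with the squared Mahalanobis distance from the column means. It assumes n > p and a full-rank covariance matrix, and the variable names are illustrative rather than taken from princomp.

```matlab
% Made-up illustration data: n = 6 observations, p = 3 variables.
X = [2 9 4;
     5 3 8;
     7 6 1;
     1 8 6;
     9 2 5;
     4 7 3];
[n, p] = size(X);

% Scores and component variances from the SVD of the centered data.
X0 = bsxfun(@minus, X, mean(X, 1));
[~, S, V] = svd(X0, 0);
score  = X0 * V;
latent = diag(S).^2 / (n - 1);

% Scale each score column by its standard deviation, then sum the squares
% across components to get one value per observation.
stdScore = bsxfun(@rdivide, score, sqrt(latent)');
tsq = sum(stdScore.^2, 2);

% Cross-check: squared Mahalanobis distance of each row from the column means.
tsqCheck = sum((X0 / cov(X)) .* X0, 2);
```

Observations with large values lie far from the center of the data once the different spreads along each component are taken into account.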
When n <= p, SCORE(:,n:p) and latent(n:p) are necessarily zero, and the columns of COEFF(:,n:p) define directions that are orthogonal to X.
[...] = princomp(X,'econ') returns only the elements of latent that are not necessarily zero, and the corresponding columns of COEFF and SCORE, that is, when n <= p, only the first n-1. This can be significantly faster when p is much larger than n.
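The n-1 limit follows from centering, which removes one degree of freedom from the rows of X. The sketch below illustrates this on a made-up wide matrix (n = 3, p = 5) using base MATLAB's economy-size SVD. It mirrors the description of the 'econ' option above but is only an illustration under that assumption, not the actual princomp code path.

```matlab
% Made-up wide data: n = 3 observations, p = 5 variables.
X = [1 4 2 8 5;
     3 1 7 2 6;
     6 5 3 4 1];
[n, p] = size(X);

% Center the columns. The centered rows sum to zero, so rank(X0) <= n-1.
X0 = bsxfun(@minus, X, mean(X, 1));
r  = rank(X0);

% Economy-size SVD returns min(n,p) singular values; the last one is
% numerically zero because of the centering.
[U, S, V] = svd(X0, 'econ');
sv = diag(S);

% Keep only the first n-1 components.
k = n - 1;
coeffEcon  = V(:, 1:k);                 % p-by-(n-1)
scoreEcon  = X0 * coeffEcon;            % n-by-(n-1)
latentEcon = sv(1:k).^2 / (n - 1);      % (n-1)-by-1

% The retained components reproduce the centered data up to round-off.
reconErr = norm(X0 - scoreEcon * coeffEcon');
```

The saving comes from never forming the remaining p-(n-1) directions, which carry zero variance.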
Compute principal components for the ingredients data in the Hald data set, and the variance accounted for by each component.
load hald;
[pc,score,latent,tsquare] = princomp(ingredients);
pc,latent

pc =
   -0.0678   -0.6460    0.5673    0.5062
   -0.6785   -0.0200   -0.5440    0.4933
    0.0290    0.7553    0.4036    0.5156
    0.7309   -0.1085   -0.4684    0.4844

latent =
  517.7969
   67.4964
   12.4054
    0.2372
The following command and plot show that two components account for 98% of the variance:
cumsum(latent)./sum(latent)

ans =
    0.86597
    0.97886
     0.9996
          1

biplot(pc(:,1:2),'Scores',score(:,1:2),'VarLabels',...
    {'X1' 'X2' 'X3' 'X4'})
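The same cumulative proportions can be used to choose how many components to keep. The sketch below starts from the latent values printed in the Hald example and finds the smallest number of components whose cumulative share of the variance reaches a cutoff. The 0.95 threshold and the variable names are assumptions made for illustration.

```matlab
% Component variances as printed for the Hald ingredients data.
latent = [517.7969; 67.4964; 12.4054; 0.2372];

% Cumulative proportion of total variance explained by the first k components.
explained = cumsum(latent) ./ sum(latent);

% Hypothetical cutoff: keep the fewest components that explain at least 95%.
threshold = 0.95;
k = find(explained >= threshold, 1);
```

With these values k is 2, consistent with two components accounting for about 98% of the variance.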
For a more detailed example and explanation of this analysis method, see Principal Component Analysis (PCA).