Documentation |
Probabilistic principal component analysis
[coeff,score,pcvar] = ppca(Y,K) returns the principal component coefficients for the n-by-p data matrix Y based on a probabilistic principal component analysis (PPCA). It also returns the principal component scores, which are the representations of Y in the principal component space, and the principal component variances, which are the eigenvalues of the covariance matrix of Y, in pcvar.
Each column of coeff contains coefficients for one principal component, and the columns are in descending order of component variance. Rows of score correspond to observations, and columns correspond to components. Rows of Y correspond to observations and columns correspond to variables.
Probabilistic principal component analysis might be preferable to other algorithms that handle missing data, such as the alternating least squares algorithm when any data vector has one or more missing values. It assumes that the values are missing at random through the data set. An expectation-maximization algorithm is used for both complete and missing data.
[coeff,score,pcvar] = ppca(Y,K,Name,Value) returns the principal component coefficients, scores, and variances using additional options for computation and handling of special data types, specified by one or more Name,Value pair arguments.
For example, you can introduce initial values for the residual variance, v, or change the termination criteria.
[1] Tipping, M. E., and C. M. Bishop. Probabilistic Principal Component Analysis. Journal of the Royal Statistical Society. Series B (Statistical Methodology), Vol. 61, No.3, 1999, pp. 611–622.
[2] Roweis, S. "EM Algorithms for PCA and SPCA." In Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems. Vol.10 (NIPS 1997), Cambridge, MA, USA: MIT Press, 1998, pp. 626–632.
[3] Ilin, A., and T. Raiko. "Practical Approaches to Principal Component Analysis in the Presence of Missing Values." J. Mach. Learn. Res.. Vol. 11, August, 2010, pp. 1957–2000.