File Exchange

EMPCA

version 1.6.0.0 (6.17 KB) by Vicente Parot
Expectation-Maximization Principal Component Analysis

2 Downloads

Updated 26 Jul 2017

EMPCA calculates principal components using an expectation-maximization (EM) algorithm, finding each component in the residual matrix that remains after subtracting the previously converged principal components.
EMPCA_W accepts a weight matrix to use in the weighted EM algorithm.
EMPCA_NAN accepts a data matrix containing NaNs to use in the missing-data EM algorithm.
An informative message reports the number of EM iterations computed for each component, showing whether convergence was reached within a given tolerance or the iterations were stopped at a maximum number.
This implementation is especially useful to handle large matrices, and runs fast on gpuArray matrices.
The algorithm is described in
Bailey, Stephen. "Principal Component Analysis with Noisy and/or Missing Data." Publications of the Astronomical Society of the Pacific 124.919 (2012): 1015-1023.
http://arxiv.org/pdf/1208.4122v2.pdf
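
The description above (one component at a time, found by EM in the residual matrix, with a tolerance and an iteration cap) can be illustrated with a minimal sketch. This is not the code of the submission: the function name empca_sketch, its argument order, its outputs, and the convergence measure are assumptions made for illustration only. It covers only the unweighted case, assumes X is already centered, and assumes X has at least ncomps nonzero singular values.

```matlab
function [U, S, V] = empca_sketch(X, ncomps, emtol, maxiters)
% EMPCA_SKETCH  Illustrative EM PCA with deflation (hypothetical helper,
% not the File Exchange implementation).
%   X        observations-by-variables data matrix, assumed centered
%   ncomps   number of components to extract
%   emtol    tolerance on the change of the component direction (assumed measure)
%   maxiters maximum number of EM iterations per component
%   Returns U, S, V such that X is approximated by U*S*V'.
[nobs, nvars] = size(X);
U = zeros(nobs, ncomps);
V = zeros(nvars, ncomps);
s = zeros(ncomps, 1);
R = X;                               % residual matrix, deflated after each component
for k = 1:ncomps
    v = randn(nvars, 1);             % random starting direction
    v = v / norm(v);
    converged = false;
    for it = 1:maxiters
        c = R * v;                   % E-step: coefficient of each observation on v
        vnew = (R' * c) / (c' * c);  % M-step: direction that best fits R given c
        vnew = vnew / norm(vnew);    % keep the direction at unit length
        change = 1 - abs(vnew' * v); % near 0 when the direction stops moving
        v = vnew;
        if change < emtol
            converged = true;
            break
        end
    end
    c = R * v;                       % final coefficients for this component
    s(k) = norm(c);
    U(:, k) = c / s(k);
    V(:, k) = v;
    R = R - c * v';                  % subtract the converged component
    if converged
        fprintf('component %d: converged after %d iterations\n', k, it);
    else
        fprintf('component %d: stopped at the maximum of %d iterations\n', k, maxiters);
    end
end
S = diag(s);
end
```

For example, with a centered matrix X of size 200-by-50, a call such as [U, S, V] = empca_sketch(X, 3, 1e-6, 500) would return three components. Each EM iteration needs only two matrix-vector products with the residual, which is why this style of algorithm scales to large matrices.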

Cite As

Vicente Parot (2020). EMPCA (https://www.mathworks.com/matlabcentral/fileexchange/45353-empca), MATLAB Central File Exchange. Retrieved .

Comments and Ratings (7)

Archana Nawandhar

Hello!
I am trying to compare the speedup achieved using GPU vs. CPU. When I try to use MATLAB's timeit function to measure the time, it gives an error. And according to the MATLAB documentation, tic and toc cannot be used for gpuArray. Please guide me on how to compare the speedup. Thanks.

Majid Ali Khan

I have a problem running this. Could I request the full product?
Please help me.

better li

Thank you very much, Vicente Parot! Very useful code. I used the function empca_nan.m and compared it with the standard pca.m from MATLAB:
(1) Traditional PCA: [Coeff,score,latent,tsquared,explained,mu] = pca(Data); so that Data = mean(Data,1) + score*Coeff'.
(2) empca_nan.m: [score1, s, coeff, a] = empca_nan(a, ncomps, emtol, maxiters); so that Data = score1*s*coeff' + a.
The score from MATLAB's pca.m corresponds to score = score1*s.
It is much faster than pca(Data,'algorithm','als').

better li

Vicente Parot

Dear Lowell,
The updated submission adds implementations of the weighted and missing-data versions of the algorithm described in the reference.
This makes it possible to compute PCA with missing values.

lowell smoger

Let me be more specific. Each observation in the data array (described in my initial comment) is composed of several interrelated subsets of data (of varying lengths), which together account for the 20,000 variables.

The missing values describe an entire subset of variables in a given observation. That is, if there are 5 sets of variables in each observation, the missing subset accounts for some of the 20,000 variables.

lowell smoger

I have a data array that is about 50x20,000 (50 observations of 20,000 variables). Some of the variable values in one of the observations are missing, and I would like to estimate these missing values. If it is possible to do that with this code, please advise. Thanks!

MATLAB Release Compatibility
Created with R2013b
Compatible with any release
Platform Compatibility
Windows macOS Linux
