Parallel Analysis (PA) for determining the number of components to retain from PCA

version 1.0.0.0 (1.96 KB) by Hanan Shteingart

4 Downloads

Updated 10 Jan 2014


% Parallel Analysis (PA) for determining the number of components to retain from PCA. A component is retained if its associated eigenvalue is larger than the 95th percentile of the distribution of eigenvalues derived from random data.
% Syntax:
% ======
% pa_test(x, nShuffle, alpha, princomp_parameters[ ])
% x - the data matrix (n-by-p, where n is the number of observations and p is the dimension of each observation)
% nShuffle - number of shuffles. Optional, default = 100
% alpha - significance level. Optional, default = 0.05
% princomp_parameters - parameters to pass to the princomp function (see help princomp). Optional, default = {true,'Centered',false}

% Background:
% ==========
% From Wikipedia: http://en.wikipedia.org/wiki/Factor_analysis
% Horn's Parallel Analysis (PA):
% A Monte Carlo simulation method that compares the observed eigenvalues with those obtained from uncorrelated normal variables.
% A factor or component is retained if the associated eigenvalue is larger than the 95th percentile of the distribution of eigenvalues derived from the random data.
% PA is one of the most recommended rules for determining the number of components to retain, but only a few programs include this option.

% References:
% * Ledesma, R.D.; Valero-Mora, P. (2007). "Determining the Number of Factors to Retain in EFA: An easy-to-use computer program for carrying out Parallel Analysis". Practical Assessment Research & Evaluation 12 (2): 1–11.
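
The retention rule described in the Background can be sketched in a few lines of base MATLAB. This is not the code of pa_test itself, only a minimal illustration under stated assumptions: the null distribution is built by shuffling each column independently (suggested by the nShuffle parameter, rather than by drawing normal variables), eigenvalues come from the sample covariance matrix, and the synthetic data and all variable names are made up for the example.

```matlab
% Illustrative sketch of permutation-based parallel analysis (base MATLAB only).
% The synthetic data and every variable name here are assumptions.
rng(0);                                   % fix the seed so the run is repeatable
n = 200; p = 6;                           % observations, variables
z = randn(n, 1);                          % one shared latent factor
x = [z * [1 1 1], zeros(n, 3)] + 0.5 * randn(n, p);  % 3 correlated + 3 pure-noise columns

nShuffle = 100;                           % number of shuffled data sets
alpha    = 0.05;                          % significance level

% Observed eigenvalues of the covariance matrix, largest first.
latent = sort(eig(cov(x)), 'descend');

% Null distribution: shuffle each column independently to break correlations.
latentShuffle = zeros(p, nShuffle);
for iShuffle = 1:nShuffle
    xShuffle = x;
    for j = 1:p
        xShuffle(:, j) = x(randperm(n), j);
    end
    latentShuffle(:, iShuffle) = sort(eig(cov(xShuffle)), 'descend');
end

% Empirical (1 - alpha) quantile per ordered eigenvalue (simple order statistic).
sortedNull = sort(latentShuffle, 2);
threshold  = sortedNull(:, ceil((1 - alpha) * nShuffle));

% Count the leading components whose eigenvalue exceeds its threshold.
nRetain = find([latent <= threshold; true], 1) - 1;
fprintf('Components retained: %d\n', nRetain);
```

Because the synthetic data contain a single shared factor, only the first eigenvalue should stand clearly above its shuffled counterpart. The order-statistic quantile is a deliberately simple choice that avoids a Statistics Toolbox dependency; prctile could be used instead where that toolbox is available.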

Comments and Ratings (3)


Hi - this is a great script! I made some modifications that I hope you will consider making here as well:

I used "pca" instead of "princomp", since princomp will be removed in future releases.

I also changed the code to (1) allow parallel processing for speed (maybe offer this as an option) and (2) shuffle the data without the inner for loop, which reduces the shuffling time by approximately 40%. You can see the change below.

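h = parpool(4);  % open a pool of 4 workers
parfor iShuffle = 1:nShuffle
    % each row of p is a random permutation of the row indices of x
    [~,p] = sort(rand(size(x,2),size(x,1)),2);
    % linear indices that permute every column of x independently
    ind = sub2ind(size(x),p',ones(size(x,1),1)*(1:size(x,2)));
    xShuffle = x(ind);
    [~, ~, latentShuffle(:,iShuffle)] = pca(xShuffle);
end
delete(h);  % shut down the pool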
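
To see what the index trick above does, here is a small stand-alone check on a toy matrix (the 5-by-3 data and the variable names are assumptions made for illustration). It builds the same linear indices and confirms that every column keeps its own values while only their order changes.

```matlab
% Check that the vectorized index trick permutes each column of x independently.
rng(1);                                  % repeatable run
m = magic(5);
x = m(:, 1:3);                           % toy data: n = 5 rows, p = 3 columns
[n, p] = size(x);

[~, perm] = sort(rand(p, n), 2);         % each row: a random permutation of 1..n
ind = sub2ind([n, p], perm', ones(n, 1) * (1:p));  % row index from perm, column index fixed
xShuffle = x(ind);

% Sorting each column removes the ordering, so only the values are compared.
sameValues = isequal(sort(xShuffle, 1), sort(x, 1));
disp(sameValues);
```

Sorting uniform random numbers is a common way to draw many permutations at once; it trades the per-column randperm call for one large sort.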

Thanks again!

Rodolphe

Thanks for the script. It unfortunately produces an error when there is missing data (presence of NaNs).
Is there a way to fix that problem?

MATLAB Release Compatibility
Created with R2013b
Compatible with any release
Platform Compatibility
Windows macOS Linux
Tags
pca
