This main function LOBPCG is a version of the preconditioned conjugate gradient method (Algorithm 5.1) described in A. V. Knyazev, Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method, SIAM Journal on Scientific Computing 23 (2001), no. 2, pp. 517-541. http://dx.doi.org/10.1137/S1064827500366124
A C-version of this code is a part of the http://code.google.com/p/blopex/
package and is available, e.g., in SLEPc and HYPRE.
Tested in MATLAB 6.5-7.13 and Octave 3.2.3-3.4.2.
1.5.0.0  added a toolbox format 

1.5.0.0  added a conversion to a toolbox 

1.5.0.0  A minor update. Functions can now be called using also function handles. Updated comments and examples. 

1.4.0.0  Editorial changes to make the code Octave-compatible. 

1.1.0.0  License update to free software (BSD). Comments update. 

1.0.0.0  minor update to remove mlint messages 

The first public release for generalized Hermitian eigenproblems. 

The final release for non-generalized eigenproblems. 

modifying description 
Andrew Knyazev
I have added a few "gather" commands to this reference MATLAB code of LOBPCG, so it now also runs with distributed or codistributed matrices. Copy the modified code from the attachment to
https://www.mathworks.com/matlabcentral/answers/284759eigsinmultinodecluster#comment_599333
and run
if true
A = distributed(diag(1:100));
[blockVectorX,lambda,failureFlag]=lobpcg(randn(100,1),A,1e-5,50,2)
end
Andrew Knyazev
LOBPCG is designed to be simple for distributed computing, and multiple implementations, including GPU ones, have been available for many years, e.g.:
https://bitbucket.org/joseroman/blopex
https://github.com/trilinos/Trilinos/blob/master/packages/anasazi/src/AnasaziLOBPCG.hpp
http://slepc.upv.es/documentation/current/src/eps/impls/cg/lobpcg/lobpcg.c.html
https://github.com/NVIDIA/AMGX/blob/master/eigen_examples/LOBPCG
https://docs.abinit.org/variables/dev/#wfoptalg
http://octopuscode.org/wiki/Developers_Manual:LOBPCG
The provided reference MATLAB code can also be used, after technical modifications, for computing singular vectors or eigenvectors in the TALL ARRAY format, a new functionality not currently provided by MATLAB or any of the toolboxes. The modifications are necessary because of some current TALL ARRAY limitations; e.g., tall(rand(10,2))/diag([1 2]) is not supported and must be substituted with tall(rand(10,2))*inv(diag([1 2])).
Please let me know if anyone cares about this functionality in MATLAB and wants me to modify the code accordingly to support the TALL ARRAY format for the eigenvectors in my LOBPCG code.
Andrew Knyazev
LOBPCG can be easily adapted to compute a partial SVD and PCA for a data matrix X without ever computing its covariance matrix X'*X, i.e., in matrix-free fashion. The main calculation in LOBPCG is the evaluation of a function computing the product X'*(X*v) of the covariance matrix X'*X and the block-vector v. PCA needs the largest eigenvalues of the covariance matrix, while LOBPCG is typically implemented to calculate the smallest ones. A simple workaround is to negate the function, substituting -X'*(X*v) for X'*(X*v) and thus reversing the order of the eigenvalues, since LOBPCG does not care whether the matrix of the eigenvalue problem is positive definite or not.
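The negation trick can be checked on a small problem without any eigensolver beyond the built-in eig and svd. The following is only a minimal sketch: the sizes n, m, p and the names funA, negA, errValues, errSubspace are illustrative assumptions, and the dense matrix is formed solely to verify the idea, which a matrix-free solver would never do.

```matlab
% Illustrative sizes (assumptions, not from the post).
n = 20; m = 5; p = 2;
X = randn(n, m);                       % hypothetical dense data matrix

% Matrix-free action of -X'*X on a block of vectors V.
funA = @(V) -(X' * (X * V));

% Form the small dense matrix of the operator only to verify the idea.
negA = funA(eye(m));
negA = (negA + negA') / 2;             % symmetrize against round-off
[V, D] = eig(negA);
[lambdaNeg, idx] = sort(diag(D), 'ascend');
V = V(:, idx);

% The p smallest eigenvalues of -X'*X are minus the p largest
% squared singular values of X.
[~, S, W] = svd(X, 'econ');            % singular values in descending order
s = diag(S);
errValues = norm(-lambdaNeg(1:p) - s(1:p).^2) / s(1)^2;

% The matching eigenvectors span the p leading principal directions.
errSubspace = subspace(V(:, 1:p), W(:, 1:p));
fprintf('Eigenvalue error %e, subspace angle %e \n', errValues, errSubspace);
```

Both reported numbers should be at round-off level, which is why a solver targeting the smallest eigenvalues of the negated operator returns the leading principal components.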
A possibly competing alternative to LOBPCG is EIGS, but EIGS is not used in PCA, e.g., since its code cannot be embedded or work directly with tall arrays, while LOBPCG can do both, being pure MATLAB code.
LOBPCG also supports a sparse data matrix X, in contrast to PCA. The example below demonstrates that LOBPCG starts outperforming SVD, EIG, PCA, and partial PCA even for a full (non-sparse) data matrix X, for matrix sizes above 5,000, when only 1 principal component is needed.
clear all; n = 6000; m = 5000; Xs = sprandn(n,m,1e-2); X = full(Xs);
tic; [U,S] = svd(X,'econ'); ttoc=toc;
clear U S
fprintf('SVD Time %e Sec \n',ttoc);
tic; A = X'*X; [V,D] = eig(A); ttoc=toc; % faster, but may be less accurate
clear A % V and D are kept as the reference for the error checks below
fprintf('EIG Time %e Sec \n',ttoc);
p = 1; % Number of principal components to compute
tic; [coeff] = pca(X,'Centered',false, 'Algorithm', 'eig'); ttoc=toc;
fprintf('All PCA Time %e Sec, Error %e \n',ttoc,...
subspace(coeff(:,1:p),V(:,end-p+1:end)));
tic; [coeffp] = pca(X,'Centered',false, 'NumComponents',p); ttoc=toc;
fprintf('Partial (%i principal component) PCA Time %e Sec, Error %e \n',p,ttoc,...
subspace(coeffp,V(:,end-p+1:end)));
funA = @(v)-((X*v)'*X)'; % negated product -X'*(X*v), using full data matrix X
tic; [blockVectorX,lambda]=lobpcg(randn(m,p+p),@(v)funA(v),1e-10,500); ttoc=toc;
fprintf('LOBPCG full data Time %e Sec, Error %e \n',ttoc,...
subspace(blockVectorX(:,1:p),V(:,end-p+1:end)));
funAs = @(v)-((Xs*v)'*Xs)'; % negated product -Xs'*(Xs*v), using sparse data matrix Xs
tic; [blockVectorX,lambda]=lobpcg(randn(m,p+p),@(v)funAs(v),1e-10,500); ttoc=toc;
fprintf('LOBPCG sparse data Time %e Sec, Error %e \n',ttoc,...
subspace(blockVectorX(:,1:p),V(:,end-p+1:end)));
Andrew Knyazev
To enable running the code in single precision, edit it in a few spots, changing
blockVectorX*spdiags(lambda,0,blockSize,blockSize);
into
single(double(blockVectorX)*spdiags(lambda,0,blockSize,blockSize));
and
blockVectorBX*spdiags(lambda,0,blockSize,blockSize);
into
single(double(blockVectorBX)*spdiags(lambda,0,blockSize,blockSize));
Andrew Knyazev
"I encounter some severe differences in memory requirements comparing lobpcg.m with lobpcg from hypre/ij. Is there a way to use hypre with the same requirements as lobpcg.m?"
The high memory requirements in hypre/ij in these tests come from the hypre BoomerAMG preconditioning, not from LOBPCG. For my full answer, please see my reply at
http://www.mathworks.com/matlabcentral/answers/9534memoryrequirementsoflobpcgmatlabandhypreimplementationdifferences
Elias
I encounter some severe differences in memory requirements comparing lobpcg.m with lobpcg from hypre/ij. Is there a way to use hypre with the same requirements as lobpcg.m? For a full explanation, see
http://www.mathworks.com/matlabcentral/answers/9534memoryrequirementsoflobpcgmatlabandhypreimplementationdifferences
Andrew Knyazev
"Is there any reason why LOBPCG might not work for generalized eigenvalue problems with large sparse, symmetric matrices..."
Very slow convergence is the expected, normal behavior of lobpcg for such a problem without preconditioning. For detailed explanations and possible solutions, see
http://www.mathworks.com/matlabcentral/answers/9063lobpcgreturningincorrectresultsforlargesparsesymmetricmatrices
Andrew
Is there any reason why LOBPCG might not work for generalized eigenvalue problems with large sparse, symmetric matrices (size 70 000 x 70 000, with 4.5 million nonzero values)? It has been very efficient for smaller, otherwise identical problems (reducing the size of these sparse matrices to 10 000 x 10 000), but it hasn't worked when I tried it on a problem of the larger size. In both cases I used a random matrix as an initial guess for the eigenvectors. See
http://www.mathworks.com/matlabcentral/answers/9063lobpcgreturningincorrectresultsforlargesparsesymmetricmatrices
Andrew Knyazev
"One question: Is there any way to compute the lowest eigenpairs above a specific value, as in eigs one can choose a SIGMA to find eigenvalues around it?"
In eigs, the SIGMA option actually solves the so-called "shift-and-invert" problem, see
http://en.wikipedia.org/wiki/Preconditioner#Spectral_transformations . In LOBPCG, this option is not directly supported, but can be implemented by a user, supplying the corresponding functions to LOBPCG.
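One possible way to build such a user-supplied function is sketched here, under stated assumptions: the test matrix A, the shift sigma, the count k, and the names shifted, opSI, theta, and lambdaAbove are all hypothetical. The operator is negated because the solver is assumed to target the smallest eigenvalues, and the dense eig call only stands in for the eigensolver to check the mapping back to the original eigenvalues.

```matlab
% Hypothetical test problem: diagonal matrix with eigenvalues 1..n.
n = 50; sigma = 10.5; k = 3;
A = spdiags((1:n)', 0, n, n);
shifted = A - sigma * speye(n);

% Negated shift-and-invert operator: V -> -(A - sigma*I) \ V.
% Its eigenvalues are theta = -1/(lambda - sigma), so the eigenvalues
% lambda just above sigma become the smallest (most negative) theta.
opSI = @(V) -(shifted \ V);

% Dense check standing in for the eigensolver call.
T = full(opSI(eye(n)));
T = (T + T') / 2;                      % symmetrize against round-off
theta = sort(eig(T), 'ascend');

% Map the k smallest theta back to eigenvalues of A.
lambdaAbove = sigma - 1 ./ theta(1:k);
disp(lambdaAbove');
```

For a large sparse problem, factorizing the shifted matrix once and reusing the factors inside the function handle would avoid repeating the factorization at every application.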
Elias
Nice work, really fast and very efficient. Since eigs crashes my cluster because of memory requirements, this seems to be a much better choice. One question:
Is there any way to compute the lowest eigenpairs above a specific value, as in eigs one can choose a SIGMA to find eigenvalues around it?
Thanks
Elias