Source : https://github.com/lobpcg/blopex/ in blopex_tools/matlab/lobpcg/lobpcg.m
This main function LOBPCG is a version of the preconditioned conjugate gradient method (Algorithm 5.1) described in A. V. Knyazev, Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method, SIAM Journal on Scientific Computing 23 (2001), no. 2, pp. 517541. http://dx.doi.org/10.1137/S1064827500366124
A Cversion of this code is a part of the https://github.com/lobpcg/blopex
package and is available, e.g., in SLEPc and HYPRE. A scipy version is https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.lobpcg.html
Tested in MATLAB 6.57.13R2019a and available Octave 3.2.33.4.2.
Andrew Knyazev (2020). Locally Optimal Block Preconditioned Conjugate Gradient (https://github.com/lobpcg/blopex), GitHub. Retrieved .
4.17  Revision 4.17 adds support for single precision 

4.16  Revision 4.16 adds support for distributed or codistributed arrays available in MATLAB BigData toolbox, e.g.,: A = codistributed(diag(1:100)); B = codistributed(diag(101:200));


4.15  updated description on MathWorks, fixed Project Website link 

4.14  Linearly depended directions 1time restart, not failure: 1) Orthogonalization of directions P moved to a different spot.


1.6  Connected to GitHub 

1.5  added a toolbox format 

1.5.0.0  added a conversion to a toolbox 

1.5.0.0  A minor update. Functions can now be called using also function handles. Updated comments and examples. 

1.4.0.0  Editorial changes to make the code Octavecompatible. 

1.1.0.0  License update to free software (BSD). Comments update. 

1.0.0.0  minor update to remove mlint messages 

The first public release for generalized Hermitian eigenproblems. 

The final release for nongeneralized eigenproblems. 

modifying description 
Create scripts with code, output, and formatted text in a single executable document.
@Lee Hively There is no separate public generic Fortran 90 version to my knowledge, but there is a builtin code in opensource ABINIT https://www.abinit.org/
Andrew,
Is a FORTRAN90 version of LOBPCG available? If so, where can I get it?
Please respond to lee.hively314@comcast.net, since I seldom access the mathworks website.
Thanks, Lee Hively PhD
@Ioannis Koutis  thanks for using the code! Could you please give an example and specify your MATLAB version?
In MATLAB Version: 9.8.0.1380330 (R2020a) Update 2 the following runs fine for me:
A = delsq(numgrid('S',21)); n=size(A,1);
Afun = @(x)(A*x);
[blockVectorX,lambda,failureFlag]=lobpcg(randn(n,8),Afun,1e5,50,2);
The latest version cannot handle the case when A  the left side operator is given by a function handle. Previous versions worked correctly.
% Revision 4.17 adds support for single precision, e.g.,
% A = diag(1:100); B = single(diag(101:200));
% [blockVectorX,lambda]=lobpcg(randn(100,2),A,1e5,15,2);
% A = diag(1:100); B = diag(101:200);
% [blockVectorX,lambda]=lobpcg(randn(100,2,'single'),A,1e5,15,2);
Revision 4.16 adds support for distributed or codistributed arrays available in MATLAB BigData toolbox, e.g.,:
A = codistributed(diag(1:100)); B = codistributed(diag(101:200));
[blockVectorX,lambda]=lobpcg(randn(100,2),A,1e5,5,2)
The code is now distributed from https://github.com/lobpcg/blopex/ as blopex_tools/matlab/lobpcg/lobpcg.m under MIT License and Apache License 2.0
The code includes the license line
License: GNU LGPL ver 2.1 or above
which I intend to change into
MIT License and Apache License 2.0
as soon as I find out how.
I have added a few "gather" commands to this reference MATLAB code of LOBPCG so it also now runs with distributed or codistributed matrices, copy the modified code from the attachment to
https://www.mathworks.com/matlabcentral/answers/284759eigsinmultinodecluster#comment_599333
and run
if true
A = distributed(diag(1:100));
[blockVectorX,lambda,failureFlag]=lobpcg(randn(100,1),A,1e5,50,2)
end
LOBPCG is designed to be simple for distributed computing, and multiple implementations, including GPU, are already available for many years, e.g.,:
https://bitbucket.org/joseroman/blopex
https://github.com/trilinos/Trilinos/blob/master/packages/anasazi/src/AnasaziLOBPCG.hpp
http://slepc.upv.es/documentation/current/src/eps/impls/cg/lobpcg/lobpcg.c.html
https://github.com/NVIDIA/AMGX/blob/master/eigen_examples/LOBPCG
https://docs.abinit.org/variables/dev/#wfoptalg
http://octopuscode.org/wiki/Developers_Manual:LOBPCG
The provided reference MATLAB code can also be used, after technical modifications, for computing singular vectors or eigenvectors in the TALL ARRAY format  the new functionality not currently provided by MATLAB or any of the toolboxes. The modifications are necessary, because of some current TALL ARRAY limitations, e.g., tall(rand(10,2))\diag([1 2]) is not supported and must be substituted with tall(rand(10,2))*inv(diag([1 2])).
Please let me know if anyone cares about this functionally in MATLAB and wants me to modify the code accordingly to support the TALL ARRAY format for the eigenvectors in my LOBPCG code.
LOBPCG can be easily adopted to compute partial SVD and PCA for a data matrix X without ever computing its covariance matrix X'*X, i.e. in matrixfree fashion. The main calculation in LOBPCG is evaluation of a function of the product X'*(X*v) of the covariance matrix X'*X and the blockvector v. PCA needs the largest eigenvalues of the covariance matrix, while LOBPCG is typically implemented to calculate the smallest ones. A simple workaround is to negate the function, substituting X'*(X*R) for X'*(X*R) and thus reversing the order of the eigenvalues, since LOBPCG does not care if the matrix of the eigenvalue problem is positive definite or not.
A possibly competing alternative to LOBPCG is to try EIGS, but EIGS is not used in PCA, e.g., since its code cannot be embedded or work directly with tall arrays, while LOBPCG can do both being a pure MATLAB code.
LOBPCG also supports sparse data matrix X, in contrast to PCA. The example below demonstrates that LOBPCG starts outperforming SVD, EIG, PCA, and partial PCA even for full (nonsparse) data matrix X, for matrix sizes above 5,000, when only 1 principle component is needed.
clear all; n = 6000; m = 5000; Xs = sprandn(n,m,1e2); X = full(Xs);
tic; [U,S] = svd(X,'econ'); ttoc=toc;
clear U S
fprintf('SDV Time %i Sec \n',ttoc);
tic; A = X'*X; [V,D] = eig(A); ttoc=toc; %faster, but may be less accurate
clear A %V D
fprintf('EIG Time %e Sec \n',ttoc);
p = 1; % Number of principle components to compute
tic; [coeff] = pca(X,'Centered',false, 'Algorithm', 'eig'); ttoc=toc;
fprintf('All PCA Time %e Sec, Error %e \n',ttoc,...
subspace(coeff(:,1:p),V(:,endp+1:end)));
tic; [coeffp] = pca(X,'Centered',false, 'NumComponents',p); ttoc=toc;
fprintf('Partial (%i principal component) PCA Time %e Sec, Error %e \n',p,ttoc,...
subspace(coeffp,V(:,endp+1:end)));
funA = @(v)((X*v)'*X)'; % using full datamatrix X
tic; [blockVectorX,lambda]=lobpcg(randn(m,p+p),@(v)funA(v),1e10,500); ttoc=toc;
fprintf('LOBPCG full data Time %e Sec, Error %e \n',ttoc,...
subspace(blockVectorX(:,1:p),V(:,endp+1:end)));
funAs = @(v)((Xs*v)'*Xs)'; % using sparse datamatrix Xs
tic; [blockVectorX,lambda]=lobpcg(randn(m,p+p),@(v)funAs(v),1e10,500); ttoc=toc;
fprintf('LOBPCG sparse data Time %e Sec, Error %e \n',ttoc,...
subspace(blockVectorX(:,1:p),V(:,endp+1:end)));
To enable running the code in singleprecision, edit in a few spots:
blockVectorBX*spdiags(lambda,0,blockSize,blockSize);
into
single(double(blockVectorX)*spdiags(lambda,0,blockSize,blockSize));
and
blockVectorBX*spdiags(lambda,0,blockSize,blockSize));
into
single(double(blockVectorBX)*spdiags(lambda,0,blockSize,blockSize));
"I encounter some severe differences in memory requirements comparing lobpcg.m with lobpcg from hypre/ij. Is there a way to use hypre with the same requirements as lobpcg.m?"
High memory requirements in hypre/ij in these tests come from hypre BoomerAMG preconditioning, not from the LOBPCG. For my full answer please see my reply at
http://www.mathworks.com/matlabcentral/answers/9534memoryrequirementsoflobpcgmatlabandhypreimplementationdifferences
I encounter some severe differences in memory requirements comparing lobpcg.m with lobpcg from hypre/ij. Is there a way to use hypre with the same requirements as lobpcg.m? For full explanation see
http://www.mathworks.com/matlabcentral/answers/9534memoryrequirementsoflobpcgmatlabandhypreimplementationdifferences
"Is there any reason why LOBPCG might not work for generalized eigenvalue problems with large sparse, symmetric matrices..."
Very slow convergence is an expected normal behavior of lobpcg for such a problem, without preconditioning. For detailed explanations and possible solutions, see
http://www.mathworks.com/matlabcentral/answers/9063lobpcgreturningincorrectresultsforlargesparsesymmetricmatrices
Is there any reason why LOBPCG might not work for generalized eigenvalue problems with large sparse, symmetric matrices (size 70 000 x 70 000, with 4.5 million nonzero values)? It has been very efficient for smaller identical problems (reducing the size of these sparse matrices to 10 000 x 10 000), although hasn't worked when I tried it on a problem of that size. In both cases I used a random matrix as an initial guess for the eigenvectors. see
http://www.mathworks.com/matlabcentral/answers/9063lobpcgreturningincorrectresultsforlargesparsesymmetricmatrices
"One question: Is there any way to compute the lowest eigenpairs above a specific value, as in eigs one can choose a SIGMA to find eigenvalues around it?"
In eigs, the SIGMAoption actually solves the socalled "shiftandinvert" problem, see
http://en.wikipedia.org/wiki/Preconditioner#Spectral_transformations . In LOBPCG, this option is not directly supported, but can be implemented by a user, supplying the corresponding functions to LOBPCG.
Nice work, really fast and very efficient. Since eigs crashes my cluster because of memory requirements, this seems to be a much better choice. One question:
Is there any way to compute the lowest eigenpairs above a specific value, as in eigs one can choose a SIGMA to find eigenvalues around it?
Thanks