This function implements a fast truncated SVD.
We often need a singular value decomposition, but most of the time we do not actually need all of the singular vectors and values; in Principal Component Analysis, for instance, only the leading components matter.
This is also justified by the fact that many matrices occurring in practice exhibit structure that leaves only a few singular values non-negligible.
Recent research has shown that when only a truncated SVD is needed, randomized algorithms can deliver dramatic speedups.
* A : matrix whose SVD we want
* K : number of components to keep
* U,S,V : same outputs as MATLAB's built-in svd function
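The idea behind the method can be sketched in a few lines: sample the range of A with a Gaussian test matrix, orthonormalize the sample, and take the SVD of the small projected matrix. This is a minimal illustration of the prototype algorithm from Halko et al.; the function and variable names here are mine, and the actual rsvd.m in this submission may differ in details such as oversampling.

```matlab
% Minimal sketch of a randomized truncated SVD (after Halko et al.).
% Illustrative only; not necessarily identical to this submission's rsvd.m.
function [U, S, V] = rsvd_sketch(A, K)
    [~, n] = size(A);
    P = randn(n, K);            % Gaussian test matrix
    Y = A * P;                  % sample the range of A
    [Q, ~] = qr(Y, 0);          % orthonormal basis for the sample (economy QR)
    B = Q' * A;                 % project A onto the basis: K x n, small
    [Uhat, S, V] = svd(B, 'econ');  % cheap SVD of the small matrix
    U = Q * Uhat;               % lift left singular vectors back
end
```

The expensive full SVD of an m x n matrix is replaced by two matrix products and an SVD of a K x n matrix, which is where the speedup comes from.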
Here is a small example for a 2000x2000 matrix of rank 100 on my small laptop computer:
>> A = randn(2000,100)*randn(100,2000);
>> tic; [U1,S1,V1] = svd(A); toc
Elapsed time is 6.509186 seconds.
>> tic; [U2,S2,V2] = rsvd(A,150); toc
Elapsed time is 0.238175 seconds.
So in that case, a nearly 30x speedup. And it only gets better as the matrix gets bigger...
You'll find more information about these breakthrough algorithms in a nice paper by Halko et al.: "Finding structure with randomness".
On second thought, it seems plausible that a fully random matrix simply cannot be approximated with a small k. You can see that the code works fine when k is near N/2. After all, the paper by Halko et al. is called "Finding structure with randomness"...
I tested this code on many different matrices, comparing against the standard svd(A, 'econ') and svds results, and also against other independent randomized code (found at https://stats.stackexchange.com/questions/2806/best-pca-algorithm-for-huge-number-of-features-10k/11934), but I did not observe differences of this kind. However, I could reproduce the results in this example with the other randomized code as well.
Just some observations while playing with Dimitris' example:
- the difference between the two methods slightly decreases as k increases
- replacing the matrix in Dimitris' code by mat = randn(2000,100)*randn(100,2000) produced excellent results, so this must be matrix dependent
- replacing mat by randn(dims, dims-900) is excellent, but randn(dims, dims-800) produces the large deviations
Perhaps some of the assumptions are violated by a square, purely random matrix?
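One way to make the matrix-dependence concrete is to measure the reconstruction error on a genuinely low-rank matrix versus a square random one. A hedged experiment (it assumes rsvd.m from this submission is on the path, and the oversampling choice k+50 is my own):

```matlab
% Compare rsvd reconstruction error on a rank-100 matrix vs a full random one.
dims = 2000; k = 100;
Alow  = randn(dims, k) * randn(k, dims);   % numerical rank exactly k
Afull = randn(dims);                       % singular values decay very slowly
[U, S, V] = rsvd(Alow, k + 50);            % modest oversampling beyond k
fprintf('low-rank  rel. error: %g\n', norm(Alow - U*S*V', 'fro') / norm(Alow, 'fro'));
[U, S, V] = rsvd(Afull, k + 50);
fprintf('full-rank rel. error: %g\n', norm(Afull - U*S*V', 'fro') / norm(Afull, 'fro'));
```

On the low-rank matrix the error should be near machine precision, while on the purely random matrix a rank-150 approximation necessarily misses most of the spectrum, which is consistent with the observations above.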
Is there supposed to be a big difference between the singular values recovered by svds (for a limited number of singular values) and by rsvd?
dims = 100;
mat = randn(dims, dims);
numberOfSVs = 6;
tic;
[uMatlab, sMatlab, vMatlab] = svds(mat, numberOfSVs);
matlabToc = toc;
fprintf('matlabs toc: %7.5f\n', matlabToc)
tic;
[uRSVD, sRSVD, vRSVD] = rsvd(mat, numberOfSVs);
rsvdToc = toc;
fprintf('rsvd toc: %7.5f\n', rsvdToc)
Dear author, what is the difference in results between this randomized algorithm and the original (deterministic) algorithm?
Great code. Reduced our computation time from a few hours to a few seconds.
I'm working on splitting an image into many small matrices, so a very fast SVD on small matrices is what I need. This SVD implementation is slower than MATLAB's svd when processing small matrices. I haven't tried it on large matrices.
It is 500x faster than the eig function for a 192000 x 192000 matrix, thanks a lot!
Thanks Adrien =)
Concerning the theoretical analysis, I just recommend the excellent overview paper by Halko et al., "Finding structure with randomness". This is just a plain implementation of one of the algorithms proposed there. Cheers
I just updated my rating... this contribution is just great. I have used it on huge matrices (up to 1.5 GB) to compute up to 1500 singular values/vectors.
Very nice... it looks amazing on a few test cases.
Could you comment on the cases where this algorithm would fail or lose precision?
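For matrices whose singular values decay slowly (the square random matrices discussed above are the extreme case), the standard remedy in Halko et al. is to add a few power iterations before forming the basis. A hedged sketch, assuming the same structure as the basic algorithm; the names and the re-orthonormalization scheme are mine:

```matlab
% Randomized SVD with q power iterations (after Halko et al.).
% Illustrative sketch only; not the code shipped in this submission.
function [U, S, V] = rsvd_power(A, K, q)
    [~, n] = size(A);
    Y = A * randn(n, K);            % initial range sample
    [Q, ~] = qr(Y, 0);
    for i = 1:q                     % each pass sharpens the spectrum of the sample
        [Q, ~] = qr(A' * Q, 0);     % re-orthonormalize to limit roundoff growth
        [Q, ~] = qr(A * Q, 0);
    end
    B = Q' * A;
    [Uhat, S, V] = svd(B, 'econ');
    U = Q * Uhat;
end
```

Even q = 1 or 2 typically improves accuracy substantially on flat spectra, at the cost of extra passes over A.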