From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Pseudoinverse in MATLAB
Date: Thu, 21 Mar 2013 17:45:07 +0000 (UTC)
Organization: The MathWorks, Inc.
Lines: 114
Message-ID: <kifgv3$h42$>
References: <> <khralp$bgq$> <khs4qd$li1$> <kht6nm$p1m$> <khtb4b$aqc$> <khvf73$o39$> <ki0nsg$q7e$> <ki1ogv$l9g$> <ki7d5a$r3s$> <ki8r3a$frd$>
Reply-To: <HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Trace: 1363887907 17538 (21 Mar 2013 17:45:07 GMT)
NNTP-Posting-Date: Thu, 21 Mar 2013 17:45:07 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 4148121
Xref: comp.soft-sys.matlab:791714

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <ki8r3a$frd$>...
> "Daniel Vasilaky" wrote in message <ki7d5a$r3s$>...
> > "Greg Heath" <> wrote in message <ki1ogv$l9g$>...
> > Thank you all for your comments!
> > Is my algorithm a continuous function of the data for a fixed perturbation parameter? 
> > In theory, yes, and in practice one has to deal with round off error. 
> > Here is what the algorithm can do:
> > Let A be an m-by-n matrix where m >= n.  (for m=n A ill-conditioned, for m>n A’A ill-conditioned.)
> > Consider the equation Ax = b and let x_0 be some initial prior solution and c>0 a perturbation parameter in ||Ax – b|| + c||x – x_0||. 
> This can be solved be backslash.
>  [A; eye(size(x))] \ [b; c*x0]
> This penalization is well known. Among fields are climate and weather forecast.
> >The extended algorithm produces a sequence x_0, x_1, x_2, …….. which converges exponentially fast to the solution of
> >  Ax = b for m=n and for m > n  to the solution of A’Ax = A’b (normal equations) independent of the chosen initial values  x_0 and c > 0. 
> > Simple Tikhonov is a special case when x_0 is set to the zero vector, but zero is usually not the best choice in most applications. 
> > Of course a well-chosen x_0 and c > 0   will reduce the number of successive approximations needed, but the convergence is always guaranteed. Each successive approximation takes mn flops. A reasonable stopping rule for successive approximations is: “STOP as soon as the relative residual error is greater than the prior residual error, “ i.e. ,  stop if 
> > ||Ax_i – b||/||b|| > ||Ax_(i-1) – b||/||b||. 
> > This basically says STOP if round off error starts to make things worse. This may be an overkill because experiments show that one can stop much sooner. 
> Using gradient conjugate it probably converge even faster.
> > 
> > Loosely speaking, the proof and derivation of the above result reduces Richard Bellman’s functional recurrence equation to a recurrence equation of what appears to be a recurrence equation involving approximate Householder type projections, i.e., 
> >  (I – vv’/(c +v’v)), tempered by parameter c>0. The recurrence relation turns out to be a generalization of the Gram-Schmidt process similar to but not the same as the Householder orthogonalization.   This in turn produces an approximate QR factorization of A which solves the extended perturbed problem. The orthogonalization is done once which takes approximately 2mn^2 flops. 
> This is again a well-known stuffs.
> > It is possible to speed things up by Bjock’s modified Gram-Schmidt (MGS) trick.
> > The extended algorithm works well in my anomaly detection and reconstruction of anatomical image parts algorithms and has been accepted for publication in a biomedical engineering journal. But the above extended algorithm is only referenced there and not discussed in the publication. The derivation of the algorithm is quite tricky and the liner algebra is laborious but the convergence proof is not difficult. Not being from a math community I don’t know which journal to submit it to that may possibly accept it.  Any suggestions would be greatly appreciated. 
> Please make sure there is something really novel in your algorithm before working on the publication.
> Bruno

Bruno, thanks for the tip on  [A; eye(size(x,1))] \ [b; c*x_0]! Is there more information on this?

I ran extensive tests which show that for severely  ill-conditioned matrices my algorithm significantly outperforms both backslash \ and extended backslash 
[A; eye(size(x,1))] \ [b; c*x_0];  
However, for a sparse matrix, such as  the Harwell-Boeing test matrix "west0479"  with cond(A) =  1.4244e+12,  the simple backslash \  gives by far the best results. 
But even in  "west0479"  case my algorithm converges to the true solution, except why use it  if the backslash \ is much faster.

Here is a sample of test results: x is the actual solution, and x_0 initial solution, and x1 the computed solution.
In this example the x_0 = zero  vector and tried various parameter values from 
c=1e-3  to  c=1e-12. 

Test case hilb(25) ,  cond(A) =1.9684e+18, x= vector of 1s, rhs: b = A*x ;. 
Also tried  x_0 = x and got similar results.
  Successive approximations for extended backslash were looped based on: 
 x_n = [A; eye(size(x,1))] \ [b; c*x_n-1]
For hilb(25):
Simple backslash (\) :                                                                    
 max(norm(x-x1)) = 110.7277   

Extended backslash [A; eye(size(x,1))] \ [b; c*x_0]    0 iterations:      
 max(norm(x-x1)) = 2.6627  

Extended backslash [A; eye(size(x,1))] \ [b; c*x_0]  500 iterations:     
max(norm(x-x1)) = 2.6890

Extended backslash [A; eye(size(x,1))] \ [b; c*x_0]  5,000 iterations:   
max(norm(x-x1)) = 2.6890 (does not improve) 
My algorithm running  500 successive  approximations:                       
max(norm(x-x1)) =  0.0811

My algorithm running  1,000 successive approximations:                      
max(norm(x-x1)) =  0.0573

My algorithm running 5,000 successive approximations:                       
max(norm(x-x1)) =  0.0488

In the above example we can see that Extended backslash 
[A; eye(size(x,1))] \ [b; c*x_0] 
does not converge to the exact solution, it stays put after a few iterations.
On the other hand, my algorithm always converges, this turned out to be the case for other hilb(n) matrices and the higher the dimension n the greater outperformance  by my algorithm.

I also compared the performances on 
Discretization of a first-kind Fredholm integral equation with  kernel K and right-hand side g given by
K(s,t) = exp(s*cos(t)) ,  Y(s) = 2*sinh(s)/s , and with integration intervals  s in [0,pi/2] ,  t in [0,pi] and n=200 .
The solution is given by   x(t) = sin(t) .
 Reference: M. L. Baart, "The use of auto-correlation for pseudo-
 rank determination in noisy ill-conditioned linear least-squares
problems", IMA J. Numer. Anal. 2 (1982), 241-247.
Discretized by the Galerkin method with orthonormal box functions;
one integration is exact, the other is done by Simpson's rule.
Per Christian Hansen, IMM, 09/16/92.

con(A) = 1.6196e+18    x_0 = zero vector and 
c = 1e-09 ,  x= true solution, x1=computed solution

Simple backslash (\):                                                                        
max(norm(x-x1)) = 199.0953 
Extended backslash [A; eye(size(x,1))] \ [b; c*x_0]  0 itereations:         
max(norm(x-x1)) = 0.7065

Extended backslash [A; eye(size(x,1))] \ [b; c*x_0]  5,000 itereations:    
max(norm(x-x1)) = 0.7065 (not a typo)

My algorithm running 5,000 iterations:                                                    
max(norm(x-x1)) = 0.0369   (A 19 fold improvement)

Would these results be considered "novel" by the math community?

Thanks for your help,