Asked by Eric
on 4 Oct 2013

I have delved into this topic before http://www.mathworks.com/matlabcentral/newsreader/view_thread/300661#811585, but am returning to it again.

I'm working with another class of algorithms that are very large scale. The class of algorithms is known as multi-frame super-resolution. A good reference is "Fast and Robust Multiframe Super Resolution" by S. Farsiu, et al, in IEEE Transactions on Image Processing, Vol 13, No. 10, 1327-1344 (2004).

My question is in regards to Matlab's ability to perform optimization on truly large scale problems. The authors of the aforementioned paper used steepest descent. I've coded up the algorithm with conjugate gradient (Polak-Ribiere).

I would be curious to try the trust-region algorithm of fminunc() on this problem. However, the size of the low resolution images I am using for super-resolution reconstruction is 135x90 (very small for real-world applications, but fine for testing). With 4X super-resolution the resultant image I'm reconstructing is 540x360. The algorithm considers every pixel of the output image to be a variable to be optimized. Thus there are 194,400 variables.

If I blindly use fminunc (letting it calculate the Hessian), it throws a memory error at a line in fminunc that reads

Hstr = sparse(ones(n));

n is apparently the number of variables, as in my case it is 194400.

Based on the aforementioned link, I tried setting 'HessMult', to @(H,Y)Y. The same memory error occurs at the same location.

Again, conjugate gradient techniques work well. I'm just surprised that I can't get this problem to work using a built-in function. It still seems that the Mathworks provides a function that either uses no derivative information (fminsearch) or an algorithm that requires calculating second order derivatives (fminunc). This is the second class of algorithms I've worked with in the last couple years that really needs an algorithm that can use first order derivatives but does not require second order derivatives.

I did put in a service request in 2011 to add the conjugate gradient algorithm to the optimization toolbox (or another algorithm that might provide comparable functionality). I pointed out a quote by Roger Fletcher in his "Practical Methods of Optimization", 2nd Edition (page 85): “Thus conjugate gradient methods may be the only methods which are applicable to large problems , that is problems with hundreds or thousands of variables.” The emphasis on “large problems” is Fletcher’s, not mine. I'm guessing my request didn't go very far.

I still wonder if there is some built-in function which will solve these problems and I'm just overlooking it. So my question is: Is there a Matlab-provided function that allows for optimization problems of this scale?

Thanks, Eric

Answer by Matt J
on 4 Oct 2013

Edited by Matt J
on 5 Oct 2013

Accepted Answer

Based on the aforementioned link, I tried setting 'HessMult', to @(H,Y)Y. The same memory error occurs at the same location.

You must also set the 'Hessian' option parameter to 'on' in order for HessMult to be used. However, I recommend you try to find a HessMult function corresponding to a better approximation to the Hessian than eye(N). Otherwise, it's probably just going to boil down to steepest descent.

It still seems that the Mathworks provides a function that either uses no derivative information (fminsearch) or an algorithm that requires calculating second order derivatives (fminunc).

No. That's just not the case. As a further example, note that fminunc's quasi-Newton method with HessUpdate='steepdesc' is equivalent to steepest descent, which is a first order algorithm. It's not that these options are unavailable. They are simply not recommended.

Again, conjugate gradient techniques work well. I'm just surprised that I can't get this problem to work using a built-in function

If you're saying conjugate gradient works well for you without preconditioning , then that's good news, but it means you're lucky. You don't have a very hard problem to solve. If you are using pre-conditioning, then you are effectively using 2nd order derivative information (to some approximation). You just aren't using it in precisely the same way fminunc uses it.

Matt J
on 25 Oct 2013

I set HessMult to '@(H,Y)Y' and Hessian to 'On' for fminunc().

This is not really a fair race. There are surely much better choices of HessMult that you could use. As I told you before, '@(H,Y)Y' probably makes the trust region algorithm reduce to something like steepest descent. Everyone knows that there is almost nothing slower than steepest descent.

Eric
on 25 Oct 2013

I agree it's not really a fair race, but from my point of view I don't care. I'm sure there are much better choices for HessMult. But no one in the literature has ever published second-order derivative information and I'm not the right person to tackle that problem.

It still seems that the tools in the Optimization Toolbox are a bit lacking. If I don't have information on second order gradient information I'm forced to use fminunc() in this manner for large-scale problems. The code Matlab provides in the Optimization Toolbox is significantly outperformed by the freeware toolbox available from Sandia.

From the interactions I've had with the Mathworks it seems the official opinion is that the tools provided in the Optimization Toolbox are the "best" in some sense. Other algorithms are not included since they are not the "best". Hence algorithms such as conjugate gradient and BFGS are not included.

Other tools such as Python and Mathematica take a different approach. They include a wide range of optimization algorithms and rely on the user to pick the best for their problem. I tend to prefer that approach.

Best regards,

Eric

Matt J
on 25 Oct 2013

But no one in the literature has ever published second-order derivative information and I'm not the right person to tackle that problem.

Since you're computing your first derivatives, you should be able to compute the second derivative by differentiating one more time! For large problems like yours, a common approximation is to compute the Hessian diagonal D only and approximate as

HessMult=@(D,Y) D.*Y

It's not guaranteed to work for all problems, but it does for many, and shouldn't be worse than @(D,Y) Y.

From the interactions I've had with the Mathworks it seems the official opinion is that the tools provided in the Optimization Toolbox are the "best" in some sense... Hence algorithms such as conjugate gradient and BFGS are not included.

fminunc's quasi-newton method does have a BFGS and DFP option. I wouldn't disagree that CG would be a worthy addition to the toolbox. However, quasi-Newton methods are supposed to perform as least as well as CG (and methods that use second order derivs certainly so), so there is a grain of validity in their claim.

Sign in to comment.

Opportunities for recent engineering grads.

Apply Today
## 1 Comment

## Eric (view profile)

## Direct link to this comment

https://www.mathworks.com/matlabcentral/answers/89215-using-fminunc-with-very-large-scale-problems#comment_172374

Sign in to comment.