Using fminunc() with very large scale problems

Question

Eric on 4 Oct 2013

0
Link

Direct link to this question

https://www.mathworks.com/matlabcentral/answers/89215-using-fminunc-with-very-large-scale-problems

Commented: Matt J on 25 Oct 2013

I have delved into this topic before http://www.mathworks.com/matlabcentral/newsreader/view_thread/300661#811585, but am returning to it again.

I'm working with another class of algorithms that are very large scale. The class of algorithms is known as multi-frame super-resolution. A good reference is "Fast and Robust Multiframe Super Resolution" by S. Farsiu, et al, in IEEE Transactions on Image Processing, Vol 13, No. 10, 1327-1344 (2004).

My question is in regards to Matlab's ability to perform optimization on truly large scale problems. The authors of the aforementioned paper used steepest descent. I've coded up the algorithm with conjugate gradient (Polak-Ribiere).

I would be curious to try the trust-region algorithm of fminunc() on this problem. However, the size of the low resolution images I am using for super-resolution reconstruction is 135x90 (very small for real-world applications, but fine for testing). With 4X super-resolution the resultant image I'm reconstructing is 540x360. The algorithm considers every pixel of the output image to be a variable to be optimized. Thus there are 194,400 variables.

If I blindly use fminunc (letting it calculate the Hessian), it throws a memory error at a line in fminunc that reads

Hstr = sparse(ones(n));

n is apparently the number of variables, as in my case it is 194400.

Based on the aforementioned link, I tried setting 'HessMult', to @(H,Y)Y. The same memory error occurs at the same location.

Again, conjugate gradient techniques work well. I'm just surprised that I can't get this problem to work using a built-in function. It still seems that the Mathworks provides a function that either uses no derivative information (fminsearch) or an algorithm that requires calculating second order derivatives (fminunc). This is the second class of algorithms I've worked with in the last couple years that really needs an algorithm that can use first order derivatives but does not require second order derivatives.

I did put in a service request in 2011 to add the conjugate gradient algorithm to the optimization toolbox (or another algorithm that might provide comparable functionality). I pointed out a quote by Roger Fletcher in his "Practical Methods of Optimization", 2nd Edition (page 85): “Thus conjugate gradient methods may be the only methods which are applicable to large problems , that is problems with hundreds or thousands of variables.” The emphasis on “large problems” is Fletcher’s, not mine. I'm guessing my request didn't go very far.

I still wonder if there is some built-in function which will solve these problems and I'm just overlooking it. So my question is: Is there a Matlab-provided function that allows for optimization problems of this scale?

Thanks, Eric

1 Comment
Show -1 older commentsHide -1 older comments

Eric on 4 Oct 2013

Edited: Eric on 4 Oct 2013

Here's the last email I received from my service request to add conjugate gradient. It tails off because of a 1500 character limit in what the Mathworks website will display (I think).

I discussed this issue with the development team. They had following comments:

The large scale FMINUNC algorithm uses a conjugate gradient method to determine the step that will be taken in each iteration of the algorithm. As you had pointed out, some of the implementations like Fletcher-Reeves or Polak-Ribiere algorithms, do not require estimation of the Hessian. However, these algorithms have a downside too in that sometimes they are less efficient and robust than other modern algorithms.

With regards to the large scale FMINUNC algorithm that is currently used in MATLAB, to calculate the 2-D subspace for the stepsize calculation, some information about the Hessian is required. This can be either an estimate of the Hessian itself or a Hessian-vector product. Hessian-vector products can be used if the Hessian has some structure and in these cases the complete Hessian does not need to be calculated. We have an example in the documentation that illustrates this (note that this example uses FMINCON, but it can also be applied to FMINUNC)

http://www.mathworks.com/help/toolbox/optim/ug/brn4nh7.html#f84047

If it is not possible to specify Hessian-vector products, you can use the medium scale algorithm in FMINUNC. This algorithm builds up an approximation of the Hessian using a formula that just uses gradient information (i.e. the Hessian is not calculated directly). However, if you are solving large problem...

Sign in to comment.

Sign in to answer this question.

Answer 1

Matt J on 4 Oct 2013

0
Link

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/89215-using-fminunc-with-very-large-scale-problems#answer_98677

Edited: Matt J on 5 Oct 2013

Based on the aforementioned link, I tried setting 'HessMult', to @(H,Y)Y. The same memory error occurs at the same location.

You must also set the 'Hessian' option parameter to 'on' in order for HessMult to be used. However, I recommend you try to find a HessMult function corresponding to a better approximation to the Hessian than eye(N). Otherwise, it's probably just going to boil down to steepest descent.

It still seems that the Mathworks provides a function that either uses no derivative information (fminsearch) or an algorithm that requires calculating second order derivatives (fminunc).

No. That's just not the case. As a further example, note that fminunc's quasi-Newton method with HessUpdate='steepdesc' is equivalent to steepest descent, which is a first order algorithm. It's not that these options are unavailable. They are simply not recommended.

Again, conjugate gradient techniques work well. I'm just surprised that I can't get this problem to work using a built-in function

If you're saying conjugate gradient works well for you without preconditioning , then that's good news, but it means you're lucky. You don't have a very hard problem to solve. If you are using pre-conditioning, then you are effectively using 2nd order derivative information (to some approximation). You just aren't using it in precisely the same way fminunc uses it.

8 Comments
Show 6 older commentsHide 6 older comments

Eric on 23 Oct 2013

Edited: Eric on 23 Oct 2013

Here are a couple tables I put together for typical runs. Some notes:

The top function (ConjugateGradient) is my own implementation of the Polak-Ribiere CG algorithm using a linesearch algorithm from Fletcher's "Practical Methods of Optimization" text.
The bottom three are functions from Poblano - one of which failed for some reason.
I set HessMult to '@(H,Y)Y' and Hessian to 'On' for fminunc().
The optimization had 194,400 variables.
My own ConjugateGradient() function does not keep track of the number of function evaluations.
My ConjugateGradient() algorithm and Poblano's ncg() with Polak-Ribiere differ in which linesearch algorithm is used.

For the first run I was optimizing a super-resolution reconstruction where the image shifts had errors. Only the pixel values are optimized, not the lateral shifts. Hence the final reconstruction is rather poor (the lateral shifts need to be optimized as well to get a decent reconstruction). Here are those results:

Next are some results where the lateral shifts have no error. In this case the final reconstructions are decent. I think the proper way to say this is that the curvature of the objective function value is greater in this case.

In the first case ncg and lbfgs from Poblano both converge to a considerably better answer in a comparable time to fminunc(). In the second case Poblano's lbfgs() still really performs well. fminunc() still really struggles.

Perhaps if I were more knowledgeable in this area I would know how to approximate the Hessian in fminunc() as is done in the BFGS algorithm. If there are other settings for fminunc() or a more intelligent way to use the tools in the Optimization Toolbox, I'd love to try those out.

Best regards,

Eric

Eric on 25 Oct 2013

I agree it's not really a fair race, but from my point of view I don't care. I'm sure there are much better choices for HessMult. But no one in the literature has ever published second-order derivative information and I'm not the right person to tackle that problem.

It still seems that the tools in the Optimization Toolbox are a bit lacking. If I don't have information on second order gradient information I'm forced to use fminunc() in this manner for large-scale problems. The code Matlab provides in the Optimization Toolbox is significantly outperformed by the freeware toolbox available from Sandia.

From the interactions I've had with the Mathworks it seems the official opinion is that the tools provided in the Optimization Toolbox are the "best" in some sense. Other algorithms are not included since they are not the "best". Hence algorithms such as conjugate gradient and BFGS are not included.

Other tools such as Python and Mathematica take a different approach. They include a wide range of optimization algorithms and rely on the user to pick the best for their problem. I tend to prefer that approach.

Best regards,

Eric

Matt J on 25 Oct 2013

Edited: Matt J on 25 Oct 2013

Open in MATLAB Online

But no one in the literature has ever published second-order derivative information and I'm not the right person to tackle that problem.

Since you're computing your first derivatives, you should be able to compute the second derivative by differentiating one more time! For large problems like yours, a common approximation is to compute the Hessian diagonal D only and approximate as

HessMult=@(D,Y) D.*Y

It's not guaranteed to work for all problems, but it does for many, and shouldn't be worse than @(D,Y) Y.

From the interactions I've had with the Mathworks it seems the official opinion is that the tools provided in the Optimization Toolbox are the "best" in some sense... Hence algorithms such as conjugate gradient and BFGS are not included.

fminunc's quasi-newton method does have a BFGS and DFP option. I wouldn't disagree that CG would be a worthy addition to the toolbox. However, quasi-Newton methods are supposed to perform as least as well as CG (and methods that use second order derivs certainly so), so there is a grain of validity in their claim.

Sign in to comment.

Using fminunc() with very large scale problems

1 Comment
Show -1 older commentsHide -1 older comments

Accepted Answer

8 Comments
Show 6 older commentsHide 6 older comments

More Answers (0)

See Also

Categories

Tags

Products

Community Treasure Hunt

Using fminunc() with very large scale problems

1 Comment Show -1 older commentsHide -1 older comments

Accepted Answer

8 Comments Show 6 older commentsHide 6 older comments

More Answers (0)

See Also

Categories

Tags

Products

Community Treasure Hunt

1 Comment
Show -1 older commentsHide -1 older comments

8 Comments
Show 6 older commentsHide 6 older comments