MATLAB Answers

Eric
0

Using fminunc() with very large scale problems

Asked by Eric
on 4 Oct 2013
Latest activity Commented on by Matt J
on 25 Oct 2013
I have delved into this topic before http://www.mathworks.com/matlabcentral/newsreader/view_thread/300661#811585, but am returning to it again.
I'm working with another class of algorithms that are very large scale. The class of algorithms is known as multi-frame super-resolution. A good reference is "Fast and Robust Multiframe Super Resolution" by S. Farsiu, et al, in IEEE Transactions on Image Processing, Vol 13, No. 10, 1327-1344 (2004).
My question is in regards to Matlab's ability to perform optimization on truly large scale problems. The authors of the aforementioned paper used steepest descent. I've coded up the algorithm with conjugate gradient (Polak-Ribiere).
I would be curious to try the trust-region algorithm of fminunc() on this problem. However, the size of the low resolution images I am using for super-resolution reconstruction is 135x90 (very small for real-world applications, but fine for testing). With 4X super-resolution the resultant image I'm reconstructing is 540x360. The algorithm considers every pixel of the output image to be a variable to be optimized. Thus there are 194,400 variables.
If I blindly use fminunc (letting it calculate the Hessian), it throws a memory error at a line in fminunc that reads
Hstr = sparse(ones(n));
n is apparently the number of variables, as in my case it is 194400.
Based on the aforementioned link, I tried setting 'HessMult', to @(H,Y)Y. The same memory error occurs at the same location.
Again, conjugate gradient techniques work well. I'm just surprised that I can't get this problem to work using a built-in function. It still seems that the Mathworks provides a function that either uses no derivative information (fminsearch) or an algorithm that requires calculating second order derivatives (fminunc). This is the second class of algorithms I've worked with in the last couple years that really needs an algorithm that can use first order derivatives but does not require second order derivatives.
I did put in a service request in 2011 to add the conjugate gradient algorithm to the optimization toolbox (or another algorithm that might provide comparable functionality). I pointed out a quote by Roger Fletcher in his "Practical Methods of Optimization", 2nd Edition (page 85): “Thus conjugate gradient methods may be the only methods which are applicable to large problems , that is problems with hundreds or thousands of variables.” The emphasis on “large problems” is Fletcher’s, not mine. I'm guessing my request didn't go very far.
I still wonder if there is some built-in function which will solve these problems and I'm just overlooking it. So my question is: Is there a Matlab-provided function that allows for optimization problems of this scale?
Thanks, Eric

  1 Comment

Here's the last email I received from my service request to add conjugate gradient. It tails off because of a 1500 character limit in what the Mathworks website will display (I think).
I discussed this issue with the development team. They had following comments:
The large scale FMINUNC algorithm uses a conjugate gradient method to determine the step that will be taken in each iteration of the algorithm. As you had pointed out, some of the implementations like Fletcher-Reeves or Polak-Ribiere algorithms, do not require estimation of the Hessian. However, these algorithms have a downside too in that sometimes they are less efficient and robust than other modern algorithms.
With regards to the large scale FMINUNC algorithm that is currently used in MATLAB, to calculate the 2-D subspace for the stepsize calculation, some information about the Hessian is required. This can be either an estimate of the Hessian itself or a Hessian-vector product. Hessian-vector products can be used if the Hessian has some structure and in these cases the complete Hessian does not need to be calculated. We have an example in the documentation that illustrates this (note that this example uses FMINCON, but it can also be applied to FMINUNC)
If it is not possible to specify Hessian-vector products, you can use the medium scale algorithm in FMINUNC. This algorithm builds up an approximation of the Hessian using a formula that just uses gradient information (i.e. the Hessian is not calculated directly). However, if you are solving large problem...

Sign in to comment.

1 Answer

Answer by Matt J
on 4 Oct 2013
Edited by Matt J
on 5 Oct 2013
 Accepted Answer

Based on the aforementioned link, I tried setting 'HessMult', to @(H,Y)Y. The same memory error occurs at the same location.
You must also set the 'Hessian' option parameter to 'on' in order for HessMult to be used. However, I recommend you try to find a HessMult function corresponding to a better approximation to the Hessian than eye(N). Otherwise, it's probably just going to boil down to steepest descent.
It still seems that the Mathworks provides a function that either uses no derivative information (fminsearch) or an algorithm that requires calculating second order derivatives (fminunc).
No. That's just not the case. As a further example, note that fminunc's quasi-Newton method with HessUpdate='steepdesc' is equivalent to steepest descent, which is a first order algorithm. It's not that these options are unavailable. They are simply not recommended.
Again, conjugate gradient techniques work well. I'm just surprised that I can't get this problem to work using a built-in function
If you're saying conjugate gradient works well for you without preconditioning , then that's good news, but it means you're lucky. You don't have a very hard problem to solve. If you are using pre-conditioning, then you are effectively using 2nd order derivative information (to some approximation). You just aren't using it in precisely the same way fminunc uses it.

  8 Comments

I set HessMult to '@(H,Y)Y' and Hessian to 'On' for fminunc().
This is not really a fair race. There are surely much better choices of HessMult that you could use. As I told you before, '@(H,Y)Y' probably makes the trust region algorithm reduce to something like steepest descent. Everyone knows that there is almost nothing slower than steepest descent.
I agree it's not really a fair race, but from my point of view I don't care. I'm sure there are much better choices for HessMult. But no one in the literature has ever published second-order derivative information and I'm not the right person to tackle that problem.
It still seems that the tools in the Optimization Toolbox are a bit lacking. If I don't have information on second order gradient information I'm forced to use fminunc() in this manner for large-scale problems. The code Matlab provides in the Optimization Toolbox is significantly outperformed by the freeware toolbox available from Sandia.
From the interactions I've had with the Mathworks it seems the official opinion is that the tools provided in the Optimization Toolbox are the "best" in some sense. Other algorithms are not included since they are not the "best". Hence algorithms such as conjugate gradient and BFGS are not included.
Other tools such as Python and Mathematica take a different approach. They include a wide range of optimization algorithms and rely on the user to pick the best for their problem. I tend to prefer that approach.
Best regards,
Eric
But no one in the literature has ever published second-order derivative information and I'm not the right person to tackle that problem.
Since you're computing your first derivatives, you should be able to compute the second derivative by differentiating one more time! For large problems like yours, a common approximation is to compute the Hessian diagonal D only and approximate as
HessMult=@(D,Y) D.*Y
It's not guaranteed to work for all problems, but it does for many, and shouldn't be worse than @(D,Y) Y.
From the interactions I've had with the Mathworks it seems the official opinion is that the tools provided in the Optimization Toolbox are the "best" in some sense... Hence algorithms such as conjugate gradient and BFGS are not included.
fminunc's quasi-newton method does have a BFGS and DFP option. I wouldn't disagree that CG would be a worthy addition to the toolbox. However, quasi-Newton methods are supposed to perform as least as well as CG (and methods that use second order derivs certainly so), so there is a grain of validity in their claim.

Sign in to comment.