UseParallel for Hessian?

Will MATLAB at some point support parallel computation of the finite-difference Hessian? More specifically, I've been setting UseParallel in my fminunc options; my problem has a lot of parameters, and computing the Hessian takes a fair amount of time.
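For reference, the setup in question looks roughly like the sketch below (the objective name and starting point are placeholders, not from the original post):

```matlab
% Sketch of the setup being asked about; myNegLogLik and x0 are placeholders.
% UseParallel parallelizes the finite-difference gradient evaluations only.
options = optimoptions('fminunc', ...
    'Algorithm', 'quasi-newton', ...
    'UseParallel', true, ...        % parallel finite-difference gradients
    'Display', 'iter');
[x, fval, exitflag, output, grad, hessian] = fminunc(@myNegLogLik, x0, options);
% The final 'hessian' output is a finite-difference estimate computed serially,
% which is the bottleneck this question is about.
```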

4 Comments

Same question here, have you figured it out?
What makes you suppose UseParallel applies to gradient, but not to Hessian computations?
The option description is "When true, fminunc estimates gradients in parallel." That says gradients, not the Hessian.
I've since stopped using MATLAB, but simply monitoring the system usage (number of active cores), as well as the overall slowness, was a giveaway that the Hessian is not calculated in parallel.


Answers (1)

Matt J
Matt J on 2 Oct 2022
Edited: Matt J on 2 Oct 2022
I don't speak for MathWorks, but I think the issue is that finite-difference Hessians are only relevant to the trust-region algorithm, since the quasi-Newton algorithm does not use Hessian computations at all. In the trust-region algorithm, however, the user is required to provide an analytical gradient via SpecifyObjectiveGradient=true. It seems a rather narrow use case that an analytical gradient would be tractable but an analytical Hessian would not, assuming the memory footprint of such a matrix is not prohibitive. If the memory footprint of the Hessian is prohibitive, the user is meant to use the HessianMultiplyFcn or HessPattern options instead.
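To illustrate the constraint being described (the objective here is a toy quadratic, chosen only for illustration):

```matlab
% Illustration: fminunc's trust-region algorithm insists on an analytical
% gradient. A toy objective returning [f, gradient] together:
fun = @(x) deal(sum(x.^2), 2*x);    % deal() returns f alone when one
                                    % output is requested, f and grad for two
options = optimoptions('fminunc', ...
    'Algorithm', 'trust-region', ...
    'SpecifyObjectiveGradient', true);  % mandatory for trust-region;
                                        % omitting it throws an error
[x, fval] = fminunc(fun, ones(5,1), options);
```

So the only situation where fminunc computes a finite-difference Hessian is one where the user has already written the gradient by hand.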

10 Comments

Jonne Guyt
Jonne Guyt on 4 Oct 2022
Edited: Jonne Guyt on 4 Oct 2022
I do not think the case is as narrow as described in your answer. (I've stopped using Matlab in the meantime...)
While not a drop-in replacement, one can use or tweak this function to calculate the Hessian in parallel: simply omit/abort the Hessian calculation during optimization and then run that function afterwards to calculate the Hessian.
But using a quasi-Newton algorithm (fminunc) as well as an interior-point algorithm (fmincon) will use finite differences to calculate the Hessian, and these are not fringe/edge cases.
No, fminunc's quasi-Newton algorithm does not do a full Hessian computation. Only gradients are used.
fmincon's interior-point algorithm does have an option to compute the Hessian by finite differences, but, like fminunc's trust-region algorithm, it requires the user to supply an analytical gradient, which means the Hessian is likely to be analytically tractable as well (or at least I've yet to see a counter-example). So I wonder why this option would ever be used.
I'm no longer using Matlab, and maybe I should have rephrased it, but I am pretty sure that not supplying gradients and/or Hessians is relatively common with fmincon/fminunc (and it is by no means a requirement to provide them). In those cases (and there are many in which they're a pain to derive), you are still stuck with this problem.
Matt J
Matt J on 4 Oct 2022
Edited: Matt J on 4 Oct 2022
As I've been saying, not supplying gradients is common, but not in the specific algorithms where full Hessians are used.
Another reason finite-difference Hessians may be discouraged is that the Hessian needs to be inverted, and the inverse can be sensitive to finite-differencing errors.
Bruno Luong
Bruno Luong on 4 Oct 2022
Edited: Bruno Luong on 4 Oct 2022
@Jonne Guyt " but using a quasi-newton algorithm (fminunc) as well as an interior-point (fmincon) will use finite differences to calculate the hessian and these are not fringe/edge cases."
This statement is wrong. If a Hessian function is not provided by the user, the quasi-Newton Hessian used by both algorithms results from bookkeeping of the gradients evaluated at successive iterates; there is no need for finite differencing on top of the gradient.
The doc mentions a "sparse finite difference algorithm on the gradients" that is performed only in the trust-region algorithm, as Matt correctly stated.
@Bruno Luong you're right; I rephrased my point in my comment below, but I'll edit the post.
I think the argument is going sideways, and Matt J's comments are not helping users (beyond saying "you shouldn't be in that situation"). The point is this: if you use fminunc or fmincon, do not supply gradients/Hessians, but do ask for the Hessian to be returned, then when it is calculated via finite differences it is calculated without parallelization. This use case is common within, for example, discrete choice modeling. The Hessian is used to calculate the standard errors, so you do need it, and there is no simple way to get it via alternative means (e.g., providing the gradient/Hessian analytically).
The question was whether this can be parallelized. MathWorks has not done so; the code linked in my post allows you to do it manually.
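A minimal sketch of such a standalone parallel finite-difference Hessian (this is not the linked function itself; the central-difference scheme, step size, and symmetrization are illustrative choices):

```matlab
function H = parhessian(fun, x, h)
% Central-difference Hessian of scalar-valued fun at point x, with the
% n^2 stencil evaluations distributed over a parpool via parfor.
% h is the step size (e.g. 1e-5); all choices here are illustrative.
n = numel(x);
[I, J] = ndgrid(1:n, 1:n);
idx = [I(:), J(:)];                 % all (i,j) index pairs
Hij = zeros(n*n, 1);
parfor k = 1:n*n
    i = idx(k,1); j = idx(k,2);
    ei = zeros(n,1); ei(i) = h;
    ej = zeros(n,1); ej(j) = h;
    % standard central-difference formula for d2f / dxi dxj
    Hij(k) = (fun(x+ei+ej) - fun(x+ei-ej) ...
            - fun(x-ei+ej) + fun(x-ei-ej)) / (4*h^2);
end
H = reshape(Hij, n, n);
H = (H + H') / 2;                   % symmetrize against rounding error
end
```

Run it once after fminunc/fmincon returns, at the converged point, rather than inside the optimization loop. Exploiting symmetry (computing only the upper triangle) would roughly halve the work.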
Matt J
Matt J on 4 Oct 2022
Edited: Matt J on 4 Oct 2022
if you use fminunc or fmincon and do not supply gradients/hessians, but do ask for the hessian to be calculated and if it is calculated via finite-differences.
That case does not exist in fmincon/fminunc. There is no fmincon/fminunc algorithm that performs a finite-difference Hessian calculation when an analytical gradient is not provided.
So the only question is: do you know of a case where it would make sense to supply an analytical gradient, but not an analytical Hessian?
Matt J
Matt J on 4 Oct 2022
Edited: Matt J on 4 Oct 2022
This use case is common within for example discrete choice modeling. The hessian is used to calculate the standard errors, so you do need it and there's no simple way to get it via alternative means
If you need the Hessian for the purposes of computing standard errors (and not iterative optimization), then I agree it may make sense to have a parallelized finite differencer for that. However, it is not clear why that belongs in fminunc/fmincon. Because the Hessian is not being recomputed iteratively, you would use a standalone Hessian computing routine for that.
And furthermore, the Hessian returned by minimization algorithms is usually NOT suitable for computing standard deviations of the errors.
If you need the Hessian for the purposes of computing standard errors (and not iterative optimization), then I agree it may make sense to have a parallelized finite differencer for that. However, it is not clear why that belongs in fminunc/fmincon. Because the Hessian is not being recomputed iteratively, you would use a standalone Hessian computing routine for that.
Yes, this is exactly the use case. I think you inferred it was for optimization, but I never said that. If you request the Hessian to be returned at the end (to compute SEs), it is approximated with finite differences, so it is 'built-in', but it might as well be standalone (hence my workaround). I hope it makes more sense now...
@Bruno Luong exact and approximated SEs may differ by a small amount, but can you point me to where I can find that these are not suitable for discrete choice models with non-obvious gradients?


Asked on 29 Jun 2018 · Last commented on 4 Oct 2022
