Alternatives to gradient-based optimization algorithms like lsqnonlin that can handle more than O(10) parameters?
I am fitting the parameters of a PDE to experimental data using the function lsqnonlin from the Optimization Toolbox.
I have observed that I can successfully optimize 10 to 20 parameters, but not more. My goal, however, is to optimize 50 to 100 parameters.
Based on my experience with lsqnonlin (and fmincon and friends), it seems that this class of optimization algorithms handles a small number of parameters (10 to 20) well but is no longer appropriate when there are many more.
Fast convergence and computation time are not important to me.
I come from an engineering background and do not have deep knowledge of other classes of optimization algorithms. That said, I would appreciate keywords or recommendations for alternative optimization algorithms that are designed to handle a larger number of parameters.
Thank you!
18 Comments
I observed that I can successfully optimize 10,...,20 parameters but not more.
How did you observe this? The optimizers are not limited in the number of fitting parameters they can manage. Whether fitting models with up to 100 parameters makes sense depends on the number and quality of your input data.
SA-W
on 24 Feb 2023
However, if I distribute 20 parameters over the same interval [a,b], lsqnonlin re-identifies some of the parameters incorrectly.
How have you assessed whether they are "wrong"? What is the final objective function value, and how does it compare to the objective value of the "right" parameters?
If both solutions have the same objective value (or nearly so), then they must both be considered correct.
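One way to make that comparison concrete (a minimal sketch; resfun, p_fit, and p_true are hypothetical names for your residual function, the lsqnonlin solution, and the "right" parameters):
% Compare the objective value at the two candidate solutions.
% resfun(p) is assumed to return the residual vector of the fit.
f_fit  = norm(resfun(p_fit))^2;    % objective at the fitted parameters
f_true = norm(resfun(p_true))^2;   % objective at the reference parameters
fprintf('fitted: %g   reference: %g\n', f_fit, f_true)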
SA-W
on 24 Feb 2023
Torsten
on 24 Feb 2023
However, the 20 parameters found by lsqnonlin do not define a convex function, but rather a function which in some ranges is even decreasing and which, for whatever reason, is also a minimum of the objective function.
Then you must somehow define convexity of the function as a constraint.
Matt J
on 24 Feb 2023
Is there a way to translate these constraints for lsqnonlin, for instance by adding a penalty term to the objective?
If your penalty is quadratic, then it can be implemented just by expanding the residual vector:
r = [weight1*r; weight2*constraint_violation];
so that lsqnonlin minimizes f = ||r||^2.
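Put together, the pattern looks like this (a minimal sketch, assuming a user-supplied residual function resfun and a constraint-violation function cviol; the names and the weight are hypothetical):
% Augment the data residual with a weighted constraint violation, so that
% lsqnonlin minimizes ||resfun(p)||^2 + weight^2 * ||cviol(p)||^2.
augmented = @(p) [resfun(p); weight*cviol(p)];
p = lsqnonlin(augmented, p0);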
SA-W
on 24 Feb 2023
Do you think it is reasonable to enforce convexity in such a way or can you recommend a better solution?
Seems reasonable to me, but I would probably use fmincon instead of lsqnonlin, since then I can directly implement
p(i-1) - 2*p(i) + p(i+1) >= 0
as a linear inequality constraint.
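For n parameters, the n-2 second-difference constraints can be assembled into a matrix for fmincon's A*p <= b interface (a sketch; resfun and p0 are assumed names for your residual function and starting point):
% Convexity as linear inequalities: p(i-1) - 2*p(i) + p(i+1) >= 0,
% i.e. -(p(i-1) - 2*p(i) + p(i+1)) <= 0, one row per interior point.
n = numel(p0);
A = zeros(n-2, n);
for i = 2:n-1
    A(i-1, [i-1 i i+1]) = -[1 -2 1];
end
b = zeros(n-2, 1);
obj = @(p) norm(resfun(p))^2;   % scalar objective for fmincon
p = fmincon(obj, p0, A, b);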
Matt J
on 24 Feb 2023
If using lsqnonlin, I might instead have
d = p(i-1) - 2*p(i) + p(i+1);   % discrete second difference
if d >= 0
    c(i) = 0;
else
    c(i) = abs(d)^1.5;
end
since then c will be a differentiable function of p. lsqnonlin assumes c to be differentiable.
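Vectorized over all interior points, the same smoothed penalty can be appended to the data residual and passed to lsqnonlin (a sketch; resfun and the weight w are assumed names):
% Smoothed convexity penalty: zero where the second difference is
% nonnegative, |d|^1.5 where it is violated (differentiable at d = 0).
d = p(1:end-2) - 2*p(2:end-1) + p(3:end);   % all second differences
c = w * abs(min(d, 0)).^1.5;                % penalty, zero when convex
r = [resfun(p); c(:)];                      % residual passed to lsqnonlin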
SA-W
on 25 Feb 2023
Matt J
on 25 Feb 2023
Yes, it is.
SA-W
on 25 Feb 2023
The abs in |t|^1.5 doesn't really matter if t^1.5 is only ever evaluated for positive t. As seen below, they are the same function with or without the abs. The differentiability follows from the fact that both the left and right derivatives of c(t) = max(0, |t|^1.5) are zero at t = 0. Conversely, c(t) = max(0, t) has mismatched right and left derivatives at t = 0.
syms t real
c1 = piecewise(t<=0, 0, t>0, abs(t)^1.5)   % with abs
c2 = piecewise(t<=0, 0, t>0, t^1.5)        % without abs
fplot(c1, [-1,1]); hold on
fplot(c2, [-1,1], 'x'); hold off
You could do that, or even have c(i) = (p(i-1) - 2*p(i) + p(i+1))^200. However, with larger exponents, the gradient of the cost function becomes small over a wider neighborhood of c = 0. Smaller gradients mean slower convergence, and also a chance that the OptimalityTolerance will trigger prematurely. Conversely, with smaller exponents (but still greater than 1), the gradient becomes less smooth. So there is a trade-off to be reckoned with.
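The gradient effect is easy to visualize (an illustrative sketch, not from the thread): for a violation t < 0, the penalty |t|^p has gradient magnitude p*|t|^(p-1), which collapses near t = 0 as p grows:
% Penalty gradient magnitude near zero violation, for two exponents.
t = linspace(-1, -0.1, 500);      % t < 0 means the constraint is violated
g15  = 1.5 * abs(t).^0.5;         % gradient of |t|^1.5
g200 = 200 * abs(t).^199;         % gradient of t^200 (vanishingly small)
semilogy(t, g15, t, g200)
legend('|t|^{1.5}', '|t|^{200}'), xlabel('violation t'), ylabel('|dc/dt|')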
Accepted Answer
More Answers (1)
The parameters that I am optimizing (the function values) have to represent a convex function in 1d.
Another strategy that I would suggest is to first fit a constrained curve to your function samples using this FEX download.
This tool will allow you both to constrain the fit to be convex (using the 'ConcaveUp' option) and to compute the derivatives of the fit. Once you have the curve fit and its derivatives, you can substitute them into your ODE to obtain conventional equations in the original set of unknown parameters. You can then solve these equations using lsqnonlin, or even a linear solver, depending on the form of your equations.
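The linked tool isn't reproduced here, but the core idea can be sketched with lsqlin alone: fit nodal values f at (assumed uniform) knots x to samples (xs, ys) through a linear interpolation matrix, with convexity imposed via the same second-difference matrix as in the fmincon sketch above. All variable names are assumptions:
% Least-squares fit of knot values f, constrained to be convex.
% Assumes uniform knots with x(1) <= xs(k) <= x(end) for all samples.
n = numel(x);
C = zeros(numel(xs), n);                    % linear interpolation matrix
for k = 1:numel(xs)
    j = min(find(x <= xs(k), 1, 'last'), n-1);
    w = (xs(k) - x(j)) / (x(j+1) - x(j));
    C(k, [j j+1]) = [1-w, w];
end
A = zeros(n-2, n);                          % -(f(i-1)-2f(i)+f(i+1)) <= 0
for i = 2:n-1
    A(i-1, [i-1 i i+1]) = -[1 -2 1];
end
f = lsqlin(C, ys(:), A, zeros(n-2, 1));     % convex least-squares fit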
7 Comments
SA-W
on 26 Feb 2023
Matt J
on 27 Feb 2023
I don't know. We haven't seen a full problem description. What is the role of the ODE? Does it describe the 1D convex function, or does it describe the displacement field? My ultimate point is that it would be wise to avoid making the objective function depend on repeated calls to an ODE solver, if possible.
Matt J
on 27 Feb 2023
My suggestion is only relevant if you can do a spline fit to the thing your ODE is taking derivatives of.
SA-W
on 27 Feb 2023
Matt J
on 6 Mar 2023
I don't know enough about finite element analysis to know why you cannot fit a spline to a finite element field.
SA-W
on 6 Mar 2023