**You are now following this question**

- You will see updates in your followed content feed.
- You may receive emails, depending on your communication preferences.

# isqnonlin: compute part of objective function outside of matlab

12 views (last 30 days)

Show older comments

I am solving a partial differential equation depending on some design variables (material parameters). I want to fit the material parameters to the vector of experimental data y using matlabs function isqnonlin.

I am solving the PDE for given material parameters with an external software; the solution vector thus obtained is denoted as s. What I want to minimize is the squared difference between s and y in the l2-norm by using the algorithm implemented in isqnonlin.

Given that I do not compute the PDE solution s in matlab itself, is it possible to use isqnonlin?

### Accepted Answer

Torsten
on 12 Aug 2022

Edited: Torsten
on 12 Aug 2022

Then call "lsqnonlin" as

p0 = ...; % initial guess for the vector of material parameters

sol = lsqnonlin(@(p)fun(p,y),p0)

and write a function "fun" as

function res = fun(p,y)

s = result_from_your_PDE_solver_for_the_vector_of_material_parameters_p_corresponding_to_y(p)

res = s - y;

end

By the way:

The name of the MATLAB function is lsqnonlin (least-squares solver for nonlinear problems), not isqnonlin.

##### 134 Comments

SA-W
on 12 Aug 2022

Edited: SA-W
on 12 Aug 2022

SA-W
on 12 Aug 2022

SA-W
on 12 Aug 2022

I try to get the partials from the PDE solver using automatic differentiation, but I have to figure that out.

I will try to implement the above program next week.

One last thing, you said:

"Thus your pde solver will be called quite often, unfortunately."

If 's' has five components and 'p' stores three parameters, then in each iteration my pde solver is called 15 times, right?

Torsten
on 12 Aug 2022

If 's' has five components and 'p' stores three parameters, then in each iteration my pde solver is called 15 times, right?

My guess is four times:

First time: call with (p1,p2,p3) to get s(p1,p2,p3)

Second time: call with (p1+h,p2,p3) to get s(p1+h,p2,p3)

Third time: call with (p1,p2+h,p3) to get s(p1,p2+h,p3)

Fourth time: call with (p1,p2,p3+h) to get s(p1,p2,p3+h)

SA-W
on 12 Aug 2022

My pde solver is basically a c++ program. To call this program I will probably have to use the system() function and do something like

system(' ./my_pde_solver_program ') \\store the result belonging to s(p) in a file

s = load_the_file_which_stores_the_just_computed_s(p)

But now I am pondering about how to pass the material parameters 'p' to my c++ program, because command line parameters are passed as strings.

I do not know if this works with lsqnonlin, but my idea was to store the 'p' at the current iteration somewhere in a folder and my pde solver program reads that 'p' in. But lsqnonlin provides only the final solution for 'p' and not the 'p' at all the iterations, right?

Torsten
on 12 Aug 2022

SA-W
on 13 Aug 2022

Torsten
on 13 Aug 2022

I have no experience with communication between c++ codes and MATLAB.

Maybe you should open a new question for this topic.

SA-W
on 16 Aug 2022

@Torsten

I implemented the program and it works. However, lsqnonlin does not return:

I defined the vector p0 (10 parameters) in such a way that my pde solver reproduces the vector y (simulated experiment); this means s(p0)=y.

But lsqnonlin does not stop - my pde solver is called in an endless loop. I expected that my pde solver is maybe called 11 times and thats it. The program is currently running and my pde solver is already called 150 times. Do you have an idea what might be the problem?

Torsten
on 16 Aug 2022

Edited: Torsten
on 16 Aug 2022

Maybe perturb the y vector from the pde solver a bit in order not to start directly in the solution.

Write norm(res) to file during the optimization and see whether the value becomes smaller during the optimization process. And take care that the PDE integrator is very exact. Otherwise, the derivatives with respect to the material parameters will be too inaccurate to make progress.

SA-W
on 17 Aug 2022

It seems to work now: my pde solver was called 11 times and lsqnonlin returned "Initial point is a local minimum".

"And take care that the PDE integrator is very exact."

The vectors s and y store displacements, which range from -1 to 1. The components of p can take values between -1e+6 and 1e+6 - some of them are constrained.

By "very exact", do you mean that the s and y should store displacement data with a certain number of digits after the decimal point?

Torsten
on 17 Aug 2022

SA-W
on 17 Aug 2022

I stored p in a file using

writematrix(p, './p.txt');

Is that sensible to do it like this?

Torsten
on 17 Aug 2022

I'd take a look into the content ... Same for the file in which you write "y_simulated".

SA-W
on 17 Aug 2022

Torsten
on 17 Aug 2022

SA-W
on 17 Aug 2022

Thanks for all the hints. Unfortunately, I have no computational experience regarding that. Now, I store the displacement data with 18 digits after the decimal point and p with 10 digits.

However, I think a huge challange for me is to find an appropriate vector p0. As I said, it is not possible for me to make a guess how the final p could look like; I only know a certain functional structure.

For instance, in my simulated experiment where y(p0)=y_experiment I know that the first five components of p=p0 are: p(1)=60000, p(2)=12000, p(3)=0, p(4)=10744, p(5)=42000. So I know that the values left and right from p(3)=0 have to increase but not the order of magnitude. What I did now is to set p0(1)=500000, p0(2)=250000, p0(3)=0, p0(4)=250000, p0(5)=500000.

Is it possible at all for lsqnonlin to find the correct p if my p0 is so far away from it?

Torsten
on 17 Aug 2022

Is it possible at all for lsqnonlin to find the correct p if my p0 is so far away from it?

I doubt it. But: Versuch macht klug. And the main problem is that with such a big number of free parameters, there will be many combinations with approximately the same value for the objective function, but unphysical. The more constraints you can set, the less parameters you try to fit and the less these parameters are correlated (one parameter can compensate the influence of another), the better.

Did you already test whether you can recover your p0 starting thus far away from the solution ?

Another option is to use "MultiStart" in MATLAB, but it will be quite time-consuming in your case, I guess.

SA-W
on 17 Aug 2022

"And the main problem is that with such a big number of free parameters, there will be many combinations with approximately the same value for the objective function, but unphysical."

"Did you already test whether you can recover your p0 starting thus far away from the solution ?"

I tried it and it happened exactly what you said.

I set p0(1)=250000, p0(2)=187500, p0(3)=125000, p0(4)=62500, p0(5)=0, p0(6)=62500, p0(7)=125000, p0(8)=187500, p0(9)=250000. With that start values, lsqnonlin returns the solution p(1)=244537, p(2)=186631, p(3)=66641, p(4)=47163, p(5)=0, p(6)=46287, p(7)=64148, p(8)=194005, p(9)=243482

However, the p0 fulfilling y(p0)=y_experiment is p0(1)=374486, p0(2)=178626, p0(3)=71016. p0(4)=16342, p0(5)=0, p0(6)=14522, p0(7)=55572, p0(8)=120409, p0(9)=207186

Clearly, the returned solution is not the correct one. The solver also returned "local minimum possible". "lsqnonlin stopped because the final change in the sum of squares relative to its initial values is less than the value of function tolerance".

Based on the recommendation in the help menu, I restarted lsqnonlin with the returned solution, however, the result is the same.

Do you think it makes sense to scale p somehow?

Torsten
on 18 Aug 2022

Edited: Torsten
on 18 Aug 2022

Do you think it makes sense to scale p somehow?

Scaling is always a good idea, especially if the parameters to be determined have different orders of magnitude. But in your case, they all seem to be of the same order, appr. 1e5 ?

What was the norm of the residual for the solution (i.e. norm(y-y_exp)) lsqnonlin got with p(1)=244537, p(2)=186631, p(3)=66641, p(4)=47163, p(5)=0, p(6)=46287, p(7)=64148, p(8)=194005, p(9)=243482 ?

I guess for p0(1)=374486, p0(2)=178626, p0(3)=71016. p0(4)=16342, p0(5)=0, p0(6)=14522, p0(7)=55572, p0(8)=120409, p0(9)=207186 the residual was 0 ?

SA-W
on 18 Aug 2022

"Scaling is always a good idea, especially if the parameters to be determined have different orders of magnitude. But in your case, they all seem to be of the same order, appr. 1e5 ?"

As for the synthetic data I am currently working on, yes. But for the "real" data, the order of magnitude could also be 1e4 or 1e6.

"What was the norm of the residual for the solution (i.e. norm(y-y_exp)) lsqnonlin got with p(1)=244537, p(2)=186631, p(3)=66641, p(4)=47163, p(5)=0, p(6)=46287, p(7)=64148, p(8)=194005, p(9)=243482 ?"

The squared norm of the residual as returned by lsqnonlin is 2.2432e-6. The first order optimality measure is 1.8982e-6.

Should these values be even smaller in the global minimum?

"I guess for p0(1)=374486, p0(2)=178626, p0(3)=71016. p0(4)=16342, p0(5)=0, p0(6)=14522, p0(7)=55572, p0(8)=120409, p0(9)=207186 the residual was 0 ?"

Here the norm of the residual is 6.8716e-13 and the first order optimality measure is 1.42e-9.

That said, should I decrease the residual tolerance? Obviously, a change from 1e-6 to 1e-13 in the residual comes with large changes for p.

Torsten
on 18 Aug 2022

Edited: Torsten
on 18 Aug 2022

That a point thus far away from the solution also gives a very small residual seems to support my thesis that you have too many parameters and that a change in one parameter can be compensated by appropriate changes in the others (parameters are somehow correlated).

So again: you should try to reduce the number of parameters and constrain them as strict as possible to certain ranges (lower and upper bounds).

You can try (if it helps) to multiply the residuals by a big number or decrease the "FunctionTolerance" (since the residuals seem to be so small from their nature).

Maybe it also helps to increase "FiniteDifferenceStepSize" depending on how senitive y reacts to a change in the parameters.

Good luck!

SA-W
on 19 Aug 2022

Edited: SA-W
on 19 Aug 2022

Just one last thing pertaining to your comment:

"That a point thus far away from the solution also gives a very small residual seems to support my thesis that you have too many parameters and that a change in one parameter can be compensated by appropriate changes in the others (parameters are somehow correlated)."

So the gist of what you said is: if the parameters are somehow correlated, the residual can also be "small" for points which are far away from the solution? Why is that (if true at all)?

What I will definitely observe in the experiements is that the p's left and right from p(unknowm_index)=0 will increase (like I constructed it in the above examples). That said, the parameters will somehow be correlated - I can not avoid that. Reducing the p's is also difficult, because of convegence issues in my pde solver.

Torsten
on 19 Aug 2022

Edited: Torsten
on 19 Aug 2022

To put it easy:

Say you have two parameters p1 and p2 and the function y is computed as

y = (p1 + p2)*x

with the independent variable x.

Say the true solution is y = 20*x.

Then p1 and p2 are correlated because p1 = 20 - p2 and every combination (p1,p2) with p1+p2 = 20 gives a solution for your problem (e.g. p1= 20, p2 = 0 or p1 = 5689 and p2 = -5669).

This example is very obvious, but it follows a general concept: If one paramter (say p1) is reduced, an increase in the other parameter (p2) can compensate this and the resulting pair (p1,p2) still leads to an "optimal" solution.

That's what meant by my remark.

SA-W
on 19 Aug 2022

I see what you mean.

This will be tough to get this optimization problem running; My pde solver needs a certain number of p's (at least 9 like in the above example) to be able to compute a solution at all. So I can not go below 9 components.

An idea is to make the parameter estimation process iteratively like this:

What I know is that p(5)=0 (constraint) and that p(1)>p(2)>p(3)>p(4) and p(6)<p(7)<p(8)<p(9).

Calling lsqnonlin the first time, only p(6) and p(4) are free (p(5)=0 and the other p's are constant multiples of p(6) and p(4), respectively). Calling lsqnonlin the second time, p(4), p(5) and p(6) are const and p(7) and p(3) are free with lower bounds being the solution of the first call to lsqnonlin...

Does that makes sense according to you?

SA-W
on 19 Aug 2022

I will give it a try.

But the problems (too many parameters, correlation between them) remain the same.

SA-W
on 19 Aug 2022

One technical question:

I want to provide the Jacobian to Matlab. Say my pde solution has n rows and p has length 10, but one parameter p(5)=0 is constrained.

Given that, do I have to provide the partials F_i wrt p(5) for all i? Or, equivalently, should the Jacobian be a 'n x 10' matrix or a 'n x 9' matrix?

Bruno Luong
on 19 Aug 2022

n x 10, the usage of Jacobian for active/inactive constraints is handled by MATLAB

Torsten
on 19 Aug 2022

For fmincon, you have to provide the gradient, not the Jacobian.

Not that the objective function should return

sum_{i=1}^{i=n} (y(i;p1,...,p10) - yexp(i))^2

not

y(p1,...,p10) - yexp

SA-W
on 22 Aug 2022

I did a check on the gradient of my objective function using lsqnonlin.

options = optimoptions(@lsqnonlin, 'FunctionTolerance', 1e-10, 'OptimalityTolerance', 1e-8, 'FiniteDifferenceStepSize', 1e-10, 'CheckGradients', true, 'SpecifyObjectiveGradient', true);

Objective function derivatives:

Maximum relative difference between supplied

and finite-difference derivatives = 2.93146e-06.

Supplied derivative element (3,5): -1.82124e-06

Finite-difference derivative element (3,5): 1.11022e-06

CheckGradients failed.

Is there a way to get access to the entire Jacobian calculated by matlab using finite differences?

It seems that my provided Jacobian is at least in the correct order of magnitude, though, the largest normalized error is still big. I would like to figure out if my Jacobian is systematically wrong (wrong signs in all entries,...) by comparing it with the finite differences result.

SA-W
on 22 Aug 2022

I have to solve n linear systems in the last iteration of my pde, where n is the number of parameters in p.

Is that important how I get the Jacobian?

Bruno Luong
on 22 Aug 2022

Do you use the same discretizarion (mesh) between linerization model(s) and non linear model?

Bruno Luong
on 22 Aug 2022

Before you feed the Jacobian to lsqnonlin, you should check the correctness by finite difference on your side.

I recommend to check parameter by parameter (you have only 10), and NOT at the minmum but at the first guess x0.

SA-W
on 23 Aug 2022

Bruno Luong
on 23 Aug 2022

"why should I check the correctness on my side"

Because your computation of the Jacobian and it is obviously wrong or the step size you provide to lsqnonlin is inappropriate.

You have the direct model, call it twice then make a difference, how hard is it?

SA-W
on 25 Aug 2022

I double-checked the supplied Jacobian to Matlab and it should be correct. However, the gradient check still fails - even using the central method with several different step sizes. I also did a forward scheme by myself, which also reveals differences.

I omitted the gradient check and invoked lsqnonlin with my Jacobian and the start values we already talked about in this chat: in the first iteration norm(res) = 0.41 and after nine iterations norm(res)=0.006. Thence norm(res) does not change anymore and lsqnonlin returns because the StepTolerance is too small ( I already set it to 1e-10).

For comparison, I did the same calculation without feeding the Jacobian to lsqnonlin, i.e. using finite differences, but with the same tolerances (FunctionTolerance, OptimalityTolerance, StepTolerance). After 13 iterations, norm(res)=10^-6 with a final step size of about 1e-4.

The two solutions are far away from each other (10-20% difference); we already talked about reasons for that. But is it a well-known problem that, if the Jacobian is wrong at some places, norm(res) will not change anymore (close to a possible solution). Using finite differences, norm(res) was pushed several orders of magnitude lower. I am just asking myself whether a wrong Jacobian can lead the solver at all to a point with norm(res)=0.006 or not.

Bruno Luong
on 25 Aug 2022

Edited: Bruno Luong
on 25 Aug 2022

What you describe is typically happens when working with estimation of parameters with ODE/PDE. Usualy the Jacobian/Gradient can be derived from linearized equation and its adjoint. Many many people have tendency to discretize the continuous linearized equation, but typically it will provide a Jacobian/gradient that works on only few iterations, then the optimization algorithm just quit.

I'm not saying that your is this case since I don't have evidence without any in depth discussion on how you immplement the Jacobian calculation, but when I have worked on my thesis with the team specialized on solving this kind of problem, half of the graduate students fall into the same trap at first.

SA-W
on 25 Aug 2022

May I try to describe my calculation a little bit. I would appreciate your feedback.

I am solving for a displacement field u and the residual r associated to that pde is formulated in terms of an explicit and implicit dependency on my parameters p, that is, r = r(u(p), p) = 0. Differentiating with respect to p gives

dr / dp = \partial r / \partial u(p) * d u(p) /d p + \partial r / \partial p = 0

K * d u(p) /d p + \partial r / \partial p = 0

d u(p) /d p = -K^-1 * \partial r / \partial p --> solve n=size(p) linear systems after convergence

J =: d u(p) /d p is what I provide to Matlab.

"Many many people have tendency to discretize the continuous linearized equation, but typically it will provide a Jacobian/gradient that works on only few iterations, then the optimization algorithm just quit."

I think this is what I am doing. Is there an intuitive explanation why the optimization algorithm quits after a few iterations? Can you give me a reference/paper which describes that issue?

Bruno Luong
on 25 Aug 2022

I don't know many serious paper that describes accurately a method that doesn't work.

If you want to learn more, this kind of tool is usually used by our community

SA-W
on 26 Aug 2022

SA-W
on 26 Aug 2022

"Applying the method to N functions gives N gradients - thus the Jacobian."

And what are the N functions in this context?

SA-W
on 26 Aug 2022

I have to think about how this works for my pde.

The papers I read so far use either the finite difference scheme, the adjoint method, or AD. As for the adjoint method, they always compute the gradient and not the Jacobian.

Can you name more calculation schemes for the Jacobian to point my literature research in the right direction?

Torsten
on 26 Aug 2022

Edited: Torsten
on 26 Aug 2022

As for the adjoint method, they always compute the gradient and not the Jacobian.

As I wrote, computing gradients or Jacobians is equivalent. If you can do the first, you can do the last.

Can you name more calculation schemes for the Jacobian to point my literature research in the right direction?

Never heard of the "adjoint method" before - at least not under this name. I only know of finite difference approximation and automatic differentiation.

Although quite slow for larger systems, John D'Errico's code with error control might be worth to test:

Bruno Luong
on 26 Aug 2022

Edited: Bruno Luong
on 26 Aug 2022

In the paper, the Jacobian is what they calls "Tangent model". The Adjoint method is relevant for system where the dimension (discretized) state space or observation is large. People working in meteorology, climate are therefore very aware about such method.

You don't have to apply the adjoint method if your system is reasonably small and you are not bother about calculation time.

I gave thise reference mainly for the conclusion Section where @Simon Wiesheier can read about side effect of discretize linear system to compute the Jacobian.

SA-W
on 8 Sep 2022

function res = fun(p,y)

s = result_from_your_PDE_solver_for_the_vector_of_material_parameters_p_corresponding_to_y(p)

res = s - y;

end

This is the code stored in the answer to my question. Now, I am working with 3 experimental vectory 'y' (time-dependent problem).

The gradient vector of the objective function res with respect to the parameters p is calculated like this:

d(res)/d(p) = 2*(s1-y1)*J1 + 2*(s2-y2)*J2 + 2*(s3-y3)*J3

Is Matlab doing the pre-multiplications ( 2*(s1-y1),... ) of the Jacobians J automatically, such that it is enough to modify the code to this?

function res = fun(p,y)

s1 = result_from_pde_solver

s2 = result_from_pde_solver

s3 = result_from_pde_solver

res = (s1-y1 + s2-y2 +s3-y3);

end

In case I provide my own Jacobians to Matlab the code roughly looks like:

function [res, J] = fun(p,y)

s1 = result_from_pde_solver

J1 = result_from_pde_solver

s2 = result_from_pde_solver

J2 = result_from_pde_solver

s3 = result_from_pde_solver

J3 = result_from_pde_solver

res = (s1-y1 + s2-y2 +s3-y3);

end

What is the output J in that case?

I mean it makes no sense to return the sum J=J1+J2+J3. It also makes no sense to return all of them because Matlab does not know that J1 belongs to (s1-y1).

Bruno Luong
on 8 Sep 2022

Edited: Bruno Luong
on 8 Sep 2022

"s Matlab doing the pre-multiplications ( 2*(s1-y1),... ) of the Jacobians J automatically, such that it is enough to modify the code to this?

function res = fun(p,y)

s1 = result_from_pde_solver

s2 = result_from_pde_solver

s3 = result_from_pde_solver

res = (s1-y1 + s2-y2 +s3-y3);

end"

This is NOT equivalent to minimizing the norme of res = [(s1-y1); (s2-y2); (s3-y3)];

Since you could get very large individual residues (si-yi) yet the sum is small.

Torsten
on 8 Sep 2022

Edited: Torsten
on 8 Sep 2022

It doesn't matter how y is made up (if there are 3*n measurement data for 1 time vector or n measurements for 3 time vectors).

You choose a certain order of the y-vector and calculate a resulting vector s with your pde solver where s(i) is associated to y(i). It's up to you to decide whether it looks like

(s11 s21 s31 s12 s22 s32 s13 s23 s33 ...)

or

(s11 s12 s13 ... s21 s22 s23 ... s31 s32 s33 ...) .

Then

dres/dp = [d(s11)/dp1,d(s11)/dp2,...,d(s11)/dpn;d(s21)/dp1,d(s21)/dp2,...,d(s21)/dpn;...]

for the first ordering or

dres/dp = [d(s11)/dp1,d(s11)/dp2,...,d(s11)/dpn;d(s12)/dp1,d(s12)/dp2,...,d(s12)/dpn;...]

for the second ordering.

I don't understand how you come up with

d(res)/d(p) = 2*(s1-y1)*J1 + 2*(s2-y2)*J2 + 2*(s3-y3)*J3

You work with (s-y), not (s-y)^2.

SA-W
on 8 Sep 2022

"This is NOT equivalent to minimizing res = [(s1-y1); (s2-y2); (s3-y3)];"

I guess there was a misunderstanding; what I want to minimize is

res = (s1-y1 + s2-y2 +s3-y3).

Bruno Luong
on 8 Sep 2022

Moved: Bruno Luong
on 8 Sep 2022

"The gradient vector of the objective function res with respect to the parameters p is calculated like this:

d(res)/d(p) = 2*(s1-y1)*J1 + 2*(s2-y2)*J2 + 2*(s3-y3)*J3"

It seems you mix between gradient of the least-square objective and Jacobian of the model.

The above expression is the gradient

d(sum(res.^n 2))/d(p) = 2*(s1-y1)*J1 + 2*(s2-y2)*J2 + 2*(s3-y3)*J3"

The Jacobiabn is

d(res)/d(p) is [J1; J2; J3]

If you use fmincon and friends, gradient is needed, if you use lsqnonlin and friends Jacobian is needed.

SA-W
on 8 Sep 2022

If I get the gist of what you said then the way to minimize

f(p) = ||s1-y1||^2 + ||s2-y2||^2 + ||s3-y3||^2

is to write the three vectors s and y each in ONE single (column) vector:

s = (s1;s2;s3) y=(y1;y2;y3)

The same with the Jacobians (say the size of s and p is two):

J1 = [d(s11)/dp1 d(s11)/dp2 ; d(s12)/dp1 d(s12)/dp2]

J1 = [d(s21)/dp1 d(s21)/dp2 ; d(s22)/dp1 d(s22)/dp2]

J1 = [d(s31)/dp1 d(s31)/dp2 ; d(s32)/dp1 d(s32)/dp2]

maps into

J = [J1; J2; J3] =

[d(s11)/dp1 d(s11)/dp2 ; d(s12)/dp1 d(s12)/dp2;

d(s21)/dp1 d(s21)/dp2 ; d(s22)/dp1 d(s22)/dp2;

d(s31)/dp1 d(s31)/dp2 ; d(s32)/dp1 d(s32)/dp2]

Is that correct?

SA-W
on 8 Sep 2022

Moved: Bruno Luong
on 8 Sep 2022

Thank you!

I did not mix gradient and Jacobian, but (due to my lack of mathematical knowledge) I thought lsqnonlin computes the gradient internally based on the Jacobian.

Torsten
on 8 Sep 2022

what I want to minimize is

res = (s1-y1 + s2-y2 +s3-y3)

You shouldn't do that. Suppose s1-y1 = -20000, s2-y2 = 10000 and s3-y3 = 10000 after fitting.

Would you accept this fitting result ?

SA-W
on 8 Sep 2022

"You shouldn't do that. Suppose s1-y1 = -20000, s2-y2 = 10000 and s3-y3 = 10000 after fitting.

Would you accept this fitting result ?"

No, not really.

Wrting

min f(p) = ||s1-y1||^2 + ||s2-y2||^2 + ||s3-y3||^2

is the standard in (solid mechanics) literature. But, of course, what is meant by that is to minimize each of these summands. The correct way to do this is like I described it in my last comment, is it?

(here a copy of the comment)

" If I get the gist of what you said then the way to minimize

f(p) = ||s1-y1||^2 + ||s2-y2||^2 + ||s3-y3||^2

is to write the three vectors s and y each in ONE single (column) vector:

s = (s1;s2;s3) y=(y1;y2;y3)

The same with the Jacobians (say the size of s and p is two):

J1 = [d(s11)/dp1 d(s11)/dp2 ; d(s12)/dp1 d(s12)/dp2]

J1 = [d(s21)/dp1 d(s21)/dp2 ; d(s22)/dp1 d(s22)/dp2]

J1 = [d(s31)/dp1 d(s31)/dp2 ; d(s32)/dp1 d(s32)/dp2]

maps into

J = [J1; J2; J3] =

[d(s11)/dp1 d(s11)/dp2 ; d(s12)/dp1 d(s12)/dp2;

d(s21)/dp1 d(s21)/dp2 ; d(s22)/dp1 d(s22)/dp2;

d(s31)/dp1 d(s31)/dp2 ; d(s32)/dp1 d(s32)/dp2]

Is that correct?

"

Bruno Luong
on 8 Sep 2022

"what I want to minimize is

res = (s1-y1 + s2-y2 +s3-y3)"

Sure but we are telling you this is NOT equivalent to minimzing the norm of [(s1-y1);(s2-y2);(s3-y3)].

The (square) norm of the residual vector is not the SUM of the residual vector, it's the sum of the square. You can get the sum that is arbitrary small yet the norm is arbitrary large. You certainly don't want this solution for your problem.

Bruno Luong
on 8 Sep 2022

"s = (s1;s2;s3) y=(y1;y2;y3)

The same with the Jacobians (say the size of s and p is two):

J1 = [d(s11)/dp1 d(s11)/dp2 ; d(s12)/dp1 d(s12)/dp2]

J1 = [d(s21)/dp1 d(s21)/dp2 ; d(s22)/dp1 d(s22)/dp2]

J1 = [d(s31)/dp1 d(s31)/dp2 ; d(s32)/dp1 d(s32)/dp2]

maps into

J = [J1; J2; J3] =

[d(s11)/dp1 d(s11)/dp2 ; d(s12)/dp1 d(s12)/dp2;

d(s21)/dp1 d(s21)/dp2 ; d(s22)/dp1 d(s22)/dp2;

d(s31)/dp1 d(s31)/dp2 ; d(s32)/dp1 d(s32)/dp2]

Is that correct?"

Finally, this is correct.

The sum is INCORRECT.

Torsten
on 8 Sep 2022

Is that correct?

I think yes.

As I said:

You have a certain ordering for y (e.g. y = [y1; y2; y3] ) and you get back a corresponding s(p1,...,pn) of the same size as y. Then the Jacobian for lsqnonlin is

J = [ds1/dp1 ds1/dp2 ... ds1/dpn; ds2/dp1 ds2/dp2 ... ds2/dpn ; dsm/dp1 dsm/dp2 ...dsm/dpn]

where m is the length of the y-vector.

SA-W
on 9 Sep 2022

I get the exit message "optimization completed because the size of the gradient is less than the value of the optimality tolerance".

So does lsqnonlin compute the gradient internally?

Bruno Luong
on 9 Sep 2022

Edited: Bruno Luong
on 9 Sep 2022

The gradient of the least-squares objective is computed from the Jacobian

g = J'*J*residue:

Your job is to provide correctly and consistently functions that compute the residue (lsqnonlin first argument function handle)

residue = model(p)-data

and the Jacobian

J(i,j) = dresidue(i)/dp(j)= dmodel(i)/dp(j)

The rest let lsqnonlin do.

Torsten
on 9 Sep 2022

Edited: Torsten
on 9 Sep 2022

I get the exit message "optimization completed because the size of the gradient is less than the value of the optimality tolerance".

So does lsqnonlin compute the gradient internally?

Why do you always talk about "gradient" ? You must supply the Jacobian of "res", not the gradient of "sum(res.^2)".

SA-W
on 9 Sep 2022

" Why do you always talk about "gradient" ? You must supply the Jacobian of "res", not the gradient of "sum(res.^2)". "

I know. But in order to evaluate if the solution is a minimum, I have to check if the gradient components (first order optimality measure) is zero. The Jacobian does not give me this information directly. Right?

Bruno Luong
on 9 Sep 2022

Edited: Bruno Luong
on 9 Sep 2022

The output.firstorderopt of give you the first order optimality, so just check it.

But it you provide a wrong Jacobian (eg, doesn't pass gradient test) it just returns garbage.

If you converge to local minima then the first order optimility is small but the solution is not good.

Torsten
on 9 Sep 2022

Edited: Torsten
on 9 Sep 2022

I think you should trust "lsqnonlin" and its return code in this respect.

And as Bruno wrote:

The gradient of the least-squares objective is computed from the Jacobian

g = J'*J*residue

Since you can get the Jacobian as return parameter, you can calculate the gradient by the above formula.

SA-W
on 9 Sep 2022

"If you converge to local minima then the first order optimility is small but the solution is not good."

With 'the solution is not good' you mean that it is not the global minimum, right?

SA-W
on 9 Sep 2022

I reduced the number of parameters to one to debug my Jacobian. Both the supplied Jacobian and the finite diifference approach give my the correct solution for the parameter with nearly the same convergence behavior.

However, I want to validate my Jacobian with the GradientCheck -- which still fails.

options = optimoptions(@lsqnonlin, ...

'SpecifyObjectiveGradient', true,

'CheckGradients', true, 'FiniteDifferenceStepSize', 1, 'FiniteDifferenceType', 'central');

sol = lsqnonlin(@(p)fun(p,y_exp),1000,0,1e12, options);

I supplied a step size of one with central finite differences. The start value of my parameter is p0 = 1000. I wrote all values of p to a file whenever lsqnonlin cally 'fun'. Given a step size of one, I expected the file content to be the three values

1000; 999; 1001.

However, the file contains four values

1000; 999,081...; 0; 1998,162...

These values also change when I restart my program. A second run gave me, for instance,

1000; 1000,701...; 0; 2001,402...

How can that happen?

SA-W
on 12 Sep 2022

I will do that but, anyway, I would like to share the output with you

%experimental displacement

y_exp = importdata('./displacement_experiment/loadstep_10.txt');

%start value for kappa and bounds

kappa0 = 1e3;

lb = 0;

ub = 1e15;

options = optimoptions(@lsqnonlin, 'StepTolerance', 1e-10', 'FunctionTolerance', 1e-10, 'OptimalityTolerance', 1e-10, ...

'SpecifyObjectiveGradient', true, 'PlotFcn', 'optimplotresnorm', ...

'CheckGradients', true, 'FiniteDifferenceStepSize', 1e-1, 'FiniteDifferenceType', 'central');

[sol,resnorm,residual,exitflag,output,jacobian] = lsqnonlin(@(p)funJ(p,y_exp),kappa0,lb,ub, options);

function [f, J] = funJ(p,y_exp)

%write CURRENT p into a file for pde solver to read in

file_kappa = fopen('./kappa.txt', 'w');

fprintf(file_kappa, '%16.12f\n', p);

fclose(file_kappa);

%run pde solver and load solution and Jacobian

system('cd ..; make run')

y_sim = importdata('./displacement_simulation/loadstep_10.txt');

J = importdata('./jacobian/loadstep_10.txt');

%compute least square objective

f = y_sim - y_exp;

%write ALL p's at all iterations into a file for debugging purposes

file_debug = fopen('./kappa_debug.txt', 'a');

fprintf(file_debug, '%16.12f\n', p);

fclose(file_debug);

end

After running the above program the file 'kappa_debug.txt' stores the values

1000.000000000000

1000.804919190001

900.724427271001

1100.885411109001

Clearly, this sequence of values does not correspond at all to the step size of 1e-1 which I provided in the options.

Objective function derivatives:

Maximum relative difference between supplied

and finite-difference derivatives = 1.09343e-06.

Supplied derivative element (1,1): -9.41149e-06

Finite-difference derivative element (1,1): -1.05049e-05

I also double-checked that the "Supplied derivative element (1,1): -9.41149e-6" corresponds to p=1000.804919190001, i.e., not the start value but the second value from the above file; I thought the gradient check will compare the Jacobians at the start value.

Does the output make sense to you?

Bruno Luong
on 12 Sep 2022

Edited: Bruno Luong
on 12 Sep 2022

The FiniteDifferenceStepSize is relative(*) step size

format long

p=1000.804919190001

p =

1.000804919190001e+03

pr=p*(1+0.1)

pr =

1.100885411109001e+03

pl=p*(1-0.1)

pl =

9.007244272710009e+02

When I test the gradient I usualy use much smaller step size order or 10e-7 to 10e-10.

(*) From the doc:

Scalar or vector step size factor for finite differences. When you set FiniteDifferenceStepSize to a vector v, the forward finite differences delta are

delta = v.*signâ€²(x).*max(abs(x),TypicalX);

where signâ€²(x) = sign(x) except signâ€²(0) = 1. Central finite differences are

delta = v.*max(abs(x),TypicalX);

Scalar FiniteDifferenceStepSize expands to a vector. The default is sqrt(eps) for forward finite differences, and eps^(1/3) for central finite differences.

SA-W
on 12 Sep 2022

The calculation of pl and pr makes sense!

But how is the "new" start value calculated? In the example above p=1000.804919190001

Bruno Luong
on 12 Sep 2022

I don't know it probably some preprocessing to project the initial state with the constraints.

You don't need lsqnonlin to check the gradient. You can do the difference finite on your own. The Jacobian is likely wrong (the gradient is even reverse the sign), unless 0.1 step size is too large and/or your model is very non linear.

Torsten
on 12 Sep 2022

SA-W
on 12 Sep 2022

I have to calculate the Jacobian by my own because finite differences takes too long.

What I observe is that norm(res) converges quadratically in case of finite differences and only linear with my supplied Jacobian. This is another indication that my Jacobian is wrong, is it?

Torsten
on 12 Sep 2022

This is another indication that my Jacobian is wrong, is it?

Yes. All the optimization methods might also work with an approximate Jacobian (maybe wrong in your case), but they take more steps to converge.

I have to calculate the Jacobian by my own because finite differences takes too long.

But don't you also have to use finite differences ? Or do you use automatic differentiation ?

SA-W
on 12 Sep 2022

"Yes. All the optimization methods might also work with an approximate Jacobian (maybe wrong in your case), but they take more steps to converge."

In my one-parameter example, both finite differences and my own Jacobian result in the exact solution, however, only finite differences converges quadratically in the last iterations.

"But don't you also have to use finite differences ? Or do you use automatic differentiation ?"

Neither of them. Here again a copy of the comment in which I explaind how to calculate J.

"

I am solving for a displacement field u and the residual r associated to that pde is formulated in terms of an explicit and implicit dependency on my parameters p, that is, r = r(u(p), p) = 0. Differentiating with respect to p gives

dr / dp = \partial r / \partial u(p) * d u(p) /d p + \partial r / \partial p = 0

K * d u(p) /d p + \partial r / \partial p = 0

d u(p) /d p = -K^-1 * \partial r / \partial p --> solve n=size(p) linear systems after convergence when r=0

"

Bruno Luong
on 12 Sep 2022

Edited: Bruno Luong
on 12 Sep 2022

If one take it carefully, the linearized model (Jacobian) is only needed to be integrated once and the whole Jacobian can be computed in single shot.

This can be done with automatic differential tool or manually differentiate the model (my case long ago). The problem with third party software is that no full code is readily available, so automatic differential tool won't work.

SA-W
on 12 Sep 2022

You already gave me the hint

"Many many people have tendency to discretize the continuous linearized equation, but typically it will provide a Jacobian/gradient that works on only few iterations, then the optimization algorithm just quit."

This is exactly what I observe... What I do is to first discretize the weak form of my pde and then linearize it. Did you refer to that or do something else?

"If one take it carefully, the linearized model (Jacobian) is only needed to be integrated once and the whole Jacobian can be computed in single shot."

Manually differentiating the model is not that difficult in my case. But if one integrates J(i,j) = dresidue(i)/dp(j), the units are wrong. Or are there more steps involved?

Bruno Luong
on 12 Sep 2022

Sorry I don't know enough your implementation methodology to make any useful comment.

SA-W
on 12 Sep 2022

Sure. But may I ask what you meant by that?

"Many many people have tendency to discretize the continuous linearized equation, but typically it will provide a Jacobian/gradient that works on only few iterations, then the optimization algorithm just quit."

Does it refer to the issue ' discretization, then linearization vs. the other way round" or to something else?

SA-W
on 13 Sep 2022

[sol,resnorm,residual,exitflag,output,jacobian] = lsqnonlin(@(p)fun(p,y_exp),kappa0,lb,ub, options);

The output 'jacobian' is a struct with the fields lower and upper. I exptected it to be a real matrix as described in the documentation.

SA-W
on 13 Sep 2022

"The gradient of the least-squares objective is computed from the Jacobian

g = J'*J*residue "

I guess you meant

g = J'*residue , did you?

SA-W
on 13 Sep 2022

Yes.

I finally came up with the correct analytical derivative for the model with only one parameter. With that at hand, I have a good starting point where to search for errors for the larger model with more parameters

For the larger model, finite differences already works (gives me nearly the exact solution), however, norm(res) converges not quadratic -- not even linear:

norm(res): 0.000014 -> 0.0000040 -> 0.0000038 -> stop, because final change in sum of squares relative to its initial value is less than the value of the function tolerance.

Is this maybe caused by a poorly conditioned Jacobian? For the simple model, convergence is quadratic.

Torsten
on 13 Sep 2022

Edited: Torsten
on 13 Sep 2022

Shouldn't the parameters converge quadratically and not the residuals ?

Never thought about whether the function values of Newton's method converge quadratically, but why should they ?

Maybe J.'*J has not full rank in the solution which could make the parameters not converge quadratically ?

Bruno Luong
on 13 Sep 2022

Edited: Bruno Luong
on 13 Sep 2022

In the presence of noise, the norm of the residual is > 0, there is no convergence of the residual.

For Newton method, under some strict hypothesis, the convergence is quadratic meaning

| p_k - p_true | <= (K * | p_0 - p_true |) ^(2^k)

But lsqnonlin is not Newton method. It's a quasi-Newton, a hybrid between Newton and gradient method. Conjugate gradient method connvere linearly

| p_k - p_true | <= C^k * | p_0 - p_true |

with C ~ 1 - sqrt(cond(J'*J)).

In practice due to non-exact line search, the descend direction is not optimally estimated, and espetially for non-linear problems, most of the iterations are carried out in the regime BEFORE all the hypothesis where those nice theoretical convergence rate can be applied, unless one starts very close to the true solution.

It seems to me similar bound are applied on the first order as well, meaning

|g_k| <= C^k * | g_0 |

etc... The observation of g_k is the the most convenient since one doesn't need to know the true solution, and just observe the exponent of g_k decrease at constant rate to -infinity at the very end (only few last iterations then). If that is observed the algorithm would converge well.

SA-W
on 13 Sep 2022

As for the nonlinear pde I am solving with Newtons method, the convergence of the residual r(u) is definitely quadratic. I did not check if u itself converges quadratic, but probably it does.

"In the presence of noise, the norm of the residual is > 0, there is no convergence of the residual."

I have only synthetic data without noise, which is why I expected to achieve r=0.

"It seems to me similar bound are applied on the first order as well, meaning

|g_k| <= C^k * | g_0 "

Is this inequality associated with the Newton mehod, CG, or quasi Newton?

Bruno Luong
on 13 Sep 2022

Edited: Bruno Luong
on 13 Sep 2022

I'm talking about (Newton) method for minimizing the norm of the residual by lsqnonlin, not your non-linear PDE.

Q: "|g_k| <= C^k * | g_0| " Is this inequality associated with the Newton mehod, CG, or quasi Newton?

A: CG.

SA-W
on 14 Sep 2022

Edited: SA-W
on 14 Sep 2022

May I ask again for your opinion:

I am pretty sure that the calculation of my supplied Jacobian is now correct -- the gradient check works and also comparing norm(res) with FD over the iterations gives nearly the same values. However, I tried to run an example where, after some iterations, only finite differences continues:

norm(res) analytical Jacobian:

0.95663775628694

0.84086338912438

0.79471586926105

0.71994927130471

0.62150098491207

0.56915151393085

0.49998727669244

0.44724501123508

0.38919775862164

0.30224620636652

0.29155253514181

0.29463664342493

0.29662902334702

0.29613746460782

0.29601240801725

0.29146830530547

0.29197155676708

0.29157674620784

0.29149432809217

0.29147474333004

0.29146991056537

0.29146870635504

0.29146840555128

0.29146833036589

0.29146831157051

0.29146830687173

0.29146830569704

0.29146830540336

0.29146830532994

0.29146830531159

0.29146830530700

0.29146830530585

0.29146830530557

0.29146830530550

norm(res) finite differences:

0.956637756945

0.840863392139

0.794715873055

0.719949277385

0.621501138062

0.569152884902

0.499989625208

0.447247009461

0.389199788297

0.302248150577

0.292361086745

0.289960172256

0.219842464284

0.219554406578

0.011194862511

0.001210235899

0.000039441337

0.000007324984

0.000007324502

As you can see, the first 10 iterations are nearly equal, thence only FD makes further progress. Clearly, the Jacobian at the, say 10th iteration, has to be a different one.

My only explanation is that the least squares objective reaches a valley or some kind of discontinuity and finite differences can "jump" over that region by evaluating the pde outside of that region.

Can you imagine other problems to check for?

Bruno Luong
on 14 Sep 2022

No lsqnonlin doesn't make any jump, provide that the Jacobian is correct and the model is numerically differentiable.

However For example if your non-linear model is modeled by iterative method, e.g. Newton, then change on the number of iterations performed wil be seen as a Jump.

Bruno Luong
on 14 Sep 2022

SA-W
on 14 Sep 2022

"However For example if your non-linear model is modeled by iterative method, e.g. Newton, then change on the number of iterations performed wil be seen as a Jump. "

The number of iterations in my non-linear model is determined by the residual r(u). If r(u) is below 1e-7, the iteration stop, which requires about 3 or 4 iterations. Say lsqnonlin calls my pde solver for two different p vectors. Once my pde solver needs 3 iterations to bring the r(u) <1e-7 and in the second run 5 iterations. If you referred to that, why is this seen as a jump?

I observed another issue. In case of supplying my own Jacobian to lsqnonlin, I wrote the rank of both J and J^T* J into a file :

fileID = fopen('./rank.txt', 'a');

fprintf(fileID, '%d %d\n', rank(full(J)), rank(full(J'*J)));

J is a 3200x10 matrix. In the first iteration and later as well, rank(full(J)) = 10 and rank(full(J'*J)))=5 . How is this possible? It is known that rank(J) = rank(J^T *J) for any real matrix J.

Bruno Luong
on 14 Sep 2022

Edited: Bruno Luong
on 14 Sep 2022

"why is this seen as a jump? "

Any discrete dependency is (numerically) non diffenriable. You should make the number of iterations unchanged, even if it is overkilled.

rank(full(J'*J)))=5 How is this possible?

Why you think it is not possible that rank(J) is 5. And beside that rank is estimated by thresholding on the largest singular value. Compute rank(J'*J) can give numerical rank lower than that of rank(full(J)). The condition number of J"*J is tthe square of J.

SA-W
on 14 Sep 2022

"Any discrete dependency is (numerically) non diffenriable. You should make the number of iterations unchanged, even if it is overkilled."

I am pretty sure you are right, but I do not understand the gist behind it. In case of finite differences, Matlab calls my pde solver, say, two times with p and (p+h), my pde solver returns u(p) and u(p+h). Matlab finally computes (u(p+h)-u(p))./h . Why does it matter if u(p+h) required 5 Newton iterations and u(p) only 3? Matlab only sees the converged u's.

"Why you think it is not possible that rank(J) is 5. And beside that rank is estimated by thresholding on the largest singular value. Compute rank(J'*J) can give numerical rank lower than that of rank(full(J)). The condition number of J"*J is tthe square of J."

What I know from the literature is that rank(J)=rank(J' *J). That is why I am confused about

rank(full(J'*J))) = 5 ; rank(full(J)) = 10

I also computed

cond(full(J)) = 1.470330010022814e+12 ; cond(full(J'*J)) = 1.016539996652010e+20

These numbers indicate that my Jacobian is ill-conditioned, right?

Torsten
on 14 Sep 2022

For me, the results indicate that the parameters you try to estimate in your model are not independent.

Something like

y = p1/p2 * x

and you try to determine p1 and p2 although it is obvious that you can only estimate one parameter p as

y = p*x

Bruno Luong
on 14 Sep 2022

Bruno Luong
on 14 Sep 2022

Edited: Bruno Luong
on 14 Sep 2022

"These numbers indicate that my Jacobian is ill-conditioned, right?"

Or badly scaled.

The good way is instead of use lsq on model

res(p) = model(p) - data

You have to reformulate the original problem to

res2(q) = Sigma^-(1/2) (model(S*q) - data)

Where Sigma is the covariance matrix of the data and S the matrix such that

p = S*q, or

q := S^-1 * p

is unitless vector.

Bruno Luong
on 14 Sep 2022

" Why does it matter if u(p+h) required 5 Newton iterations and u(p) only 3? Matlab only sees the converged u's."

Because the model output by 4 iterations are slighly different than 5 iterations (that's why you iterate right?).

And when you switch between 2 different "trajectories" (the parameter is h) you have a function that is discontinuous and the numerical derivative is Ininity which is incompatible with the Jacobian you provide, that I suspect also is not correct or accurate despite what you claim.

Bruno Luong
on 14 Sep 2022

"Who can tell whether rank(J) = 10 if rank(J.'*J) = 5 (numerically) ?"

No, but the we can tell the opposite : if rank(J.'*J)=10 (Simon's information) then rank(J)>=10 and as there is 10 parameters it must be 10, so there is no dependent variable.

SA-W
on 14 Sep 2022

"No, but the we can tell the opposite : if rank(J.'*J)=10 (Simon's information) then rank(J)>=10 and as there is 10 parameters it must be 10, so there is no dependent variable."

No, in the first comment pertaining to the rank topic I said: rank(J.'*J)=5 and rank(J)=10.

Bruno Luong
on 14 Sep 2022

Edited: Bruno Luong
on 14 Sep 2022

OK I miss read, but if rank(J) = 10 then there is no dependent variable.

If you take a look of rank.m you'll see the threshold is eps(norm(s,inf)), which is more ore less correspond to correct estimate of rank for J with cond(J) <= 1/eps = 4.5e15.

Torsten
on 14 Sep 2022

@Simon Wiesheier comment moved here:

"Because the model output by 4 iterations are slighly different than 5 iterations (that's why you iterate right?).

And when you switch between 2 different "trajectories" (the parameter is h) you have a function that is discontinuous and the numerical derivative is Ininity which is incompatible with the Jacobian you provide, that I suspect also is not correct or accurate despite what you claim."

But the incremental changes in u at the 5th iteration are negligible (say 1e-10).

Example: u1(p+h) and u1(p) were obtained with 4 iterations each. --> J1 = (u1(p+h)-u1(p)) ./h

For comparison, u2(p) is also computed with 4 iterations (u2(p) = u1(p)) but u2(p+h) with 5 iterations (u2(p+h) = u1(p+h)+1e-10)). --> J2 = (u1(p+h) + 1e-10 -u1(p)) ./h

So J2 and J1 should nearly be the same, right?

Bruno Luong
on 14 Sep 2022

Edited: Bruno Luong
on 14 Sep 2022

if you change the number of iterations within h say 1e-16 (the change of iterations happens somewhere right?),

the finite difference estimates the directional derivative dmdd in the direction d is:

dmdd*h = (model_4(u+h*d) - model_5(u)) ~ +/-1e-10 % according to you

So

|dmdd| ~ |model_4(u+h*d) - model_5(u)| / h = 1e-10*1e16 = 1e6.

In fact I can make it arbitrary large (Inf indeed) as I can just be select h arbitrary small. When you have a sudden jump in the model, the numerical derivative is infinity. Not many gradient optimization algorithm can tolerate that.

SA-W
on 14 Sep 2022

Thank you.

I try to understand the first order optimality measure as defined in the doc:

For least-squares solvers and trust-region-reflective algorithms, in problems with bounds alone, the first-order optimality measure is the maximum over i of |vi*gi|. Here gi is the ith component of the gradient, x is the current point, and

vi = âˆ£xiâˆ’biâˆ£ if the negative gradient points toward bound bi, ONE otherwise.

I calculated the gradient g at the solution as g=2*jacobian'*residual

g =

1.0e-09 *

0.000000767068843

-0.002926767975382

0.010020383786136 fixed

-0.000000523224432

-0.002928045248151

0.647615092338367 fixed

-0.799689984542153 fixed

0.191622430726592 fixed

-0.047905605200400 fixed

0.007984268370923 fixed

I fixed six of the ten parameters, the remaining four have lower bound zero and upper bound infinity. The first order optimality measure is 5.690559991760299e-11. In case the negative gradient points toward bound bi, then vi would simply be the solution according to the definition (lower bound zero) and the first order optimality is the maximum of |gi*sol_i |.

sol =

1.0e+05 *

1.483715587103306

0.451988809262640

0

0.397818885189575

0.990392446191780

0

0.385839000000000

1.140300000000000

2.633770000000000

5.236680000000000

However, the pointwise product |gi sol_i | also does not include 5.690559991760299e-11. So how can I calculate the first order optimality in the above example?

Bruno Luong
on 14 Sep 2022

Tip; you have the source code of lsqnonlin; it ends up witj something like this (The inf norm of the projected gradient)

function firstOrderOpt = computeFirstOrderOpt(project,XOUT,gradF)

a = norm(project(XOUT,-gradF),Inf);

b = norm(gradF,Inf);

%When a==b, a^2 / b = a via cancellation (no active bounds)

%When a==b==0, a^2 / b = 0 via removable singularity (x at local unconstrained optimizer)

if abs(a-b) <= eps * max(a,b) || b == 0

firstOrderOpt = a;

else

firstOrderOpt = a^2 / b;

end

SA-W
on 14 Sep 2022

"Or badly scaled.

The good way is instead of use lsq on model

res(p) = model(p) - data

You have to reformulate the original problem to

res2(q) = Sigma^-1 (model(S*q) - data)

Where Sigma is the covariance matrix of the data and S the matrix such that

p = S*q, or

q := S^-1 * p

is unitless vector."

Can you reference a document which describes that scaling method in more detail?

Bruno Luong
on 14 Sep 2022

Edited: Bruno Luong
on 14 Sep 2022

I don't have any readily reference, so I just google (the topic is somewhat common knowledge, not particularly specific).

The "S" matrix is a very simplified precaution of not having a bad "right preconditioner", this is for linear system, but one get an idea for non-lear system where the matrix A in the wikipedia article is some sort of average Hessian of the (non linear) least-square objective function https://en.wikipedia.org/wiki/Preconditioner

The Sigma matrix is in page 25 of this pdf http://jitkomut.eng.chula.ac.th/ee531/ls.pdf (generalized least squares esimator) or any text on Kalman filtering.

You can view both as some "normalization" of observation (data) and parameter (p).

Now you might still get bad condition of J after the "normalization" because your system lacks of controlability of observability, depends on which side you look at the system. But is ths physics and you cannot do much anything againts this.

SA-W
on 15 Sep 2022

I am working with 4 parameters which I want to estimate. Depending on the start value, lsqnonlin gives me two different solutions with norm(res) ~ 1e-6. It seems that my problem has to little constraints to uniquely identify those parameters. But anyway I would like to show that these points are indeed local minima (by visualizing the least squares objective).

If it were only 2 parameters, I could make a surface plot of the least squares objective over an appropriate range of these parameters. But all 4 parameters have a high range of values, so I can not make a surface plot with 2 parameters for each combination of the remaining two -- that would be too many.

May I ask for an suggestion how you would work around with that?

Torsten
on 15 Sep 2022

Edited: Torsten
on 16 Sep 2022

SA-W
on 16 Sep 2022

All 4 plots have value zero at h=0 and positive values towards h=-1 and h=+1, respectively. It seems that these points are indeed local minima.

Getting a graphical representation of the least-squares objective is difficult with 4 parameters, right?

Bruno Luong
on 16 Sep 2022

Edited: Bruno Luong
on 16 Sep 2022

Unforunately even if you get 4 convex curve in 4 basic directions, it doesn't prove you are at the local minima as showed in this example:

f=@(x)x'*[1 -10; -10 1]*x;

e1 = [1; 0];

e2 = [0; 1];

d = [1; 1];

subplot(3,1,1); ezplot(@(h)arrayfun(@(h)f(h*e1),h));

subplot(3,1,2); ezplot(@(h)arrayfun(@(h)f(h*e2),h));

subplot(3,1,3); ezplot(@(h)arrayfun(@(h)f(h*d),h));

You have to compute the restricted gradient and verify it verified Lagrange condition and the computed Hessian verify it is pdf matrix.

Or just trust lsqninlin exit flag.

SA-W
on 16 Sep 2022

"Or just trust lsqninlin exit flag."

The exit flag is 3, so I should verify the identified point.

But if I get 4 convex curves in 4 basis directions, then the identified point has to be a saddle point, right?

"You have to compute the restricted gradient and verify it verified Lagrange condition and the computed Hessian verify it is pdf matrix."

What do you mean by restricted gradient?

SA-W
on 16 Sep 2022

Torsten
on 16 Sep 2022

SA-W
on 16 Sep 2022

"You have to compute the restricted gradient and verify it verified Lagrange condition and the computed Hessian verify it is pdf matrix. "

The Hessian can be approximated as

H = J'*J.

In my case, where J has condition numbers ~ 1e12 it probably makes no sense to check the definitness of H. The result would be to inaccurate, right?

Bruno Luong
on 16 Sep 2022

Edited: Bruno Luong
on 16 Sep 2022

You cannot check the local minima using J'*J by any mean, since it is (by construction) always non-negative symmetric matrix, even many algorithm rely on the approximation of J'*J to the Hessian.

Only the true Hessian can eventually reveal negative eigen value, if any.

IMO, sorry to tell you directly but it is not serious to work with J with condition number of 1e12, exitflag = 3 and gradient test fail. You must to make those numerical obstacles going away before conclude anything that is trustworty.

SA-W
on 16 Sep 2022

"IMO, sorry to tell you directly but it is not serious to work with J with condition number of 1e12, exitflag = 3 and gradient test fail. You must to make those numerical obstacles going away before conclude anything that is trustworty."

The gradient check does not fail anymore. Also, for example, a start vector

p0 = [100000;50000;0;50000;100000]

converges to the true exact solution

p = [21844;5183;0;4844;18939]

. The associated exitflag = 3, although it is the true solution (synthetic data).

You are absolutely right with the bad condition number of J. I am wondering anyway how such a Jacobian results in the correct solution?

I am not sure if I can scale my parameters, because these are material parameters and my pde solver does not converge at all if I change the order of magnitude of them somehow. Is there a way to scale the COMPUTED Jacobian to reduce the condition number or do I have to compute J with scaled parameters?

SA-W
on 17 Oct 2022