
Thread Subject:
Non-linear optimization

Subject: Non-linear optimization

From: Toan Cao

Date: 4 Mar, 2013 17:37:08

Message: 1 of 33

Hi everyone,

I have a question relating to non-linear optimization and hope to receive your help!
Suppose I have a cost function F(x) in general form, meaning it cannot be written as F(x) = f(x)'*f(x) (where f(x)' is the transpose of f(x)), as required by methods such as Levenberg-Marquardt and Gauss-Newton for finding a local minimum.
If I want to use those two methods, what should I do?
Do you know of any method I can use in this case?

Thanks in advance,
Toan

Subject: Non-linear optimization

From: Steven_Lord

Date: 4 Mar, 2013 18:01:45

Message: 2 of 33



"Toan Cao" <toancv3010@gmail.com> wrote in message
news:kh2m44$4eh$1@newscl01ah.mathworks.com...
> Hi everyone,
>
> I have a question relating to non-linear optimization and hope to receive
> your help!
> If i have a cost function F(x) in general form. I mean it can not be
> described in the form of F(x)= f(x)'.f(x) (where f(x)' is transposition of
> f(x) ) as required by some methods such as Levenberg-Marquardt,
> Gauss-Newton for finding a local minimum value.
> If i want to use above two methods, what should i do ?
> Do you know any method that i can use in this case?

Do you have a MATLAB function that accepts a vector of parameters to be
optimized and returns the value of your cost function? If so, you're okay.
Most of the Optimization Toolbox and Global Optimization Toolbox solvers
don't care what you do inside your objective function to obtain the values
they seek (as long as your underlying function is smooth, for some of them),
provided you return the cost function value from your objective function.

You could do calculations. You could retrieve information from disk. You
could query an instrument or database. You could even prompt the user to
INPUT the result of performing a physical experiment for a set of parameter
values.
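
For example, a minimal sketch (the anonymous handle here is a hypothetical
stand-in for whatever computation actually produces your cost):

  % The solver only sees a handle that maps a parameter vector to a scalar.
  % Anything can happen inside: a simulation, a file read, an instrument query.
  myCost = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;   % stand-in "black box"
  x0 = [-1; 2];                                         % initial guess
  [xBest, Fbest] = fminunc(myCost, x0)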

--
Steve Lord
slord@mathworks.com
To contact Technical Support use the Contact Us link on
http://www.mathworks.com

Subject: Non-linear optimization

From: Matt J

Date: 4 Mar, 2013 21:07:08

Message: 3 of 33

"Toan Cao" <toancv3010@gmail.com> wrote in message <kh2m44$4eh$1@newscl01ah.mathworks.com>...
> Hi everyone,
>
> I have a question relating to non-linear optimization and hope to receive your help!
> If i have a cost function F(x) in general form. I mean it can not be described in the form of F(x)= f(x)'.f(x) (where f(x)' is transposition of f(x) ) as required by some methods such as Levenberg-Marquardt, Gauss-Newton for finding a local minimum value.
> If i want to use above two methods, what should i do ?
===============

If you know a global lower bound on F(x), say f_low, then the minimization problem is equivalent to

 min f(x)'.f(x)

where

 f(x) = F(x) - f_low

So, you could apply Levenberg-Marquardt and/or Gauss-Newton to the reformulated problem.

Subject: Non-linear optimization

From: Toan Cao

Date: 5 Mar, 2013 03:36:06

Message: 4 of 33

"Steven_Lord" <slord@mathworks.com> wrote in message <kh2ni9$9ar$1@newscl01ah.mathworks.com>...
>
>
> "Toan Cao" <toancv3010@gmail.com> wrote in message
> news:kh2m44$4eh$1@newscl01ah.mathworks.com...
> > Hi everyone,
> >
> > I have a question relating to non-linear optimization and hope to receive
> > your help!
> > If i have a cost function F(x) in general form. I mean it can not be
> > described in the form of F(x)= f(x)'.f(x) (where f(x)' is transposition of
> > f(x) ) as required by some methods such as Levenberg-Marquardt,
> > Gauss-Newton for finding a local minimum value.
> > If i want to use above two methods, what should i do ?
> > Do you know any method that i can use in this case?
>
> Do you have a MATLAB function that accepts a vector of parameters to be
> optimized and returns the value of your cost function? If so, you're okay.
> Most of the Optimization Toolbox and Global Optimization Toolbox solvers
> don't care what you do inside your objective function to obtain the values
> they seek (as long as your underlying function is smooth, for some of them)
> as long as you return the cost function values from your objective function.
>
> You could do calculations. You could retrieve information from disk. You
> could query an instrument or database. You could even prompt the user to
> INPUT the result of performing a physical experiment for a set of parameter
> values.
>
> --
> Steve Lord
> slord@mathworks.com
> To contact Technical Support use the Contact Us link on
> http://www.mathworks.com
Thanks for your reply, Steve Lord.
May I ask you more about this problem?
As I understand it, the Levenberg-Marquardt and Gauss-Newton methods use f(x) to calculate the Jacobian matrix. If I only have F(x) (not f(x)), how do the optimization solvers compute this matrix? Can you explain how the solvers implement it?
Actually, I need to understand this more deeply, because I hope to modify some things for my specific function.
Thanks in advance.
Toan

Subject: Non-linear optimization

From: Toan Cao

Date: 5 Mar, 2013 03:42:10

Message: 5 of 33

"Matt J" wrote in message <kh32ds$hrn$1@newscl01ah.mathworks.com>...
> "Toan Cao" <toancv3010@gmail.com> wrote in message <kh2m44$4eh$1@newscl01ah.mathworks.com>...
> > Hi everyone,
> >
> > I have a question relating to non-linear optimization and hope to receive your help!
> > If i have a cost function F(x) in general form. I mean it can not be described in the form of F(x)= f(x)'.f(x) (where f(x)' is transposition of f(x) ) as required by some methods such as Levenberg-Marquardt, Gauss-Newton for finding a local minimum value.
> > If i want to use above two methods, what should i do ?
> ===============
>
> If you know a global lower bound on F(x), say F_low, then the minimization problem is equivalent to
>
> min f(x)'.f(x)
>
> where
>
> f(x)=F(x)- f_low
>
> So, you could apply Levenberg-Marquardt and/or Gauss-Newton to the reformulated problem.

Hi Matt J,
If I do not have a lower bound on F(x), do you have another solution for my problem?
I appreciate your support.
Best regards,
Toan

Subject: Non-linear optimization

From: Matt J

Date: 5 Mar, 2013 06:18:09

Message: 6 of 33

"Toan Cao" <toancv3010@gmail.com> wrote in message <kh3pii$ncd$1@newscl01ah.mathworks.com>...
>
> Hi Matt J,
> If i do not have lower bound of F(x), do you have another solution for my problem ?
==============

Show us F(x) in mathematical form. It shouldn't be that hard to come up with a lower bound.

Subject: Non-linear optimization

From: Bruno Luong

Date: 5 Mar, 2013 07:26:08

Message: 7 of 33

"Matt J" wrote in message <kh32ds$hrn$1@newscl01ah.mathworks.com>...
> "Toan Cao" <toancv3010@gmail.com> wrote in message
>
> If you know a global lower bound on F(x), say F_low, then the minimization problem is equivalent to
>
> min f(x)'.f(x)
>
> where
>
> f(x)=F(x)- f_low
>
> So, you could apply Levenberg-Marquardt and/or Gauss-Newton to the reformulated problem.

I don't think that is a good suggestion. (1) It makes the code difficult to handle, since f_low needs to be known. (2) It squares the conditioning of the original problem, and thus all kinds of numerical difficulties become more prominent.

When Levenberg-Marquardt or a quasi-Newton method is derived, f(x) is usually taken as the local Jacobian of F. This approximation is well known and studied, and it can be applied to any F (assumed only to be differentiable).

Bruno

Subject: Non-linear optimization

From: Steven_Lord

Date: 5 Mar, 2013 15:18:13

Message: 8 of 33



"Toan Cao" <toancv3010@gmail.com> wrote in message
news:kh3p76$mhd$1@newscl01ah.mathworks.com...
> "Steven_Lord" <slord@mathworks.com> wrote in message
> <kh2ni9$9ar$1@newscl01ah.mathworks.com>...

*snip*

> As i know, Levenberg-Marquardt, Gauss-Newton methods use f(x) to calculate
> the Jacobian matrix. If i just have F(x) ( not f(x) ), how does
> optimization solvers compute this matrix? Can you explain for me the way
> that the solvers implement ?

I believe the documentation for Optimization Toolbox and Global Optimization
Toolbox describes the algorithms that the functions in those two toolboxes use
in some detail.

> Actually, i need to understand more deeply and hope to modify somethings
> for my specific function.

Why? What's your application?

--
Steve Lord
slord@mathworks.com
To contact Technical Support use the Contact Us link on
http://www.mathworks.com

Subject: Non-linear optimization

From: Toan Cao

Date: 5 Mar, 2013 17:44:07

Message: 9 of 33

"Steven_Lord" <slord@mathworks.com> wrote in message <kh52bl$jbp$1@newscl01ah.mathworks.com>...
>
>
> "Toan Cao" <toancv3010@gmail.com> wrote in message
> news:kh3p76$mhd$1@newscl01ah.mathworks.com...
> > "Steven_Lord" <slord@mathworks.com> wrote in message
> > <kh2ni9$9ar$1@newscl01ah.mathworks.com>...
>
> *snip*
>
> > As i know, Levenberg-Marquardt, Gauss-Newton methods use f(x) to calculate
> > the Jacobian matrix. If i just have F(x) ( not f(x) ), how does
> > optimization solvers compute this matrix? Can you explain for me the way
> > that the solvers implement ?
>
> I believe the documentation for Optimization Toolbox and Global Optimization
> Toolbox describe the algorithms the functions in those two toolboxes use in
> some detail.
>
> > Actually, i need to understand more deeply and hope to modify somethings
> > for my specific function.
>
> Why? What's your application?
>
> --
> Steve Lord
> slord@mathworks.com
> To contact Technical Support use the Contact Us link on
> http://www.mathworks.com
Hi Steve Lord,

I will explain my cost function in more detail and hope to receive your suggestions.
Given two 3D point clouds, a source point cloud (SPC) and a target point cloud (TPC), I would like to move each point of the SPC so that it coincides with its corresponding point in the TPC.
The movement of each point of the SPC is described by a rotation matrix Ri and a translation vector Ti.
The rotation matrix Ri is constrained by the penalty:
Rot(Ri) = (C1.C2)^2 + (C1.C3)^2 + (C2.C3)^2 + (C1.C1 - 1)^2 + (C2.C2 - 1)^2 + (C3.C3 - 1)^2, where C1, C2, C3 are the 3x1 column vectors of Ri and "." denotes the dot product.
Given m points in the SPC, the first term of the cost function is Sum(Rot(Ri)), i = 1:m.
If we call a point in the SPC Vi, its corresponding point in the TPC Ui, and its transformed point V'i, then the second term of the cost function is Sum((V'i - Ui)^2), i = 1:m.
Finally, my cost function is F = Sum(Rot(Ri)) + Sum((V'i - Ui)^2), i = 1:m.

Now I want to find all the Ci of the rotation matrices Ri as well as all elements of the translation vectors Ti. What should I do to obtain a local minimum of this function?
Looking forward to your reply.
Thanks in advance,
Toan
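
For concreteness, a minimal MATLAB sketch of this cost, assuming the transformed point is V'i = Ri*Vi + Ti and that the unknowns are packed per point as x = [Ri(:); Ti]; the packing and the name cloudCost are illustrative choices, not part of the original description:

  function F = cloudCost(x, V, U)
      % V, U: 3-by-m matrices of corresponding source and target points.
      x = x(:);                 % ensure column vector
      m = size(V, 2);
      F = 0;
      for i = 1:m
          base = (i-1)*12;
          Ri = reshape(x(base+1 : base+9), 3, 3);
          Ti = x(base+10 : base+12);
          C1 = Ri(:,1);  C2 = Ri(:,2);  C3 = Ri(:,3);
          rot = (C1'*C2)^2 + (C1'*C3)^2 + (C2'*C3)^2 + ...
                (C1'*C1 - 1)^2 + (C2'*C2 - 1)^2 + (C3'*C3 - 1)^2;
          Vp = Ri*V(:,i) + Ti;                       % transformed point V'i
          F  = F + rot + sum((Vp - U(:,i)).^2);      % Rot(Ri) + ||V'i - Ui||^2
      end
  end
  % A handle such as @(x) cloudCost(x, V, U) is all a scalar solver needs.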



 

Subject: Non-linear optimization

From: Matt J

Date: 6 Mar, 2013 14:59:09

Message: 10 of 33

"Toan Cao" <toancv3010@gmail.com> wrote in message <kh5at7$iv2$1@newscl01ah.mathworks.com>...
>
> I will explain more detail about my cost function and hope to receive your suggestion.
> Given two 3D point clouds (source point cloud (SPC) and target point cloud (TPC)). I would like to move each point of SPC to be coincide with each corresponding point of TPC.
> Each movement of each point of SPC is described by a Rotation matrix Ri and a translation vector Ti.
> Rotation matrix Ri is constrained:
> Rot(Ri)= (C1.C2)^2 + (C1.C3)^2 + (C2.C3)^2 +(C1.C1 -1)^2 +(C2.C2 -1)^2 + (C3.C3 -1)^2, where C1, C2, C3 are 3x1 column vectors of Ri.
> Given m points in SPC, the first term of cost function is: Sum(Rot(Ri)) where i =1:m
> If we call a point in SPC is Vi, its corresponding point in TPC is Ui, its transformed point is V'i. So, the second term of cost function is: Sum((V'i - Ui)^2), i=1:m
> Finally, my cost function is F = Sum(Rot(Ri)) +Sum((V'i - Ui)^2), i=1:m
>
> Now, i want to find all Ci of Rotation matrices Ri as well all elements of translation vectors Ti. What should i do to obtain local minimum value of this function ?
===============

The problem has a closed form solution, so iterative minimization is unnecessary. Here is one implementation

http://www.mathworks.com/matlabcentral/fileexchange/26186-absolute-orientation-horns-method
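
For reference, a closely related SVD-based closed form (a single rotation R and translation t for the whole cloud, no scaling) can be sketched as follows, assuming V (source) and U (target) are 3-by-m matrices of corresponding points:

  function [R, t] = rigidAlign(V, U)
      % Least-squares rigid alignment U ~ R*V + t via an SVD of the
      % cross-covariance of the centered point sets.
      muV = mean(V, 2);
      muU = mean(U, 2);
      Vc  = bsxfun(@minus, V, muV);
      Uc  = bsxfun(@minus, U, muU);
      H   = Vc * Uc';                               % 3-by-3 cross-covariance
      [S, ~, W] = svd(H);
      R = W * diag([1 1 sign(det(W*S'))]) * S';     % nearest proper rotation
      t = muU - R*muV;
  end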

Subject: Non-linear optimization

From: Matt J

Date: 6 Mar, 2013 15:33:08

Message: 11 of 33

"Matt J" wrote in message <kh7ljt$oss$1@newscl01ah.mathworks.com>...
>
> > Now, i want to find all Ci of Rotation matrices Ri as well all elements of translation vectors Ti. What should i do to obtain local minimum value of this function ?
> ===============
>
> The problem has a closed form solution, so iterative minimization is unnecessary. Here is one implementation
>
> http://www.mathworks.com/matlabcentral/fileexchange/26186-absolute-orientation-horns-method

I assume you don't really mean that you're applying a different rotation matrix Ri and translation vector Ti to each point. That wouldn't make sense. A translation alone is sufficient to make two points coincide and is given simply by Ti = Ui - Vi. Adding a rotation component would just create infinitely many non-unique solutions.

Subject: Non-linear optimization

From: Toan Cao

Date: 6 Mar, 2013 19:11:15

Message: 12 of 33

"Matt J" wrote in message <kh7njk$2io$1@newscl01ah.mathworks.com>...
> "Matt J" wrote in message <kh7ljt$oss$1@newscl01ah.mathworks.com>...
> >
> > > Now, i want to find all Ci of Rotation matrices Ri as well all elements of translation vectors Ti. What should i do to obtain local minimum value of this function ?
> > ===============
> >
> > The problem has a closed form solution, so iterative minimization is unnecessary. Here is one implementation
> >
> > http://www.mathworks.com/matlabcentral/fileexchange/26186-absolute-orientation-horns-method
>
> I assume you don't really mean that you're applying a different rotation matrix Ri and translation vector Ti to each point. That wouldn't make sense. A translation is sufficient to make 2 points coincide and is given simply by Ti=Ui-Vi. Adding rotation component would just create infinite non-unique solutions.

Hi Matt J,
In my case, each point has its own transformation, described by a rotation matrix and a translation vector. Actually, my cost function has another (third) term which constrains the movement of neighboring points to be smooth (the equation for the third term is complex and is not easy to write in plain text).
I read a paper on finding the minimum of this cost function; it mentioned that it applied Levenberg-Marquardt to obtain the minimum. I am following this direction, but I am now stuck on finding the Jacobian of f(x) (where F(x) = f(x)'.f(x)).
I think you are an expert in math, and I hope you can give me some suggestions.
Thanks in advance,
Toan

Subject: Non-linear optimization

From: Matt J

Date: 6 Mar, 2013 20:00:08

Message: 13 of 33

"Toan Cao" <toancv3010@gmail.com> wrote in message <kh84cj$ge4$1@newscl01ah.mathworks.com>...
>
> Hi Matt J,
> For my case, each point has its own transformation described by a rotation matrix and a translation vector. Actually, my cost function have another term (third term) which constrains the movement of neighbor points to be smooth (equation of third term is complex, it is not easy for me to write with simple texts).
===================

Even with a 3rd smoothing term for neighboring points, it makes no sense that you would apply both a rotation and translation to a point. A translation is enough to specify the movement of a point to a different location. There's no reason to give the movement of the point 6 degrees of freedom when 3 are enough, and redundant parameters would ill-condition the optimization.


> I read a paper in finding minimum of this cost function. It mentioned that it applied Levenberg-Marquardt to obtain the minimum. Following this direction but i am now stuck in finding Jacobian for f(x), (where F(x) =f(x)'.f(x) ).
> I think you are an expert in math, i wish you can give me some suggestions.
=============

All of the cost function terms that you've shown us so far are square terms. Unless the 3rd term is not of this form, it's not clear what problem you have in writing your cost function as F(x) =f(x)'.f(x).

If the third term is not a sum of squares, we need to see it in order to know how to deal with it. I already gave you a suggestion, requiring a lower bound f_low (not necessarily a tight bound). Not showing us the full cost function makes it hard to advise you how to find this lower bound.

Also, as Bruno hinted, Levenberg-Marquardt is actually applicable to any twice differentiable cost function, not just those that are sums of squares. The update step from x_n to x_{n+1} would be the solution of

 (Hessian(x_n) + lambda*eye(N)) * (x_{n+1} - x_n) = -gradient(x_n)

However, you would have to code that yourself, including the adaptive selection of the lambda parameter.
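
A rough sketch of what such a hand-coded loop might look like, assuming you can supply handles F, gradF and hessF for the cost, its gradient and its (true or approximate) Hessian; the lambda update is the simplest multiplicative rule, not a tuned production scheme:

  function x = lmMinimize(F, gradF, hessF, x, maxIter)
      % e.g.  x = lmMinimize(F, gradF, hessF, x0, 100)
      lambda = 1e-3;
      for k = 1:maxIter
          g = gradF(x);
          if norm(g) < 1e-8, break, end
          H = hessF(x);
          % Grow lambda until the damped step actually decreases F,
          % then relax it again for the next iteration.
          while true
              step = -(H + lambda*eye(numel(x))) \ g;
              if F(x + step) < F(x)
                  x = x + step;
                  lambda = max(lambda/10, 1e-12);
                  break
              end
              lambda = lambda*10;
          end
      end
  end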

Subject: Non-linear optimization

From: Matt J

Date: 6 Mar, 2013 20:39:10

Message: 14 of 33

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <kh46mg$s0t$1@newscl01ah.mathworks.com>...
> "Matt J" wrote in message <kh32ds$hrn$1@newscl01ah.mathworks.com>...
> > "Toan Cao" <toancv3010@gmail.com> wrote in message
> >
> > If you know a global lower bound on F(x), say F_low, then the minimization problem is equivalent to
> >
> > min f(x)'.f(x)
> >
> > where
> >
> > f(x)=F(x)- f_low
> >
> > So, you could apply Levenberg-Marquardt and/or Gauss-Newton to the reformulated problem.
>
> I don't think it is a good suggestion. (1) It make the code difficult to handle since f_low needs to be known. It squares the conditioning of the original problem, and thus all kinds of numerical difficulties become more prominent.
================

I'm sure it would have its drawbacks, but it's the only suggestion I can think of that doesn't require coding your own more general Levenberg-Marquardt routine. Also, the OP wanted to do Gauss-Newton as well. Gauss-Newton is only applicable to least squares cost functions, AFAIK, so some kind of square root factorization would have to be constructed for that.

I don't see finding f_low as a big problem. Remember, I'm not saying it has to be a tight lower bound, so some kind of term-by-term lower bound analysis on the cost function should give it to you.

As for the conditioning issue, an alternative factorization might be

 f(x)=(sqrt(F(x)-f_low))^2

If f_low is strictly less than the global minimum, then F(x)-f_low>0, so the sqrt shouldn't introduce any non-differentiabilities.

Subject: Non-linear optimization

From: Bruno Luong

Date: 6 Mar, 2013 21:19:09

Message: 15 of 33

"Matt J" wrote in message <kh89he$4al$1@newscl01ah.mathworks.com>...

> f(x)=(sqrt(F(x)-f_low))^2

What you write is

f(x) = abs(F(x)-f_low) = F(x)-f_low

Minimizing f(x) is the same as minimizing F(x). How far do we get?

Bruno

Subject: Non-linear optimization

From: Matt J

Date: 6 Mar, 2013 21:55:08

Message: 16 of 33

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <kh8bsc$bov$1@newscl01ah.mathworks.com>...
> "Matt J" wrote in message <kh89he$4al$1@newscl01ah.mathworks.com>...
>
> > f(x)=(sqrt(F(x)-f_low))^2
>
> What you write is
>
> f(x) = abs(F(x)-f_low) = F(x)-f_low
>
> Minimizing f(x) is the same as minimizing F(x). How far do we get?
===============

That should really have been

 f(x)=sqrt(F(x)-f_low)

 min (f(x))^2

But yes, the above is equivalent to minimizing F(x). That's what we want. Now, however, you can feed f(x) to LSQNONLIN and run its Levenberg-Marquardt routine.
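
In code, the reformulation might look like the following sketch, with a made-up example cost F and lower bound f_low; the only requirement is that f_low stays strictly below the minimum of F:

  F     = @(x) (x(1)^2 + x(2)^2 - 1)^2 + 0.5;   % example cost, minimum value 0.5
  f_low = 0;                                    % any strict global lower bound
  fres  = @(x) sqrt(F(x) - f_low);              % scalar "residual" for LSQNONLIN

  opts = optimset('Algorithm', 'levenberg-marquardt');
  x    = lsqnonlin(fres, [2; 2], [], [], opts); % minimizes fres(x)^2 = F(x) - f_low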

Subject: Non-linear optimization

From: Bruno Luong

Date: 6 Mar, 2013 22:31:07

Message: 17 of 33

"Matt J" wrote in message <kh8dvs$imu$1@newscl01ah.mathworks.com>...

>
> But yes, the above is equivalent to minimizing F(x). That's what we want. Now, however, you can feed f(x) to LSQNONLIN and run its Levenberg-Marquardt routine.

And in what sense is this contortion supposed to do any good?

The Gauss-Newton approximation of the square root will certainly be less accurate than working with F directly, especially when the lower bound is carelessly chosen.

Bruno

Subject: Non-linear optimization

From: Matt J

Date: 6 Mar, 2013 22:56:07

Message: 18 of 33

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <kh8g3b$p6i$1@newscl01ah.mathworks.com>...
> "Matt J" wrote in message <kh8dvs$imu$1@newscl01ah.mathworks.com>...
>
> >
> > But yes, the above is equivalent to minimizing F(x). That's what we want. Now, however, you can feed f(x) to LSQNONLIN and run its Levenberg-Marquardt routine.
>
> And in what sense this contortion supposes to do any good?
>
> The Gauss-Newton approximation of the square-root will certainly less accurate than working with F directly. Especially when the lower-bound is careless chosen.
=================

It was never really about doing any "good". It was about fulfilling the OP's request. :-)

The OP wanted a way to apply Gauss-Newton to this problem and I, at least, don't see any other way. Granted, Gauss-Newton would probably do badly with a poorly chosen lower bound, but with some tuning a tighter bound might be identifiable. For example, one could rerun the optimization with increasing f_low until a solution with F(x) - f_low close to zero was found. The OP has also revealed that many terms in his cost function are already squares. It's only the non-square terms that require a lower-bound estimate, so that should help to make a tight f_low identifiable.

As for Levenberg-Marquardt, it should still converge under this scheme, though perhaps slowly. Working with F directly would without question be better, but then you have to code it yourself. I have searched for general Levenberg-Marquardt routines on the FEX and didn't find any. :-(

Subject: Non-linear optimization

From: Bruno Luong

Date: 6 Mar, 2013 23:27:08

Message: 19 of 33

> As for Levenber-Marquardt, it should still converge under this scheme, though perhaps slowly. Working with F directly would without question be better, but then you have to code it yourself. I have searched for general Levenberg-Marquardt routines on the FEX and didn't find them. :-(

Not a generic LM, but rather than reinventing the wheel, one might save some work by changing the specific cost function and the Jacobian calculation of this code
 
http://www.mathworks.com/matlabcentral/newsreader/view_thread/281583

Admittedly it takes some minimum skill to make the Jacobian calculation correct and fast.

Bruno

Subject: Non-linear optimization

From: Matt J

Date: 7 Mar, 2013 02:05:08

Message: 20 of 33

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <kh8jcc$4qj$1@newscl01ah.mathworks.com>...
> > As for Levenber-Marquardt, it should still converge under this scheme, though perhaps slowly. Working with F directly would without question be better, but then you have to code it yourself. I have searched for general Levenberg-Marquardt routines on the FEX and didn't find them. :-(
>
> Not a generic LM, but rather than reinventing the wheel, one might save some work by changing the specific cost function and the Jacobian calculation of this code
>
> http://www.mathworks.com/matlabcentral/newsreader/view_thread/281583
>
> Admittedly it takes some minimum skill to make the Jacobian calculation correct and fast.
===================

Bear in mind also that generic LM doesn't use the Jacobian of the cost function F(x), but rather the Hessian of F, or equivalently the Jacobian of F's gradient. This not only means a possibly intensive Hessian calculation, but also when you solve the update equation

  (Hessian(x) + lambda*eye(N)) * step = -gradient(x)

you have to choose lambda so that (Hessian(x)+eye(N)*lambda) is positive definite. This can be difficult for non-convex F.

These considerations make me wonder whether applying LM to F(x) directly really is preferable.

Subject: Non-linear optimization

From: Bruno Luong

Date: 7 Mar, 2013 08:25:08

Message: 21 of 33

"Matt J" wrote in message <kh8skk$s6c$1@newscl01ah.mathworks.com>...

> Bear in mind also that generic LM doesn't use the Jacobian of the cost function F(x), but rather the Hessian of F, or equivalently the Jacobian of F's gradient. This not only means a possibly intensive Hessian calculation, but also when you solve the update equation

What you pointed out is the difference between Quasi-Newton and Newton. True both can be implemented with LM.

In practice quasi-Newton is quasi sufficient.

Bruno

Subject: Non-linear optimization

From: Toan Cao

Date: 7 Mar, 2013 15:15:08

Message: 22 of 33

"Matt J" wrote in message <kh8dvs$imu$1@newscl01ah.mathworks.com>...
> "Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <kh8bsc$bov$1@newscl01ah.mathworks.com>...
> > "Matt J" wrote in message <kh89he$4al$1@newscl01ah.mathworks.com>...
> >
> > > f(x)=(sqrt(F(x)-f_low))^2
> >
> > What you write is
> >
> > f(x) = abs(F(x)-f_low) = F(x)-f_low
> >
> > Minimizing f(x) is the same as minimizing F(x). How far do we get?
> ===============
>
> That should really have been
>
> f(x)=sqrt(F(x)-f_low)
>
> min (f(x))^2
>
> But yes, the above is equivalent to minimizing F(x). That's what we want. Now, however, you can feed f(x) to LSQNONLIN and run its Levenberg-Marquardt routine.

Hi Matt J,

I will summarize my function here again and would like to receive feedback.

Given two 3D point clouds, a source point cloud (SPC) and a target point cloud (TPC), I would like to move each point of the SPC so that it coincides with its corresponding point in the TPC.
The movement of each point of the SPC is described by a rotation matrix Ri and a translation vector Ti.
The rotation matrix Ri is constrained by the penalty:
Rot(Ri) = (C1.C2)^2 + (C1.C3)^2 + (C2.C3)^2 + (C1.C1 - 1)^2 + (C2.C2 - 1)^2 + (C3.C3 - 1)^2, where C1, C2, C3 are the 3x1 column vectors of Ri.
Given m points in the SPC, the first term of the cost function is Sum(Rot(Ri)), i = 1:m.
If we call a point in the SPC Vi, its corresponding point in the TPC Ui, and its transformed point V'i, then the second term of the cost function is Sum((V'i - Ui)^2), i = 1:m.
--------------------------
I assume (for simplicity) that the third term of my cost function is Sum(norm(Ri - Rj)^2), i,j = 1:m, i ~= j.
So my cost function is: F = Sum(Rot(Ri)) + Sum((V'i - Ui)^2) + Sum(norm(Ri - Rj)^2), i = 1:m.

I read the Optimization Toolbox documentation, and it suggests that I can use LSQNONLIN for this function with the Levenberg-Marquardt algorithm. This routine requires us to provide a vector-valued function f(x) = [f{1}(x), f{2}(x), ..., f{n}(x)]' to LSQNONLIN.
To use this routine, I would either do:
1) f{1}(x) = (C1{i}.C2{i}), f{2}(x) = (C1{i}.C3{i}), ... (a set of functions for the first term),
     f{k}(x) = (V'{i} - U{i}), f{k+1}(x) = (V'{i+1} - U{i+1}), ... (a set of functions for the second term),
     f{h}(x) = norm(R{i} - R{j}), ..., f{n}(x) = norm(R{i} - R{j}) (a set of functions for the third term),
=> so f(x) = [f{1}(x), f{2}(x), ..., f{k}(x), f{k+1}(x), ..., f{h}(x), ..., f{n}(x)]'

OR I would just give:
2) f{1}(x) = sqrt(Sum(Rot(Ri))), f{2}(x) = sqrt(Sum((V'i - Ui)^2)), f{3}(x) = sqrt(Sum(norm(Ri - Rj)^2)),
=> so f(x) = [f{1}(x), f{2}(x), f{3}(x)]'

In your experience, which option (1 or 2) should I follow?

Thanks so much
Toan
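
For illustration only, a sketch of how option 1 could be assembled as an LSQNONLIN residual vector, assuming the per-point packing x = [Ri(:); Ti], the transformation V'i = Ri*Vi + Ti, and a hypothetical p-by-2 list of neighbor index pairs for the third term; the Ri - Rj differences are stacked entrywise, which gives the same sum of squares as norm(Ri - Rj):

  function f = residualVec(x, V, U, pairs)
      % V, U : 3-by-m corresponding points;  pairs : p-by-2 neighbor indices.
      x = x(:);
      m = size(V, 2);
      f = [];
      for i = 1:m
          base = (i-1)*12;
          Ri = reshape(x(base+1 : base+9), 3, 3);
          Ti = x(base+10 : base+12);
          C1 = Ri(:,1);  C2 = Ri(:,2);  C3 = Ri(:,3);
          f  = [f;  C1'*C2;  C1'*C3;  C2'*C3; ...             % first term
                    C1'*C1 - 1;  C2'*C2 - 1;  C3'*C3 - 1];
          f  = [f;  Ri*V(:,i) + Ti - U(:,i)];                 % second term
      end
      for k = 1:size(pairs, 1)
          bi = (pairs(k,1)-1)*12;
          bj = (pairs(k,2)-1)*12;
          f  = [f;  x(bi+1 : bi+9) - x(bj+1 : bj+9)];         % third term
      end
  end
  % lsqnonlin(@(x) residualVec(x, V, U, pairs), x0, [], [], ...
  %           optimset('Algorithm', 'levenberg-marquardt'))
  % then minimizes sum(f.^2), which reproduces the cost F above.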

Subject: Non-linear optimization

From: Matt J

Date: 7 Mar, 2013 20:02:06

Message: 23 of 33

"Toan Cao" <toancv3010@gmail.com> wrote in message <khaats$7h6$1@newscl01ah.mathworks.com>...
> "Matt J" wrote in message <kh8dvs$imu$1@newscl01ah.mathworks.com>...
> > "Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <kh8bsc$bov$1@newscl01ah.mathworks.com>...
> > > "Matt J" wrote in message <kh89he$4al$1@newscl01ah.mathworks.com>...
> > >
> > > > f(x)=(sqrt(F(x)-f_low))^2
> > >
> > > What you write is
> > >
> > > f(x) = abs(F(x)-f_low) = F(x)-f_low
> > >
> > > Minimizing f(x) is the same as minimizing F(x). How far do we get?
> > ===============
> >
> > That should really have been
> >
> > f(x)=sqrt(F(x)-f_low)
> >
> > min (f(x))^2
> >
> > But yes, the above is equivalent to minimizing F(x). That's what we want. Now, however, you can feed f(x) to LSQNONLIN and run its Levenberg-Marquardt routine.
>
> Hi Matt J,
>
> I will summarize my function here again and wish to receive feedback!
>
> >Given two 3D point clouds (source point cloud (SPC) and target point cloud (TPC)). I would like to move each point of SPC to be coincide with each corresponding point of TPC.
> Each movement of each point of SPC is described by a Rotation matrix Ri and a translation vector Ti.
> Rotation matrix Ri is constrained:
> Rot(Ri)= (C1.C2)^2 + (C1.C3)^2 + (C2.C3)^2 +(C1.C1 -1)^2 +(C2.C2 -1)^2 + (C3.C3 -1)^2, where C1, C2, C3 are 3x1 column vectors of Ri.
> Given m points in SPC, the first term of cost function is: Sum(Rot(Ri)) where i =1:m
> If we call a point in SPC is Vi, its corresponding point in TPC is Ui, its transformed point is V'i. So, the second term of cost function is: Sum((V'i - Ui)^2), i=1:m
> --------------------------
> I assume that (for simplicity) the third term for my cost function is Sum(norm(Ri-Rj)^2), i,j=1:m, i ~=j
> Now, my cost functiion : F = Sum(Rot(Ri)) +Sum((V'i - Ui)^2) +Sum(norm(Ri-Rj)^2), i=1:m
>
> I read document of optimization toolbox of matlab, It suggests that i can use LSQNONLIN for this function with Levenberg-Marquardt algorihm. With this routine, it requires we provide a vector-value function f(x)=[f{1}(x),f{2}(x),...,f{n}(x)]' for LSQNONLIN.


> Now, to use this routine, i will do:
> 1) f{1}(x)= (C1{i}.C2{i}), f{2}(x)= (C1{i}.C3{i}),..., (a set of functions for first term).
> f{k}(x) =(V'{i} - U{j}), f{k+1}(x) =(V'{i+1} - U{j+1}),...., (a set of functions for second term).
> f{h}(x)= norm(R{i}-R{j}) {i=h},... ,f{m}(x)=norm(R{i}-R{j}) {i=m}. (a set of functions for third term).
> => So, f(x) = [f{1}(x), f{2}(x),..., f{k}(x), f{k+1}(x),..., f{h}(x),..., f{m}(x)]'
>
> OR, i just give:
> 2) f{1}(x) = sqrt(Sum(Rot(Ri))), f{2}(x) = sqrt(Sum((V'i - Ui)^2)), f{3}(x) =sqrt(Sum(norm(Ri-Rj)^2)).
> => So, f(x) = [f{1}(x), f{2}(x), f{3}(x)]'
> With your experience, Which option (1 or 2) should i follow ?
============

Toan,

I think neither option is a good one because, as I have already told you, it looks like the Ri parameters are unnecessary. Also, I can give you a solution right now that you can verify, by direct substitution, globally minimizes the function you have written, with F(Ri,Ti) = 0. The solution is

 Ri=eye(3)
 Ti=Ui-Vi

for all i=1..m

Subject: Non-linear optimization

From: Toan Cao

Date: 7 Mar, 2013 20:35:08

Message: 24 of 33

"Matt J" wrote in message <kharnu$599$1@newscl01ah.mathworks.com>...
> "Toan Cao" <toancv3010@gmail.com> wrote in message <khaats$7h6$1@newscl01ah.mathworks.com>...
> > "Matt J" wrote in message <kh8dvs$imu$1@newscl01ah.mathworks.com>...
> > > "Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <kh8bsc$bov$1@newscl01ah.mathworks.com>...
> > > > "Matt J" wrote in message <kh89he$4al$1@newscl01ah.mathworks.com>...
> > > >
> > > > > f(x)=(sqrt(F(x)-f_low))^2
> > > >
> > > > What you write is
> > > >
> > > > f(x) = abs(F(x)-f_low) = F(x)-f_low
> > > >
> > > > Minimizing f(x) is the same as minimizing F(x). How far do we get?
> > > ===============
> > >
> > > That should really have been
> > >
> > > f(x)=sqrt(F(x)-f_low)
> > >
> > > min (f(x))^2
> > >
> > > But yes, the above is equivalent to minimizing F(x). That's what we want. Now, however, you can feed f(x) to LSQNONLIN and run its Levenberg-Marquardt routine.
> >
> > Hi Matt J,
> >
> > I will summarize my function here again and wish to receive feedback!
> >
> > >Given two 3D point clouds (source point cloud (SPC) and target point cloud (TPC)). I would like to move each point of SPC to be coincide with each corresponding point of TPC.
> > Each movement of each point of SPC is described by a Rotation matrix Ri and a translation vector Ti.
> > Rotation matrix Ri is constrained:
> > Rot(Ri)= (C1.C2)^2 + (C1.C3)^2 + (C2.C3)^2 +(C1.C1 -1)^2 +(C2.C2 -1)^2 + (C3.C3 -1)^2, where C1, C2, C3 are 3x1 column vectors of Ri.
> > Given m points in SPC, the first term of cost function is: Sum(Rot(Ri)) where i =1:m
> > If we call a point in SPC is Vi, its corresponding point in TPC is Ui, its transformed point is V'i. So, the second term of cost function is: Sum((V'i - Ui)^2), i=1:m
> > --------------------------
> > I assume that (for simplicity) the third term for my cost function is Sum(norm(Ri-Rj)^2), i,j=1:m, i ~=j
> > Now, my cost functiion : F = Sum(Rot(Ri)) +Sum((V'i - Ui)^2) +Sum(norm(Ri-Rj)^2), i=1:m
> >
> > I read document of optimization toolbox of matlab, It suggests that i can use LSQNONLIN for this function with Levenberg-Marquardt algorihm. With this routine, it requires we provide a vector-value function f(x)=[f{1}(x),f{2}(x),...,f{n}(x)]' for LSQNONLIN.
>
>
> > Now, to use this routine, i will do:
> > 1) f{1}(x)= (C1{i}.C2{i}), f{2}(x)= (C1{i}.C3{i}),..., (a set of functions for first term).
> > f{k}(x) =(V'{i} - U{j}), f{k+1}(x) =(V'{i+1} - U{j+1}),...., (a set of functions for second term).
> > f{h}(x)= norm(R{i}-R{j}) {i=h},... ,f{m}(x)=norm(R{i}-R{j}) {i=m}. (a set of functions for third term).
> > => So, f(x) = [f{1}(x), f{2}(x),..., f{k}(x), f{k+1}(x),..., f{h}(x),..., f{m}(x)]'
> >
> > OR, i just give:
> > 2) f{1}(x) = sqrt(Sum(Rot(Ri))), f{2}(x) = sqrt(Sum((V'i - Ui)^2)), f{3}(x) =sqrt(Sum(norm(Ri-Rj)^2)).
> > => So, f(x) = [f{1}(x), f{2}(x), f{3}(x)]'
> > With your experience, Which option (1 or 2) should i follow ?
> ============
>
> Toan,
>
> I think neither option is a good one because, as I have already told you, it looks like the Ri parameters are unnecessary. Also, I can give you a solution right now that you can verify, by direct subsititution, will globally minimize the function you have written with F(Ri,Ti)=0. The solution is
>
> Ri=eye(3)
> Ti=Ui-Vi
>
> for all i=1..m

Hi Matt J,

Thanks for your consideration!
Yes, I see what you mean about making the implementation easy for my problem. However, in the limited space of this post I cannot explain why I need rotation matrices to describe the movements of the points; that is why I only explained the main idea of the cost function.
If you would like to know more, the paper I read and want to implement is titled "Embedded deformation for shape manipulation"; its cost function is given in section 4.
So, when applying LSQNONLIN, I am confused about which option (1 or 2) to use for f(x) = [f{1}(x), f{2}(x), ..., f{n}(x)]'.
I would welcome any suggestion for my case.

Best regards,
Toan

Subject: Non-linear optimization

From: Matt J

Date: 7 Mar, 2013 20:48:08

Message: 25 of 33

"Toan Cao" <toancv3010@gmail.com> wrote in message <khatls$bf4$1@newscl01ah.mathworks.com>...
>
> Hi Matt J,
>
> Thanks for your consideration!
> Yes, i know what you mean in order to implement easily for my problem. However, in limited space of the text, i can not explain why i need rotation matrices to describe movements of points. It is reason i just explain main idea of the cost function.
> If you would like to know more, i can show the paper i read and want to implement, its title: "Embedded deformation for shape manipulation". Its cost function is expressed in section 4.
> So, when applying LSQNONLIN, i am confused which option (1 or 2) i use for f(x)=[f{1}(x),f{2}(x),...,f{n}(x)]' ?
> I am willing to receive any your suggestion for my case.
=======

Toan,

The square roots are bad, because they introduce non-differentiabilities into the cost function.

Subject: Non-linear optimization

From: Matt J

Date: 7 Mar, 2013 20:50:11

Message: 26 of 33

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <kh9it4$qg1$1@newscl01ah.mathworks.com>...
> "Matt J" wrote in message <kh8skk$s6c$1@newscl01ah.mathworks.com>...
>
> > Bear in mind also that generic LM doesn't use the Jacobian of the cost function F(x), but rather the Hessian of F, or equivalently the Jacobian of F's gradient. This not only means a possibly intensive Hessian calculation, but also when you solve the update equation
>
> What you pointed out is the difference between Quasi-Newton and Newton. True both can be implemented with LM.
>
> In practice quasi-Newton is quasi sufficient.
======================

I'm not sure what link with quasi-Newton that you're referring to. If you're saying that

 (H+lambda*I) x=gradient

is an LM generalization of Newton's method, then yes, I'm sure Newton-LM would converge faster, however each iteration looks costly. You would have to know the minimum eigenvalue of H in order to make (H+lambda*I) positive definite.

 

Subject: Non-linear optimization

From: Bruno Luong

Date: 7 Mar, 2013 21:12:06

Message: 27 of 33

"Matt J" wrote in message <khaui3$efe$1@newscl01ah.mathworks.com>...

>
> I'm not sure what link with quasi-Newton that you're referring to. If you're saying that
>
> (H+lambda*I) x=gradient
>
> is an LM generalization of Newton's method,

Quasi-Newton -> replace the Hessian by an approximation of it, usually based on first derivatives, such as the BFGS formula or H ~ J'*J for a least-squares cost function, where J is the Jacobian of the model.

>then yes, I'm sure Newton-LM would converge faster, however each iteration looks costly. You would have to know the minimum eigenvalue of H in order to make (H+lambda*I) positive definite.

Both the BFGS and J'*J approximations provide a convex quadratic approximation. Therefore there is no need to bother with such details about positive definiteness.

Those notions are well known in the optimization literature.

Bruno

Subject: Non-linear optimization

From: Matt J

Date: 7 Mar, 2013 21:36:08

Message: 28 of 33

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <khavr6$imm$1@newscl01ah.mathworks.com>...
> "Matt J" wrote in message <khaui3$efe$1@newscl01ah.mathworks.com>...
>
> >
> > I'm not sure what link with quasi-Newton that you're referring to. If you're saying that
> >
> > (H+lambda*I) x=gradient
> >
> > is an LM generalization of Newton's method,
>
> Quasi-Newton -> Replace Hessian by an appoximation of it, usually based from the first derivative, such as BFGS formula or H ~ J'*J in the least square cost function, where J is the Jacobian of the model.
>
> >then yes, I'm sure Newton-LM would converge faster, however each iteration looks costly. You would have to know the minimum eigenvalue of H in order to make (H+lambda*I) positive definite.
>
> Both BFGS and J'*J approximation provide quasi convex quadratic approximation. Therefore there is no need to bother with such detail about positiveness.
========

That much, I understand. Maybe I didn't understand what you meant by quasi-Newton being "quasi sufficient". It looks like finding lambda for quasi-Newton-LM would be much more efficient than for true Newton-LM.

Subject: Non-linear optimization

From: Bruno Luong

Date: 7 Mar, 2013 21:51:10

Message: 29 of 33

"Matt J" wrote in message <khb188$n3q$1@newscl01ah.mathworks.com>...

> That much, I understand. Maybe I didn't understand what you meant by quasi-Newton being "quasi sufficient".

Practical experience shows that one does not need to compute the true Hessian. The gain in the number of iterations to convergence is minor, yet the more complex second-order calculation adds a lot of overhead in many cases.

>It looks like finding lambda for quasi-Newton-LM would be much more efficient than for true Newton-LM.

Not sure I understand your statement. In the LM method, finding lambda is based on the same empirical rules, which work equally well regardless of whether a Hessian approximation or the true Hessian is chosen.

Bruno

Subject: Non-linear optimization

From: Matt J

Date: 7 Mar, 2013 22:39:08

Message: 30 of 33

> >It looks like finding lambda for quasi-Newton-LM would be much more efficient than for true Newton-LM.
>
> Not sure I understand your statement. In LM method, finding lambda is based on the same empirical rules that works equally well regardless which the Hessian approximation or true Hessian is chosen.
=================

I don't see how that can be. For non-convex functions and non-posdef Hessians, an empirically chosen lambda could easily leave H+lambda*I singular, or at least not positive definite and therefore non-descending. You would have to choose lambda > -min(eig(H)) to be sure that didn't happen, and that would require an eigen-analysis of H.

Subject: Non-linear optimization

From: Bruno Luong

Date: 7 Mar, 2013 22:52:07

Message: 31 of 33

"Matt J" wrote in message <khb4uc$55k$1@newscl01ah.mathworks.com>...

>
> I don't see how that can be. For non-convex functions and non-posdef Hessians, an empirically chosen lambda could easily leave H+lambda*I singular, or at least not positive definite and therefore non-descending. You would have to choose lambda > -min(eig(H)) to be sure that didn't happen, and that would require an eigen-analysis of H.

The strategy and rules for choosing lambda are given by the LM algorithm. You can check a textbook that explains all the details.

Again, all of that is well known.

Bruno

Subject: Non-linear optimization

From: Matt J

Date: 8 Mar, 2013 05:18:07

Message: 32 of 33

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <khb5mn$7bg$1@newscl01ah.mathworks.com>...
> "Matt J" wrote in message <khb4uc$55k$1@newscl01ah.mathworks.com>...
>
> >
> > I don't see how that can be. For non-convex functions and non-posdef Hessians, an empirically chosen lambda could easily leave H+lambda*I singular, or at least not positive definite and therefore non-descending. You would have to choose lambda > -min(eig(H)) to be sure that didn't happen, and that would require an eigen-analysis of H.
>
> The strategy and rules to chose lambda is given by LM algorithm. You can check the textbook that explains all the details.
>
> Again all that is well known.
===================

I've gone back to two textbooks now, Bertsekas and Nocedal & Wright. They both deal with LM strictly in the context of Gauss-Newton approximations of the Hessian in nonlinear least squares. I don't get the sense that LM for true Hessians has been explored all that extensively.

Neither book, incidentally, spells out the lambda-tuning procedure in great detail.
Nocedal and Wright, in fact, offer a non-empirical alternative derived from trust-region ideas. Again, though, that's applicable only to Gauss-Newton LM.

One can see that the empirical lambda-tuning rules in the original LM papers should in theory be applicable to true Hessians and not require an eig(H) operation. However, it's still easy to imagine that, if the algorithm lands in a non-convex region where H is not positive definite, you might have to solve

  (H+lambda*I)*x=-g

for several values of lambda before a descent direction is found.
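
One way to implement that retry, sketched below: grow lambda until H + lambda*I is positive definite (detected by a successful Cholesky factorization), which guarantees the solved step is a descent direction. H and g here stand for the current Hessian and gradient.

  function step = dampedNewtonStep(H, g, lambda0)
      lambda = lambda0;
      n = size(H, 1);
      while true
          [~, p] = chol(H + lambda*eye(n));
          if p == 0, break, end        % p == 0 means chol succeeded (posdef)
          lambda = 10*lambda;
      end
      step = -(H + lambda*eye(n)) \ g;
  end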

Subject: Non-linear optimization

From: Bruno Luong

Date: 8 Mar, 2013 07:22:10

Message: 33 of 33

"Matt J" wrote in message <khbsaf$6jf$1@newscl01ah.mathworks.com>...
> "Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <khb5mn$7bg$1@newscl01ah.mathworks.com>...

> One can see that the empirical lambda-tuning rules in the original LM papers should in theory be applicable to true Hessians and not require an eig(H) operation. However, it's still easy to imagine that if the algorithm lands in a non-convex region where H is not positive definite, that you might have to solve
>
> (H+lambda*I)*x=-g
>
> for several lambda before a descent-direction was found.

Yes, and usually lambda is adjusted automatically (by multiplying it by a constant factor) until the criteria are fulfilled. A simple check, such as a decrease of the objective function, will ensure (with high probability) that lambda is large enough, even if H is not positive definite.

That said, I don't know many people who actually use the true Hessian in non-linear optimization.

Bruno
