
# Newton's method for minimisation returns a critical point


I am trying to implement Newton's method to find the minima of the Himmelblau function.

The code works most of the time, but in cases like this one, where my initial guess is (0.5, 1), it returns a critical point of the function that is not a minimum. I understand this is because the gradient becomes 0 there and no new points are generated.

My question is: is this normal with this method? Is there a way of getting around this problem?

Thanks for any help.

```matlab
close all; clear; clc

% Initialisation of variables to use
x0 = [0.5; 1];
tol = 1e-4;
maxits = 50;

% Himmelblau function
him = @(x,y) (x.^2 + y - 11).^2 + (x + y.^2 - 7).^2;

% Gradient of the Himmelblau function
grad_him = @(x,y) [4*x.^3 + 4*x.*y - 42*x + 2*y.^2 - 14; ...
                   4*y.^3 + 4*x.*y - 26*y + 2*x.^2 - 22];

% Hessian matrix of the Himmelblau function
hessian_him = @(x,y) [12*x.^2 + 4*y - 42, 4*x + 4*y; ...
                      4*x + 4*y, 12*y.^2 + 4*x - 26];

% Call Newton's method and display the results accordingly
[r, iters, flag] = newton_min(grad_him, hessian_him, x0, tol, maxits);

fprintf("<strong>Newton's method</strong>\n\n");
switch (flag)
    case 0
        fprintf("There was a convergence on f\n\n");
        fprintf("The minimum found is: \n");
        disp(r);
        fprintf("It took %d iterations.\n\n", iters);
    case 1
        fprintf("There was a convergence on x\n\n");
        fprintf("The minimum found is: \n");
        disp(r);
        fprintf("It took %d iterations.\n\n", iters);
    otherwise
        fprintf("There was no convergence\n\n");
end

function [r, iters, flag] = newton_min(dg, ddg, x0, tol, maxits)
    x = x0(1); y = x0(2);
    r = NaN;
    flag = -1;
    for iters = 1:maxits
        x_old = [x; y];
        x_new = x_old - (ddg(x,y) \ dg(x,y));
        if norm(dg(x,y)) < tol
            flag = 0;
            r = x_new;
            return;
        end
        if norm(x_new - x_old) <= (tol + eps*norm(x_new))
            flag = 1;
            r = x_new;
            return;
        end
        x = x_new(1);
        y = x_new(2);
    end
end
```


### Accepted Answer

Matt J
on 10 Nov 2020

Yes, it's normal.

##### 30 Comments

Matt J
on 10 Nov 2020


Is there a way of getting around this problem?

You can make the following change to your update step, but some would argue that it is not strictly Newton's method anymore.

```matlab
H = ddg(x,y);
if any(eig(H) <= 0), H = eye(2); end
x_new = x_old - (H\dg(x,y));
```

Also, you really need to add a line search to ensure global convergence, though I suspect that won't be necessary for this specific function.


J. Alex Lee
on 10 Nov 2020

The line search idea would be to damp the step H\dg down by some value less than 1:

```matlab
x_new = x_old - lam*(H\dg(x,y))
```

If that's what Matt J has in mind, it doesn't seem too meaningful to me, since [the modified] H was randomly chosen, not to follow the gradient at all; in fact it could have been anything that's positive definite, right? So I don't think it's necessarily guaranteed that doing a line search in a random direction will get you a residual value lower than the current iterate.

Dussan Radonich
on 10 Nov 2020

I tried this with multiple values for lam, and it would never converge.

Bruno Luong
on 10 Nov 2020

Sorry, I don't understand what the heck you guys are proposing.

```matlab
H = ddg(x,y);
if any(eig(H) <= 0), H = eye(2); end
x_new = x_old - (H\dg(x,y));
```

It gets stuck at (0.5,1) because dg at this point is zero; when you change H to eye, x_new is still == x_old, so it's still stuck.

Matt J
on 10 Nov 2020

If you are going to use a constant lam, it must be smaller than one over the largest possible Hessian eigenvalue. An adaptive way to choose lam would be to minimize

```matlab
L = @(lam) him(x_old - lam*H\dg(x,y))
```

Matt J
on 10 Nov 2020


@Bruno,

The initial point (0.5,1) is not a point of zero gradient. The problem, though, is that unmodified Newton's method will take the algorithm uphill to a nearby inflection point, since the Hessian at (0.5,1) is not positive definite. When I add the modification (with lam=1), I successfully reach one of the local minima.


Matt J
on 10 Nov 2020


@J Alex Lee

H was randomly chosen to not follow the gradient at all; in fact it could have been anything that's positive definite, right?

It's not exactly random. When H is set to eye(2), it is equivalent to doing a step of steepest descent for that one iteration. But H could indeed be anything positive definite, as long as its eigenvalues are uniformly bounded within some strictly positive finite interval.

Dussan Radonich
on 10 Nov 2020

Matt, I tried with everything below 1. The only time it worked was with something around 1e-5.

Well, I have to create Newton's method with specifically those parameters, so him would never be passed to it. Is there another way?

But just to have more knowledge: after setting the function L, would I have to do an fminbnd of it between 0 and 1?

Matt J
on 10 Nov 2020


Matt, I tried with everything below 1. The only time it worked was with something around 1e-5.

It worked fine for me with lam=1, starting from (0.5,1). What failure cases did you observe?

Well, I have to create Newton's method with specifically those parameters,

Specifically what parameters? Like I said, the modifications we've already made make it such that it's not purely Newton's method anymore.

I would have to do a fminbnd of it between 0 and 1?

In general, it can be any constant interval [0,s] where 0<s<=inf. Obviously if you choose s very small, fminbnd will complete faster, but "Newton's method" convergence can be very slow. Also, note that for this specific objective function, L(lam) is a 1D polynomial, so in theory you can find its exact minimum analytically.
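For example, the adaptive choice could be sketched inside the iteration like this (a sketch only: the names step and L and the bracket s = 2 are illustrative, and it assumes the objective him is visible to the routine):

```matlab
step = H \ dg(x,y);                          % (modified) Newton direction
L = @(lam) him(x_old(1) - lam*step(1), ...   % objective restricted to the ray
               x_old(2) - lam*step(2));
lam = fminbnd(L, 0, 2);                      % s = 2 here; any finite s > 0 works
x_new = x_old - lam*step;
```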

Dussan Radonich
on 10 Nov 2020

It converges to the critical point or to the maxima. If I choose lam to be anything from 0.1 to 0.9 in steps of 0.1, all the points that converge, out of the 1000 random starting points, are maxima. Maybe this is how it's supposed to be?

My function should only receive the gradient, Hessian, initial guess, tolerance, and max iterations.

Shouldn't lam be the result of a line search of the function in only one variable?

Something like:

```matlab
him_x = @(x) him(x,y);
t = fminbnd(him_x,0,s);
```


Matt J
on 11 Nov 2020


Shouldn't lam be the result of a line search of the function in only one variable?

No, it should be a search along the direction you plan to step, H\dg(x,y). If you plan to step in the directions of x and y alternately (this is called coordinate descent), you would minimize alternately along x and y, but then you wouldn't be exploiting the Hessian at all.

Matt J
on 11 Nov 2020


If I choose lam to be anything from 0.1 to 0.9 in steps of 0.1, all the points that converge, out of the 1000 random ones, are maxima. Maybe this is how it's supposed to be?

You may have to increase maxits, depending on your choice of lam. For me, lam=0.4 gives convergence in 61 iterations.

Making the Hessian positive definite ensures that the direction of the step will be downhill. Otherwise, this is not guaranteed.
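This can be checked directly: for positive definite H and nonzero g, the step d = -H\g satisfies g'*d = -g'*(H\g) < 0, so the step makes an obtuse angle with the gradient. A quick numerical check (the matrix here is arbitrary, chosen only to be positive definite):

```matlab
H = [2 0.5; 0.5 1];      % any symmetric positive definite matrix
g = [-30.5; -41.5];      % e.g. the gradient at (0.5,1)
d = -H\g;                % modified Newton step
disp(g'*d)               % negative, i.e. a descent direction
```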

Dussan Radonich
on 11 Nov 2020

Ok, but the problem I've been having when trying to do the line search is that, with the function you sent,

```matlab
L = @(lam) him(x_old - lam*H\dg(x,y))
```

him should be receiving x and y. And the inside is x_new, so wouldn't it be the same to do

```matlab
L = @(lam) him(x_new(1), x_new(2))
```

Would this make sense?


Matt J
on 11 Nov 2020

The following converged:

```matlab
% Initialisation of variables to use
x0 = [0.5; 1];
tol = 1e-10;
maxits = 50000;

% Himmelblau function
him = @(x,y) (x.^2 + y - 11).^2 + (x + y.^2 - 7).^2;

% Gradient of the Himmelblau function
grad_him = @(x,y) [4*x.^3 + 4*x.*y - 42*x + 2*y.^2 - 14; ...
                   4*y.^3 + 4*x.*y - 26*y + 2*x.^2 - 22];

% Hessian matrix of the Himmelblau function
hessian_him = @(x,y) [12*x.^2 + 4*y - 42, 4*x + 4*y; ...
                      4*x + 4*y, 12*y.^2 + 4*x - 26];

% Call Newton's method and display the results accordingly
[r, iters, flag] = newton_min(grad_him, hessian_him, x0, tol, maxits);

fprintf("<strong>Newton's method</strong>\n\n");
switch (flag)
    case 0
        fprintf("There was a convergence on f\n\n");
        fprintf("The minimum found is: \n");
        disp(r);
        fprintf("It took %d iterations.\n\n", iters);
    case 1
        fprintf("There was a convergence on x\n\n");
        fprintf("The minimum found is: \n");
        disp(r);
        fprintf("It took %d iterations.\n\n", iters);
    otherwise
        fprintf("There was no convergence\n\n");
end

function [r, iters, flag] = newton_min(dg, ddg, x0, tol, maxits)
    x = x0(1); y = x0(2);
    r = NaN;
    flag = -1;
    for iters = 1:maxits
        x_old = [x; y];
        H = ddg(x,y);
        if any(eig(H) <= 0), H = eye(2); end   % keep the step direction downhill
        x_new = x_old - 0.4*(H\dg(x,y));       % damped (lam = 0.4) modified step
        if norm(dg(x,y)) < tol
            flag = 0;
            r = x_new;
            return;
        end
        if norm(x_new - x_old) <= (tol + eps*norm(x_new))
            flag = 1;
            r = x_new;
            return;
        end
        x = x_new(1);
        y = x_new(2);
    end
end
```

Output:

```
Newton's method

There was a convergence on x

The minimum found is:
    3.0000
    2.0000

It took 61 iterations.
```

Dussan Radonich
on 11 Nov 2020

Oh ok, yes, I get convergence with that. Even taking the full step works fine. Thank you.

If I were to implement the line search, would that eliminate the need for the identity matrix?


Matt J
on 11 Nov 2020


Yes, if you were to initialize the iterations at any zero-gradient point, including an inflection point, the algorithm would get stuck there. I'm not claiming that wouldn't happen.

That hasn't happened in what we've looked at so far, however, because we have only tested the initial point (0.5,1), which is not a point of zero gradient:

```
>> grad_him(0.5,1)

ans =

  -30.5000
  -41.5000
```

Bruno Luong
on 11 Nov 2020

It's me that is confused; I thought the gradient was 0 at the starting point.

Bruno Luong
on 11 Nov 2020


In practice, a gradient-based minimiser first computes the descent direction; it can be a Newton-like direction

dk = -H(xk)*g(xk)

where H is estimated from the BFGS formula or by other means, such as a conjugate-gradient direction. Then it does the line search

x(k+1) = x(k) + rho*dk

When computing dk, the minimiser always ensures

dot(dk, g) <= 0

so that the line search always goes downhill for an infinitesimal step. This is automatically satisfied when Hk is positive definite, as with the BFGS approximation. Some other methods that can deal with a non-positive Hessian choose the direction by, for example, minimising the quadratic model on a sphere around the current point xk (trust region); this also ensures the descent condition is met. The line search never goes uphill, as it does with the pure Newton direction in your case.

An approximate line search per iteration is enough; something as costly as fminbnd is too much. Usually a couple of evaluations of the cost function along the descent direction are made, and the minimiser performs a polynomial fit to find the minimum. The step must also satisfy some other criteria, such as the Wolfe conditions; if not, the step can be reduced or increased geometrically until the criteria are met.

Even though there are only a few evaluations per step, the step eventually becomes small enough along the iterations (one iteration == one change of dk), since the gradient converges to 0 (first-order optimality). Thus the most important thing for an optimiser is to find the right descent direction; there is no need to do an exact line search.

Anyway, all of this is well known and one can read about it in many optimization books. A classical one is Nocedal (and Wright), which gives an overview. In practice, solvers differ in technical details, and one has to find the corresponding paper to find out what's going on.
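The approximate line search described above can be sketched as simple Armijo backtracking (the constants rho = 1, beta = 0.5, c = 1e-4 are common textbook defaults, not taken from any specific solver):

```matlab
% f: objective handle on a vector, g: gradient at x, d: descent direction
% with dot(d,g) < 0.
rho = 1; beta = 0.5; c = 1e-4;
while f(x + rho*d) > f(x) + c*rho*dot(g,d)   % sufficient-decrease (Armijo) test
    rho = beta*rho;                          % shrink the step geometrically
end
x = x + rho*d;
```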

Dussan Radonich
on 11 Nov 2020

But that is not the point where the gradient is zero; it is the critical point (-0.1280, -1.9537), where the method gets stuck:

```
>> grad_him(-0.1280,-1.9537)

ans =

    0.0018
    0.0006
```

Bruno Luong
on 11 Nov 2020


When you think about it, your original implementation of Newton's method is solving the first-order optimality equation

g(x) = 0

and it doesn't care whether the point x is a (local) minimum, a maximum, or even a saddle point. As written, the code is totally symmetric with respect to maximizing and minimizing.

So it converges to the local maximum at about (-0.1280, -1.9537) if you start at (0.5,1). So it actually does well; it's just not what you would expect, since it returns a maximum rather than the local minimum, but the code does not incorporate any specific statement about finding a minimum.

Suppose now that you reverse the sign of your objective function him (and therefore also H and g accordingly): it will follow exactly the same path and converge to the very same solution, but now it's a local minimum!
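Indeed, negating the objective negates both the gradient and the Hessian, and the Newton step is unchanged, since (-H)\(-g) equals H\g. A quick check using the handles defined in the question:

```matlab
H = hessian_him(0.5,1);          % Hessian at the starting point
g = grad_him(0.5,1);             % gradient at the starting point
disp(norm(H\g - (-H)\(-g)))      % the two steps are identical
```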

Matt J
on 11 Nov 2020

But that is not the point where the gradient would be zero, it is the critical point (-0.1280, -1.9537).

Yes, but as long as the algorithm goes downhill from (0.5,1) at every iteration, it can never approach the inflection point (-0.1280, -1.9537). The inflection point lies uphill from your initial point:

```
>> him(0.5,1)

ans =

  125.3125

>> him(-0.1280, -1.9537)

ans =

  178.3372
```

### More Answers (2)

J. Alex Lee
on 10 Nov 2020

Yes, this looks normal: you are only asking to zero the gradient of the function, so naturally that includes non-optimal points where the gradient is the zero vector.

You can use a non-gradient minimizer, like fminsearch, to seek local minima.


Bruno Luong
on 10 Nov 2020


"Now my question would be, is this normal with this method?"

Your code just shows it: yes, it is normal.

Now, in practice it is very rare to fall on a stationary point that is not a local minimum. As soon as you work with a non-academic objective function, you won't ever get the gradient exactly equal to 0.

"Is there a way of getting around this problem?"

In all the books I have read about optimization, no one cares about this specific problem, since, as I said, it only happens in academic examples. However, many methods compute at each iteration an approximation of the Hessian, and the positive definiteness of the Hessian is either enforced or monitored. A Hessian that has negative eigenvalues, like yours at (0.5,1), automatically gets special treatment to escape from the non-minimum.

